Moses-support Digest, Vol 142, Issue 21

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Fwd: Different translations are obtained from the same
decoder without alignment information (Ergun Bicici)

----------------------------------------------------------------------

Message: 1
Date: Sun, 26 Aug 2018 23:13:06 +0300
From: Ergun Bicici <bicici@gmail.com>
Subject: Re: [Moses-support] Fwd: Different translations are obtained
from the same decoder without alignment information
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAB59qTP_nQPLZDpuGBe32OMUjoaUkMO2BRLpqn_21kQQqsJQig@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Hieu,

Thank you very much. One issue is that using "--mark-unknown
--unknown-word-prefix UNK" changes the casing of the text. Example:

1) input: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
output: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
2) input: " the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
output: " The greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
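
A possible explanation, not verified against the detruecasing script, is
that the prefixed token UNK_" is no longer recognised as plain punctuation,
so sentence-initial capitalisation is applied to it rather than to "the".
A minimal Python sketch of one workaround follows. It assumes the marker
appears as the literal prefix UNK_ as in the example above, and the helper
name strip_unknown_prefix is hypothetical. It removes the marker before
the output is recased:

```python
def strip_unknown_prefix(line, prefix="UNK_"):
    """Remove the unknown-word marker from every token that starts with it.

    The prefix value is an assumption taken from the example in this
    message; a token consisting of the prefix alone is left untouched.
    """
    cleaned = []
    for tok in line.split():
        if tok.startswith(prefix) and len(tok) > len(prefix):
            cleaned.append(tok[len(prefix):])
        else:
            cleaned.append(tok)
    return " ".join(cleaned)


# Abbreviated version of the example sentence above.
example = 'UNK_" the greatest treasure , UNK_" said the mayor .'
print(strip_unknown_prefix(example))
```

The cleaned line can then be passed to the usual detruecasing step, while
the marked output is kept separately if the OOV positions are still needed.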

I also found out that for de-en, I was using a different language model,
which was decreasing the scores. I used EMS for all experiments before but
made the system skip some parts. Apparently some change in the data paths
caused the language model files for another experiment to be used.
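
A mix-up of this kind can be caught by comparing the model paths that two
configurations refer to. The following Python sketch assumes that feature
lines in moses.ini carry key=value fields such as name= and path=; the
function names and the sample paths are hypothetical:

```python
def feature_paths(ini_text):
    """Return {feature name: path} for every line that carries a path= field."""
    paths = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and section headers such as [feature].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        fields = line.split()
        kv = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        if "path" in kv:
            # Fall back to the feature type when no name= field is given.
            paths[kv.get("name", fields[0])] = kv["path"]
    return paths


def differing_paths(ini_a, ini_b):
    """Return {name: (path in a, path in b)} for features whose paths differ."""
    a, b = feature_paths(ini_a), feature_paths(ini_b)
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}


# Hypothetical configurations, for illustration only.
ini_a = "[feature]\nKENLM name=LM0 factor=0 path=/exp1/lm.bin order=5\n"
ini_b = "[feature]\nKENLM name=LM0 factor=0 path=/exp2/lm.bin order=5\n"
print(differing_paths(ini_a, ini_b))
```

An empty result means both configurations point at the same model files.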

I obtained all translations again and now the scores match. The gain from
the additional truecasing step also disappeared. I am checking the results further.
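
For reference, the two summary figures used in this thread (the average
difference and the average absolute difference between runs) can be
computed as in the Python sketch below. The scores are on a 0-1 scale, so
0.0094 corresponds to 0.94 BLEU points. The function name and the sample
scores are placeholders, not the results from the attached table:

```python
def bleu_differences(with_flags, without_flags):
    """Return (mean signed difference, mean absolute difference).

    Both arguments map a translation direction to a BLEU score on a
    0-1 scale; only directions present in both runs are compared.
    """
    directions = sorted(set(with_flags) & set(without_flags))
    if not directions:
        raise ValueError("no translation direction is present in both runs")
    diffs = [with_flags[d] - without_flags[d] for d in directions]
    mean_signed = sum(diffs) / len(diffs)
    mean_abs = sum(abs(x) for x in diffs) / len(diffs)
    return mean_signed, mean_abs


# Placeholder scores, NOT the results reported in this thread.
with_flags = {"de-en": 0.25, "en-de": 0.20}
without_flags = {"de-en": 0.24, "en-de": 0.21}

mean_signed, mean_abs = bleu_differences(with_flags, without_flags)
# Multiply by 100 to express the differences in BLEU points.
print(f"mean difference: {100 * mean_signed:.2f} BLEU points")
print(f"mean absolute difference: {100 * mean_abs:.2f} BLEU points")
```

When the two runs decode identically, both figures should be zero.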

Thank you very much for your help.

Regards,
Ergun

On Fri, Aug 24, 2018 at 5:31 PM Hieu Hoang <hieuhoang@gmail.com> wrote:

> could you run with alignments, but WITHOUT -unknown-word-prefix UNK.
>
> alignments shouldn't change the translation but the OOV prefix may do
>
> Hieu Hoang
> http://statmt.org/hieu
>
>
> On Fri, 24 Aug 2018 at 15:29, Ergun Bicici <bicici@gmail.com> wrote:
>
>>
>> ok, thank you. I'll upload and send you a link.
>>
>> On Fri, Aug 24, 2018 at 5:27 PM Hieu Hoang <hieuhoang@gmail.com> wrote:
>>
>>> that would be a bug.
>>>
>>> could you please make the model and input files available for download.
>>> I'll check it out
>>>
>>> Hieu Hoang
>>> http://statmt.org/hieu
>>>
>>>
>>> On Fri, 24 Aug 2018 at 15:15, Ergun Bicici <bicici@gmail.com> wrote:
>>>
>>>>
>>>> Only the evaluation decoding steps are repeated, namely steps 10, 9,
>>>> and 7 in the following EMS output:
>>>> 48 TRAINING:consolidate -> re-using (1)
>>>> 47 TRAINING:prepare-data -> re-using (1)
>>>> 46 TRAINING:run-giza -> re-using (1)
>>>> 45 TRAINING:run-giza-inverse -> re-using (1)
>>>> 44 TRAINING:symmetrize-giza -> re-using (1)
>>>> 43 TRAINING:build-lex-trans -> re-using (1)
>>>> 40 TRAINING:build-osm -> re-using (1)
>>>> 39 TRAINING:extract-phrases -> re-using (1)
>>>> 38 TRAINING:build-reordering -> re-using (1)
>>>> 37 TRAINING:build-ttable -> re-using (1)
>>>> 34 TRAINING:create-config -> re-using (1)
>>>> 28 TUNING:truecase-input -> re-using (1)
>>>> 24 TUNING:truecase-reference -> re-using (1)
>>>> 21 TUNING:filter -> re-using (1)
>>>> 20 TUNING:apply-filter -> re-using (1)
>>>> 19 TUNING:tune -> re-using (1)
>>>> 18 TUNING:apply-weights -> re-using (1)
>>>> 15 EVALUATION:test:truecase-input -> re-using (1)
>>>> 12 EVALUATION:test:filter -> re-using (1)
>>>> 11 EVALUATION:test:apply-filter -> re-using (1)
>>>>
>>>>
>>>>
>>>> *10 EVALUATION:test:decode -> run
>>>> 9 EVALUATION:test:remove-markup -> run
>>>> 7 EVALUATION:test:detruecase-output -> run*
>>>> 3 EVALUATION:test:multi-bleu-c -> run
>>>> 2 EVALUATION:test:analysis-coverage -> re-using (1)
>>>> 1 EVALUATION:test:analysis-precision -> run
>>>>
>>>>
>>>> On Fri, Aug 24, 2018 at 4:39 PM Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>>
>>>>> are you rerunning tuning for each case? Or are you using exactly the
>>>>> same moses.ini file for the with and without alignment experiments?
>>>>>
>>>>> Hieu Hoang
>>>>> http://statmt.org/hieu
>>>>>
>>>>>
>>>>> On Fri, 24 Aug 2018 at 14:34, Ergun Bicici <bicici@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Dear Moses maintainers,
>>>>>>
>>>>>> I discovered that the translations obtained differ when alignment
>>>>>> flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-info)
>>>>>> are used. Comparison table is attached (en-ru and ru-en are being
>>>>>> recomputed). We expect them to be the same since alignment flags only print
>>>>>> additional information and they are not supposed to alter decoding. In
>>>>>> both cases, the same EMS system was re-run, once with and once without
>>>>>> the alignment information flags.
>>>>>>
>>>>>> - Average of the absolute difference is 0.0094 BLEU (about 1 BLEU
>>>>>> point).
>>>>>> - Average of the difference is 0.0051 BLEU (about 0.5 BLEU
>>>>>> points, results are better with alignment flags).
>>>>>>
>>>>>>
>>>>>> /opt/Programs/SMT/moses/mosesdecoder/bin/moses --version
>>>>>>
>>>>>> Moses code version (git tag or commit hash):
>>>>>> mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
>>>>>> Libraries used:
>>>>>> Boost version 1.62.0
>>>>>>
>>>>>> git status
>>>>>> On branch RELEASE-4.0
>>>>>> Your branch is up to date with 'origin/RELEASE-4.0'.
>>>>>>
>>>>>>
>>>>>> Note: Using alignment information to recase tokens was tried in [1]
>>>>>> for en-fi and en-tr, with positive results claimed. We tried this method in all
>>>>>> translation directions we considered and, as can be seen in the align row,
>>>>>> it only improves the performance for tr-en and en-tr, and for tr-en Moses
>>>>>> provides better translations without the alignment flags.
>>>>>> [1]The JHU Machine Translation Systems for WMT 2016
>>>>>> Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
>>>>>> http://www.statmt.org/wmt16/pdf/W16-2310.pdf
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Ergun
>>>>>>
>>>>>> Ergun Biçici
>>>>>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> Moses-support@mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ergun
>>>>
>>>>
>>>>
>>
>> --
>>
>> Regards,
>> Ergun
>>
>>
>>

--

Regards,
Ergun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20180826/d5507130/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 59618 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20180826/d5507130/attachment.png

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 142, Issue 21
**********************************************