Moses-support Digest, Vol 142, Issue 18

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Fwd: Different translations are obtained from the same
decoder without alignment information (Tom Hoar)
2. Re: Fwd: Different translations are obtained from the same
decoder without alignment information (Ergun Bicici)


----------------------------------------------------------------------

Message: 1
Date: Fri, 24 Aug 2018 21:49:02 +0700
From: Tom Hoar <tahoar@slate.rocks>
Subject: Re: [Moses-support] Fwd: Different translations are obtained
from the same decoder without alignment information
To: moses-support@mit.edu
Message-ID: <de9d8c67-0cc1-ed85-dedb-7b4c4e7cace3@slate.rocks>
Content-Type: text/plain; charset="utf-8"

I remember that 3 years ago I reported a similar (same?) problem with the
--print-alignment-inf flag, without EMS. At the time, I was using the
legacy binarized translation and reordering tables and everything was
great. Then I started testing the compact binarized format. The flag
caused translations to change, and some were even lost (blank lines). No
one on the support list knew of any reason and I didn't have the bandwidth
to troubleshoot. Instead, I continued using the legacy binarized files.
Maybe try changing to the legacy binarized files and see if the problem
disappears. This could help you narrow down where to look.
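
One way to narrow this down, sketched here under assumptions that are not stated in the thread: decode the same input twice (for example with the legacy and the compact tables), save each output with one sentence per line, and compare the two line by line. The function names `compare_outputs` and `compare_files` are hypothetical helpers, not part of Moses.

```python
from itertools import zip_longest


def compare_outputs(baseline_lines, other_lines):
    """Compare two decoder outputs sentence by sentence.

    Returns (changed, lost): 1-based line numbers where the text differs,
    and the subset of those where the second output is blank although
    the first one is not.
    """
    changed, lost = [], []
    # Pad the shorter output with empty strings so missing lines count as blank.
    pairs = zip_longest(baseline_lines, other_lines, fillvalue="")
    for number, (base, other) in enumerate(pairs, start=1):
        base, other = base.strip(), other.strip()
        if base != other:
            changed.append(number)
            if base and not other:
                lost.append(number)
    return changed, lost


def compare_files(baseline_path, other_path):
    """Read two output files (assumed UTF-8, one sentence per line) and compare them."""
    with open(baseline_path, encoding="utf-8") as first, \
            open(other_path, encoding="utf-8") as second:
        return compare_outputs(first.readlines(), second.readlines())
```

The `lost` list corresponds to the blank-line symptom described above; the `changed` list gives the sentences worth inspecting first.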


Best regards,
Tom Hoar
*Slate Rocks, LLC*
Web: https://www.slate.rocks
Thailand Mobile: +66 87 345-1875
Skype: tahoar

On 8/24/2018 9:31 PM, moses-support-request@mit.edu wrote:
> Date: Fri, 24 Aug 2018 15:31:14 +0100
> From: Hieu Hoang<hieuhoang@gmail.com>
> Subject: Re: [Moses-support] Fwd: Different translations are obtained
> from the same decoder without alignment information
> To: Ergun Bicici<bicici@gmail.com>
> Cc: moses-support<moses-support@mit.edu>
> Message-ID:
> <CAEKMkbhwykyPzSQDSL-WcgLQwjsyDEaxbGvntKBpC17e7ZutYw@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> could you run with alignments, but WITHOUT -unknown-word-prefix UNK.
>
> alignments shouldn't change the translation but the OOV prefix may do
>
> Hieu Hoang
> http://statmt.org/hieu
>
>
> On Fri, 24 Aug 2018 at 15:29, Ergun Bicici<bicici@gmail.com> wrote:
>
>> ok, thank you. I'll upload and send you a link.
>>
>> On Fri, Aug 24, 2018 at 5:27 PM Hieu Hoang<hieuhoang@gmail.com> wrote:
>>
>>> that would be a bug.
>>>
>>> could you please make the model and input files available for download.
>>> I'll check it out
>>>
>>> Hieu Hoang
>>> http://statmt.org/hieu
>>>
>>>
>>> On Fri, 24 Aug 2018 at 15:15, Ergun Bicici<bicici@gmail.com> wrote:
>>>
>>>> Only the evaluation decoding steps are repeated, namely steps 10, 9,
>>>> and 7 in the following EMS output:
>>>> 48 TRAINING:consolidate -> re-using (1)
>>>> 47 TRAINING:prepare-data -> re-using (1)
>>>> 46 TRAINING:run-giza -> re-using (1)
>>>> 45 TRAINING:run-giza-inverse -> re-using (1)
>>>> 44 TRAINING:symmetrize-giza -> re-using (1)
>>>> 43 TRAINING:build-lex-trans -> re-using (1)
>>>> 40 TRAINING:build-osm -> re-using (1)
>>>> 39 TRAINING:extract-phrases -> re-using (1)
>>>> 38 TRAINING:build-reordering -> re-using (1)
>>>> 37 TRAINING:build-ttable -> re-using (1)
>>>> 34 TRAINING:create-config -> re-using (1)
>>>> 28 TUNING:truecase-input -> re-using (1)
>>>> 24 TUNING:truecase-reference -> re-using (1)
>>>> 21 TUNING:filter -> re-using (1)
>>>> 20 TUNING:apply-filter -> re-using (1)
>>>> 19 TUNING:tune -> re-using (1)
>>>> 18 TUNING:apply-weights -> re-using (1)
>>>> 15 EVALUATION:test:truecase-input -> re-using (1)
>>>> 12 EVALUATION:test:filter -> re-using (1)
>>>> 11 EVALUATION:test:apply-filter -> re-using (1)
>>>>
>>>>
>>>>
>>>> 10 EVALUATION:test:decode -> run
>>>> 9 EVALUATION:test:remove-markup -> run
>>>> 7 EVALUATION:test:detruecase-output -> run
>>>> 3 EVALUATION:test:multi-bleu-c -> run
>>>> 2 EVALUATION:test:analysis-coverage -> re-using (1)
>>>> 1 EVALUATION:test:analysis-precision -> run
>>>>
>>>>
>>>> On Fri, Aug 24, 2018 at 4:39 PM Hieu Hoang<hieuhoang@gmail.com> wrote:
>>>>
>>>>> Are you rerunning tuning for each case? Or are you using exactly the
>>>>> same moses.ini file for the with and without alignment experiments?
>>>>>
>>>>> Hieu Hoang
>>>>> http://statmt.org/hieu
>>>>>
>>>>>
>>>>> On Fri, 24 Aug 2018 at 14:34, Ergun Bicici<bicici@gmail.com> wrote:
>>>>>
>>>>>> Dear Moses maintainers,
>>>>>>
>>>>>> I discovered that the translations obtained differ when alignment
>>>>>> flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-inf)
>>>>>> are used. Comparison table is attached (en-ru and ru-en are being
>>>>>> recomputed). We expect them to be the same, since alignment flags only print
>>>>>> additional information and are not supposed to alter decoding. In
>>>>>> both cases, the same EMS system was re-run, with or without the alignment
>>>>>> information flags.
>>>>>>
>>>>>> - Average of the absolute difference is 0.0094 BLEU (about 1 BLEU
>>>>>> point).
>>>>>> - Average of the difference is 0.0051 BLEU (about 0.5 BLEU points;
>>>>>> results are better with alignment flags).
>>>>>>
>>>>>> /opt/Programs/SMT/moses/mosesdecoder/bin/moses --version
>>>>>>
>>>>>> Moses code version (git tag or commit hash):
>>>>>> mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
>>>>>> Libraries used:
>>>>>> Boost version 1.62.0
>>>>>>
>>>>>> git status
>>>>>> On branch RELEASE-4.0
>>>>>> Your branch is up to date with 'origin/RELEASE-4.0'.
>>>>>>
>>>>>>
>>>>>> Note: Using alignment information to recase tokens was tried in [1]
>>>>>> for en-fi and en-tr, with positive results claimed. We tried this method in
>>>>>> all translation directions we considered and, as can be seen in the align
>>>>>> row, this only improves the performance for tr-en and en-tr, and for tr-en
>>>>>> Moses provides better translations without the alignment flags.
>>>>>> [1] The JHU Machine Translation Systems for WMT 2016
>>>>>> Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
>>>>>> http://www.statmt.org/wmt16/pdf/W16-2310.pdf
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Ergun
>>>>>>
>>>>>> Ergun Biçici
>>>>>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> Moses-support@mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ergun
>>>>
>>>>
>>>>
>> --
>>
>> Regards,
>> Ergun
>>
>>
>>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: image.png
> Type: image/png
> Size: 59618 bytes
> Desc: not available
> Url :http://mailman.mit.edu/mailman/private/moses-support/attachments/20180824/2bd1c008/attachment.png


------------------------------

Message: 2
Date: Fri, 24 Aug 2018 18:40:30 +0300
From: Ergun Bicici <bicici@gmail.com>
Subject: Re: [Moses-support] Fwd: Different translations are obtained
from the same decoder without alignment information
To: tahoar@slate.rocks
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAB59qTM9vNw8VvA5vb61x-iazsijreVp_abwUDQVBfUw=YorUg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Tom,

Thank you for sharing your finding. It does not apply in this case, since
I re-compiled the code to build the initial Moses 4.0 model. The moses
binary has not changed since then and, even though I am observing different
scores, they are better when the alignment flags are included. I am waiting
for de-en results with the "-print-alignment-info" flag.
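
When outputs produced with and without these flags are compared, the extra material has to be stripped first, otherwise every line differs trivially. A minimal sketch, assuming the alignment is appended to each line after a `|||` separator and that unknown words carry the literal prefix `UNK` given to `-unknown-word-prefix`; the function name `normalize_output` is hypothetical, and a genuine token that happens to start with `UNK` would be altered as well.

```python
def normalize_output(line, unk_prefix="UNK"):
    """Strip alignment info and the unknown-word prefix from one output line."""
    # Keep only the text before the first '|||' separator (assumed format).
    text = line.split("|||", 1)[0]
    tokens = []
    for token in text.split():
        # Remove the prefix only when something follows it, so a bare
        # "UNK" token is left untouched.
        if token.startswith(unk_prefix) and len(token) > len(unk_prefix):
            token = token[len(unk_prefix):]
        tokens.append(token)
    return " ".join(tokens)
```

If the normalized lines from both runs still differ, the difference lies in the translations themselves rather than in the added markup.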

I previously tried to debug a decentralized Moses server-client model that
showed similar symptoms, where the error could stem from additional sources
such as network interruptions, issues with the syncing of buffers, etc.
With a binarized version you get a translation, but the translation options
are somewhat fixed. Could Moses provide a better translation? It turns out,
for instance, that truecasing before detruecasing improves the scores by
0.002 BLEU on average over 8 translation directions in WMT18.
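
The two summary figures used in this thread, the average signed difference and the average absolute difference across translation directions, can be computed as sketched below. The function name and the dictionary layout are hypothetical; no scores from the thread are reproduced here.

```python
def summarize_differences(with_flags, without_flags):
    """Return (mean signed difference, mean absolute difference) in BLEU.

    Both arguments map a translation direction such as "de-en" to a BLEU
    score; only directions present in both mappings are compared.
    """
    directions = sorted(set(with_flags) & set(without_flags))
    if not directions:
        raise ValueError("no common translation directions")
    diffs = [with_flags[d] - without_flags[d] for d in directions]
    mean_signed = sum(diffs) / len(diffs)
    mean_absolute = sum(abs(x) for x in diffs) / len(diffs)
    return mean_signed, mean_absolute
```

A positive signed mean indicates higher scores with the flags; the absolute mean is never smaller than the magnitude of the signed mean, because gains and losses no longer cancel.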

Regards,
Ergun
bicici.github.com


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 142, Issue 18
**********************************************