Moses-support Digest, Vol 104, Issue 72

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: BLEU Score Variance: Which score to use? (Hokage Sama)
2. Re: BLEU Score Variance: Which score to use?
(Marcin Junczys-Dowmunt)
3. Re: BLEU Score Variance: Which score to use? (Hokage Sama)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Jun 2015 01:12:22 -0500
From: Hokage Sama <nvncbol@gmail.com>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAD3ogMbzebEw_CS0-WnVXTDDrxd3tN=n8yJNbBKvE-pOvv9cMQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thanks Marcin. It's for a new resource-poor language, so I only trained it
with what I could collect so far (i.e. only 190,630 words of parallel
data). I retrained the entire system each time without any tuning.
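
As a rough sanity check on the spread (a minimal sketch, assuming a POSIX
awk is available; the three numbers are the BLEU scores quoted further down):

# mean and population standard deviation of the three reported BLEU scores
printf '17.84\n16.51\n15.33\n' \
  | awk '{ s += $1; ss += $1*$1; n++ }
         END { m = s/n; printf "mean=%.2f sd=%.2f\n", m, sqrt(ss/n - m*m) }'
# prints: mean=16.56 sd=1.03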

On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:

> Hi,
> I think the average is OK, your variance is however quite high. Did you
> retrain the entire system or just optimize parameters a couple of times?
>
> Two useful papers on the topic:
>
> https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
> http://www.mt-archive.info/MTS-2011-Cettolo.pdf
>
>
> On 22.06.2015 02:37, Hokage Sama wrote:
> > Hi,
> >
> > Since MT training is non-convex and thus the BLEU score varies, which
> > score should I use for my system? I trained my system three times
> > using the same data and obtained the three different scores below.
> > Should I take the average or the best score?
> >
> > BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, ratio=1.095, hyp_len=3952,
> > ref_len=3609)
> > BLEU = 16.51, 48.4/20.7/11.4/6.5 (BP=1.000, ratio=1.093, hyp_len=3945,
> > ref_len=3609)
> > BLEU = 15.33, 48.2/20.1/10.3/5.5 (BP=1.000, ratio=1.087, hyp_len=3924,
> > ref_len=3609)
> >
> > Thanks,
> > Hilton
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150622/27578a01/attachment-0001.htm

------------------------------

Message: 2
Date: Mon, 22 Jun 2015 08:20:50 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Hokage Sama <nvncbol@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <5587A942.8010706@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hm. That's interesting. The language should not matter.

1) Do not report results without tuning. They are meaningless. There is
a whole thread on that; look for "Major bug found in Moses". If you
ignore the trollish aspects, it contains many good descriptions of why
this is a mistake.

2) Assuming it was the same data every time (was it?), I do not quite
see where the variance is coming from without tuning. This rather
suggests there is something weird in your pipeline. Mgiza is the only
stochastic element there, but its results are usually quite consistent:
for the same weights in your ini file you should get very similar
results. Tuning would be the part that introduces instability, but even
then these differences would be a little on the extreme end, though
possible.
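
One quick way to check (a rough sketch; run1/ and run2/ are hypothetical
directories holding two of your independent training runs): if the generated
configs carry identical feature weights, the variance has to come from the
models themselves, i.e. the Mgiza alignments and the tables built from them.

# run1/ and run2/ are placeholders for two independent training runs
diff run1/train/model/moses.ini run2/train/model/moses.ini
# if the weight lines match, the score differences must come from the
# alignments and the extracted phrase/reordering tables, so compare those next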

On 22.06.2015 08:12, Hokage Sama wrote:
> Thanks Marcin. It's for a new resource-poor language, so I only trained
> it with what I could collect so far (i.e. only 190,630 words of
> parallel data). I retrained the entire system each time without any
> tuning.
>
> On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> Hi,
> I think the average is OK, your variance is however quite high.
> Did you
> retrain the entire system or just optimize parameters a couple of
> times?
>
> Two useful papers on the topic:
>
> https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
> http://www.mt-archive.info/MTS-2011-Cettolo.pdf
>
>
> On 22.06.2015 02:37, Hokage Sama wrote:
> > Hi,
> >
> > Since MT training is non-convex and thus the BLEU score varies,
> which
> > score should I use for my system? I trained my system three times
> > using the same data and obtained the three different scores below.
> > Should I take the average or the best score?
> >
> > BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, ratio=1.095,
> hyp_len=3952,
> > ref_len=3609)
> > BLEU = 16.51, 48.4/20.7/11.4/6.5 (BP=1.000, ratio=1.093,
> hyp_len=3945,
> > ref_len=3609)
> > BLEU = 15.33, 48.2/20.1/10.3/5.5 (BP=1.000, ratio=1.087,
> hyp_len=3924,
> > ref_len=3609)
> >
> > Thanks,
> > Hilton
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>



------------------------------

Message: 3
Date: Mon, 22 Jun 2015 02:34:52 -0500
From: Hokage Sama <nvncbol@gmail.com>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAD3ogMaP7gQP0pO7QfRui56LkJpzhHSUCKybQXVQ4Aj7TfEeuA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Wow, that was a long read. Still reading though :) but I can see that tuning
is essential. I am fairly new to Moses, so could you please check whether the
commands I ran were correct (minus the tuning part)? I just modified the
commands on the Moses website for building a baseline system. Below are the
commands I ran. My training files are "compilation.en" and "compilation.sm";
my test files are "test.en" and "test.sm".

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en \
  < ~/corpus/training/compilation.en > ~/corpus/compilation.tok.en
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l sm \
  < ~/corpus/training/compilation.sm > ~/corpus/compilation.tok.sm
~/mosesdecoder/scripts/recaser/train-truecaser.perl \
  --model ~/corpus/truecase-model.en --corpus ~/corpus/compilation.tok.en
~/mosesdecoder/scripts/recaser/train-truecaser.perl \
  --model ~/corpus/truecase-model.sm --corpus ~/corpus/compilation.tok.sm
~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.en \
  < ~/corpus/compilation.tok.en > ~/corpus/compilation.true.en
~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.sm \
  < ~/corpus/compilation.tok.sm > ~/corpus/compilation.true.sm
~/mosesdecoder/scripts/training/clean-corpus-n.perl \
  ~/corpus/compilation.true sm en ~/corpus/compilation.clean 1 80

cd ~/working
nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir train \
  -corpus ~/corpus/compilation.clean -f sm -e en \
  -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
  -lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 \
  -external-bin-dir ~/mosesdecoder/tools >& training.out &

cd ~/corpus
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < test.en > test.tok.en
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l sm < test.sm > test.tok.sm
~/mosesdecoder/scripts/recaser/truecase.perl --model truecase-model.en \
  < test.tok.en > test.true.en
~/mosesdecoder/scripts/recaser/truecase.perl --model truecase-model.sm \
  < test.tok.sm > test.true.sm

cd ~/working
~/mosesdecoder/scripts/training/filter-model-given-input.pl filtered-test \
  train/model/moses.ini ~/corpus/test.true.sm \
  -Binarizer ~/mosesdecoder/bin/processPhraseTableMin
nohup nice ~/mosesdecoder/bin/moses -f ~/working/filtered-test/moses.ini \
  < ~/corpus/test.true.sm > ~/working/test.translated.en 2> ~/working/test.out
~/mosesdecoder/scripts/generic/multi-bleu.perl -lc ~/corpus/test.true.en \
  < ~/working/test.translated.en
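
For the tuning step that is skipped above, the baseline-system page uses
mert-moses.pl. A rough sketch, assuming a small held-out development set
(dev.true.sm / dev.true.en are placeholder names, prepared with the same
tokenizer and truecaser steps as the test files; do not tune on the test set):

cd ~/working
nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
  ~/corpus/dev.true.sm ~/corpus/dev.true.en \
  ~/mosesdecoder/bin/moses train/model/moses.ini \
  --mertdir ~/mosesdecoder/bin/ &> mert.out &
# the tuned weights land in ~/working/mert-work/moses.ini; decode the test set
# with that file (or filter it as above) instead of train/model/moses.ini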

On 22 June 2015 at 01:20, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:

> Hm. That's interesting. The language should not matter.
>
> 1) Do not report results without tuning. They are meaningless. There is a
> whole thread on that; look for "Major bug found in Moses". If you ignore
> the trollish aspects, it contains many good descriptions of why this is a
> mistake.
>
> 2) Assuming it was the same data every time (was it?), I do not quite see
> where the variance is coming from without tuning. This rather suggests
> there is something weird in your pipeline. Mgiza is the only stochastic
> element there, but its results are usually quite consistent: for the same
> weights in your ini file you should get very similar results. Tuning would
> be the part that introduces instability, but even then these differences
> would be a little on the extreme end, though possible.
>
> On 22.06.2015 08:12, Hokage Sama wrote:
>
>> Thanks Marcin. It's for a new resource-poor language, so I only trained it
>> with what I could collect so far (i.e. only 190,630 words of parallel
>> data). I retrained the entire system each time without any tuning.
>>
>> On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>>
>> Hi,
>> I think the average is OK, your variance is however quite high.
>> Did you
>> retrain the entire system or just optimize parameters a couple of
>> times?
>>
>> Two useful papers on the topic:
>>
>> https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
>> http://www.mt-archive.info/MTS-2011-Cettolo.pdf
>>
>>
>> On 22.06.2015 02:37, Hokage Sama wrote:
>> > Hi,
>> >
>> > Since MT training is non-convex and thus the BLEU score varies,
>> which
>> > score should I use for my system? I trained my system three times
>> > using the same data and obtained the three different scores below.
>> > Should I take the average or the best score?
>> >
>> > BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, ratio=1.095,
>> hyp_len=3952,
>> > ref_len=3609)
>> > BLEU = 16.51, 48.4/20.7/11.4/6.5 (BP=1.000, ratio=1.093,
>> hyp_len=3945,
>> > ref_len=3609)
>> > BLEU = 15.33, 48.2/20.1/10.3/5.5 (BP=1.000, ratio=1.087,
>> hyp_len=3924,
>> > ref_len=3609)
>> >
>> > Thanks,
>> > Hilton
>> >
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > Moses-support@mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150622/8d2d5a28/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 72
**********************************************
