Moses-support Digest, Vol 104, Issue 73

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: BLEU Score Variance: Which score to use?
(Marcin Junczys-Dowmunt)
2. Re: BLEU Score Variance: Which score to use? (Hokage Sama)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Jun 2015 09:52:28 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Hokage Sama <nvncbol@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <5587BEBC.90100@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

I don't see any reason for indeterminism here, unless MGIZA is less stable
on small data than I thought. Was the LM lm/news-commentary-v8.fr-en.blm.en
built earlier somewhere?

And to be sure: for all three runs you used exactly the same data,
training and test set?
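
For context, the baseline manual builds that LM from the English side of the
news-commentary data with KenLM. A sketch of those two steps is given below;
the paths follow the baseline walkthrough rather than anything stated in this
thread, so treat them as assumptions:

# build a 3-gram LM from the English side, then binarise it (KenLM tools shipped with Moses)
~/mosesdecoder/bin/lmplz -o 3 < ~/corpus/news-commentary-v8.fr-en.true.en > ~/lm/news-commentary-v8.fr-en.arpa.en
~/mosesdecoder/bin/build_binary ~/lm/news-commentary-v8.fr-en.arpa.en ~/lm/news-commentary-v8.fr-en.blm.en

Only the English (target) side goes into the LM, which is why the same binary
LM can be reused for a Samoan-English system, although an English LM built on
in-domain text would usually fit better.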

On 22.06.2015 09:34, Hokage Sama wrote:
> Wow that was a long read. Still reading though :) but I see that
> tuning is essential. I am fairly new to Moses, so could you please
> check whether the commands I ran were correct (minus the tuning part)?
> I just modified the commands on the Moses website for building a
> baseline system. Below are the commands I ran. My training files are
> "compilation.en" and "compilation.sm". My test files are "test.en"
> and "test.sm".
>
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < ~/corpus/training/compilation.en > ~/corpus/compilation.tok.en
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l sm < ~/corpus/training/compilation.sm > ~/corpus/compilation.tok.sm
> ~/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/corpus/truecase-model.en --corpus ~/corpus/compilation.tok.en
> ~/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/corpus/truecase-model.sm --corpus ~/corpus/compilation.tok.sm
> ~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.en < ~/corpus/compilation.tok.en > ~/corpus/compilation.true.en
> ~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.sm < ~/corpus/compilation.tok.sm > ~/corpus/compilation.true.sm
> ~/mosesdecoder/scripts/training/clean-corpus-n.perl ~/corpus/compilation.true sm en ~/corpus/compilation.clean 1 80
>
> cd ~/working
> nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir train -corpus ~/corpus/compilation.clean -f sm -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 -external-bin-dir ~/mosesdecoder/tools >& training.out &
>
> cd ~/corpus
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < test.en > test.tok.en
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l sm < test.sm > test.tok.sm
> ~/mosesdecoder/scripts/recaser/truecase.perl --model truecase-model.en < test.tok.en > test.true.en
> ~/mosesdecoder/scripts/recaser/truecase.perl --model truecase-model.sm < test.tok.sm > test.true.sm
>
> cd ~/working
> ~/mosesdecoder/scripts/training/filter-model-given-input.pl filtered-test train/model/moses.ini ~/corpus/test.true.sm -Binarizer ~/mosesdecoder/bin/processPhraseTableMin
> nohup nice ~/mosesdecoder/bin/moses -f ~/working/filtered-test/moses.ini < ~/corpus/test.true.sm > ~/working/test.translated.en 2> ~/working/test.out
> ~/mosesdecoder/scripts/generic/multi-bleu.perl -lc ~/corpus/test.true.en < ~/working/test.translated.en
>
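
The tuning step those commands skip would, following the Moses baseline
instructions, look roughly like the sketch below. The files dev.true.sm and
dev.true.en stand in for a held-out development set that does not appear in
this thread, so those names are assumptions:

# tune the feature weights with MERT on a held-out dev set
cd ~/working
nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
  ~/corpus/dev.true.sm ~/corpus/dev.true.en \
  ~/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/mosesdecoder/bin/ \
  &> mert.out &

The tuned weights end up in ~/working/mert-work/moses.ini, which would then be
passed to filter-model-given-input.pl and the decoder in place of
train/model/moses.ini.
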
> On 22 June 2015 at 01:20, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> Hm. That's interesting. The language should not matter.
>
> 1) Do not report results without tuning. They are meaningless.
> There is a whole thread on that; look for "Major bug found in
> Moses". If you ignore the trollish aspects, it contains many good
> descriptions of why this is a mistake.
>
> 2) Assuming it was the same data every time (was it?), I do not
> quite see where the variance is coming from without tuning. This
> rather suggests you have something weird in your pipeline. MGIZA is
> the only stochastic element there, but usually its results are
> quite consistent. For the same weights in your ini file you should
> get very similar results. Tuning would be the part that introduces
> instability, but even then these differences would be a little on
> the extreme end, though possible.
>
> On 22.06.2015 08:12, Hokage Sama wrote:
>
> Thanks Marcin. It's for a new resource-poor language, so I only
> trained it with what I could collect so far (i.e. only 190,630
> words of parallel data). I retrained the entire system each
> time without any tuning.
>
> On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> Hi,
> I think the average is OK; your variance, however, is quite high.
> Did you retrain the entire system or just optimize parameters a
> couple of times?
>
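
As a quick back-of-the-envelope check (plain awk, not something from the
thread), the three scores quoted further down come out at a mean of about
16.6 BLEU with a sample standard deviation of about 1.3:

# mean and sample standard deviation of the three reported BLEU scores
printf '17.84\n16.51\n15.33\n' | awk '{ s += $1; ss += $1*$1; n++ } END { m = s/n; printf "mean=%.2f stdev=%.2f\n", m, sqrt((ss - n*m*m)/(n-1)) }'
# prints: mean=16.56 stdev=1.26
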
> Two useful papers on the topic:
>
> https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
> http://www.mt-archive.info/MTS-2011-Cettolo.pdf
>
>
> On 22.06.2015 02:37, Hokage Sama wrote:
> > Hi,
> >
> > Since MT training is non-convex and thus the BLEU score varies,
> > which score should I use for my system? I trained my system three
> > times using the same data and obtained the three different scores
> > below. Should I take the average or the best score?
> >
> > BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, ratio=1.095, hyp_len=3952, ref_len=3609)
> > BLEU = 16.51, 48.4/20.7/11.4/6.5 (BP=1.000, ratio=1.093, hyp_len=3945, ref_len=3609)
> > BLEU = 15.33, 48.2/20.1/10.3/5.5 (BP=1.000, ratio=1.087, hyp_len=3924, ref_len=3609)
> >
> > Thanks,
> > Hilton
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>



------------------------------

Message: 2
Date: Mon, 22 Jun 2015 03:17:00 -0500
From: Hokage Sama <nvncbol@gmail.com>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAD3ogMY381hbeG0JVYsE2MuOiD7HbQG51eURoxocS4QCRcDADw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Yes, the language model was built earlier when I first went through the
manual to build a French-English baseline system, so I just reused it for my
Samoan-English system.
Yes, for all three runs I used the same training and testing files.
How can I determine how much parallel data I should set aside for tuning
and testing? I have only 10,028 segments (198,385 words) altogether. At the
moment I'm using 259 segments for testing and the rest for training.
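
As a purely illustrative sketch (the 1,000-segment figure and the
train.true.*/dev.true.* file names are assumptions, not advice from the
thread), a simple aligned hold-out from the truecased training files could
be made like this:

# hold out the last 1,000 aligned pairs for tuning; keep the rest for training (GNU head/tail)
for l in sm en; do
  head -n -1000 ~/corpus/compilation.true.$l > ~/corpus/train.true.$l
  tail -n 1000 ~/corpus/compilation.true.$l > ~/corpus/dev.true.$l
done

Because the same line-based split is applied to both sides, the sentence
pairs stay aligned; the resulting dev.true.sm/dev.true.en pair is the kind of
development set a mert-moses.pl tuning run would take.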

Thanks,
Hilton

On 22 June 2015 at 02:52, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:

> I don't see any reason for indeterminism here, unless MGIZA is less stable
> on small data than I thought. Was the LM lm/news-commentary-v8.fr-en.blm.en
> built earlier somewhere?
>
> And to be sure: for all three runs you used exactly the same data,
> training and test set?
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 73
**********************************************
