Moses-support Digest, Vol 104, Issue 76

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: BLEU Score Variance: Which score to use? (Hokage Sama)
2. Re: Major bug found in Moses (Ondrej Bojar)
3. Re: Major bug found in Moses (Marcin Junczys-Dowmunt)
4. Re: BLEU Score Variance: Which score to use?
(Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Jun 2015 17:25:47 -0500
From: Hokage Sama <nvncbol@gmail.com>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAD3ogMbR7dcDiCjEi2B8mtOpua7GtpQAee=CVH+_k_JMer56DQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi, I delete all the files (I think) generated during a training job before
rerunning the entire training. Do you think this could cause variation? Here
are the commands I run to delete them:

rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm ~/corpus/train.clean.en
rm ~/corpus/train.clean.sm
rm ~/corpus/truecase-model.en
rm ~/corpus/truecase-model.sm
rm ~/corpus/test.tok.en
rm ~/corpus/test.tok.sm
rm ~/corpus/test.true.en
rm ~/corpus/test.true.sm
rm -rf ~/working/filtered-test
rm ~/working/test.out
rm ~/working/test.translated.en
rm ~/working/training.out
rm -rf ~/working/train/corpus
rm -rf ~/working/train/giza.en-sm
rm -rf ~/working/train/giza.sm-en
rm -rf ~/working/train/model

On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:

> You're welcome. Take another close look at those varying BLEU scores
> though. That would make me worry if it happened to me for the same data and
> the same weights.
>
> On 22.06.2015 10:31, Hokage Sama wrote:
>
>> Ok thanks. Appreciate your help.
>>
>> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <junczys@amu.edu.pl
>> <mailto:junczys@amu.edu.pl>> wrote:
>>
>> Difficult to tell with that little data. Once you get beyond
>> 100,000 segments (or 50,000 at least) I would say 2000 per dev
>> (for tuning) and test set, rest for training. With that few
>> segments it's hard to give you any recommendations since it might
>> just not give meaningful results. It's currently a toy model, good
>> for learning and playing around with options. But not good for
>> trying to infer anything from BLEU scores.
>>
>>
>> On 22.06.2015 10:17, Hokage Sama wrote:
>>
>> Yes the language model was built earlier when I first went
>> through the manual to build a French-English baseline system.
>> So I just reused it for my Samoan-English system.
>> Yes for all three runs I used the same training and testing files.
>> How can I determine how much parallel data I should set aside
>> for tuning and testing? I have only 10,028 segments (198,385
>> words) altogether. At the moment I'm using 259 segments for
>> testing and the rest for training.
>>
>> Thanks,
>> Hilton
>>
>>
>>
>>
>
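On the question of how much data to set aside: a held-out set can be carved
off the line-aligned corpus files with head/tail. A minimal sketch, with a
made-up helper name and file stems (it takes the first N lines, so shuffle the
two files identically first if the corpus is ordered):

```shell
#!/bin/sh
# Illustrative split of a line-aligned parallel corpus: the first
# test_n lines become the test set, the rest the training set.
split_corpus() {
  test_n="$1"; shift
  stem="$1"; shift          # e.g. "corpus.tok"
  for l in "$@"; do         # language suffixes, e.g. en sm
    head -n "$test_n" "$stem.$l" > "test.$l"
    tail -n +"$((test_n + 1))" "$stem.$l" > "train.$l"
  done
}

# The numbers from this thread (259 test segments out of 10,028):
# split_corpus 259 corpus.tok en sm
```

Because both sides are split at the same line number, the sentence alignment
between the two languages is preserved.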
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150622/435e09ef/attachment-0001.htm

------------------------------

Message: 2
Date: Tue, 23 Jun 2015 00:27:27 +0200 (CEST)
From: Ondrej Bojar <bojar@ufal.mff.cuni.cz>
Subject: Re: [Moses-support] Major bug found in Moses
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support@mit.edu
Message-ID:
<1727160514.284241.1435012047944.JavaMail.zimbra@ufal.mff.cuni.cz>
Content-Type: text/plain; charset=utf-8

...and I wouldn't be surprised to find Moses also behind this Java-to-C# automatic translation:

https://www.youtube.com/watch?v=CHDDNnRm-g8

O.

----- Original Message -----
> From: "Marcin Junczys-Dowmunt" <junczys@amu.edu.pl>
> To: moses-support@mit.edu
> Sent: Friday, 19 June, 2015 19:21:45
> Subject: Re: [Moses-support] Major bug found in Moses

> On the interesting idea that Moses should naturally be good at
> translating things, just some general considerations.
>
> Since some have said this thread has educational value, I would like to
> share something that might not be obvious given the SMT-biased posts here.
> Moses is also the _leading_ tool for automatic grammatical error
> correction (GEC) right now. The first- and third-placed systems in the
> CoNLL 2014 shared task were based on Moses. By now I have results that
> surpass the CoNLL results by far, obtained by adding some specialized
> features to Moses (which, thanks to Hieu, is very easy).
>
> It even gets good results for GEC when you do crazy things like
> inverting the TM (so it should actually make the input worse) provided
> you tune on the correct metric and for the correct task. The interaction
> of all the other features after tuning makes that possible.
>
> So, if anything, Moses is just a very flexible text-rewriting tool.
> Tuning (and data) turns it into a translator, a GEC tool, a POS tagger,
> a chunker, a semantic tagger, etc.
>
> On 19.06.2015 18:40, Lane Schwartz wrote:
>> On Fri, Jun 19, 2015 at 11:28 AM, Read, James C <jcread@essex.ac.uk
>> <mailto:jcread@essex.ac.uk>> wrote:
>>
>> What I take issue with is the en masse denial that there is a
>> problem with the system if it behaves in such a way with no LM and
>> no pruning and/or tuning.
>>
>>
>> There is no mass denial taking place.
>>
>> Regardless of whether or not you tune, the decoder will do its best to
>> find translations with the highest model score. That is the expected
>> behavior.
>>
>> What I have tried to tell you, and what other people have tried to
>> tell you, is that translations with high model scores are not
>> necessarily good translations.
>>
>> We all want our models to be such that high model scores correspond to
>> good translations, and that low model scores correspond with bad
>> translations. But unfortunately, our models do not innately have this
>> characteristic. We all know this. We also know a good way to deal with
>> this shortcoming, namely tuning. Tuning is the process by which we
>> attempt to ensure that high model scores correspond to high quality
>> translations, and that low model scores correspond to low quality
>> translations.
>>
>> If you can design models that naturally correspond with translation
>> quality without tuning, that's great. If you can do that, you've got a
>> great shot at winning a Best Paper award at ACL.
>>
>> In the meantime, you may want to consider an apology for your rude
>> behavior and unprofessional attitude.
>>
>> Goodbye.
>> Lane
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Ondrej Bojar (mailto:obo@cuni.cz / bojar@ufal.mff.cuni.cz)
http://www.cuni.cz/~obo


------------------------------

Message: 3
Date: Tue, 23 Jun 2015 00:43:13 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Major bug found in Moses
To: Ondrej Bojar <bojar@ufal.mff.cuni.cz>
Cc: moses-support@mit.edu
Message-ID: <55888F81.7070901@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

That would make very cool student projects.
Also that video is acing it, even the voice-over is synthetic :)

On 23.06.2015 00:27, Ondrej Bojar wrote:
> ...and I wouldn't be surprised to find Moses also behind this Java-to-C# automatic translation:
>
> https://www.youtube.com/watch?v=CHDDNnRm-g8
>
> O.



------------------------------

Message: 4
Date: Tue, 23 Jun 2015 00:47:53 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] BLEU Score Variance: Which score to use?
To: Hokage Sama <nvncbol@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <55889099.6000902@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

I don't think so. However, when you repeat those experiments, you might
try to identify where the two trainings start to diverge by pairwise
comparisons of the same files between the two runs. Maybe then we can
deduce something.
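A pairwise comparison like that can be done mechanically with cmp over saved
copies of the training directories from the two runs; the directory names and
the helper below are only illustrative:

```shell
#!/bin/sh
# Report which files differ between two copies of a training run
# (e.g. two snapshots of ~/working/train saved as run1/ and run2/).
compare_runs() {
  a="$1"; b="$2"
  (cd "$a" && find . -type f | sort) | while read -r f; do
    cmp -s "$a/$f" "$b/$f" || echo "DIFFERS: ${f#./}"
  done
}

# compare_runs run1 run2
```

Checking the reported files against the pipeline stages (tokenization,
truecasing, cleaning, alignment, phrase extraction) should show where the two
runs first diverge.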

On 23.06.2015 00:25, Hokage Sama wrote:
> Hi, I delete all the files (I think) generated during a training job
> before rerunning the entire training. Do you think this could cause
> variation? Here are the commands I run to delete them:
>
> rm ~/corpus/train.tok.en
> rm ~/corpus/train.tok.sm
> rm ~/corpus/train.true.en
> rm ~/corpus/train.true.sm
> rm ~/corpus/train.clean.en
> rm ~/corpus/train.clean.sm
> rm ~/corpus/truecase-model.en
> rm ~/corpus/truecase-model.sm
> rm ~/corpus/test.tok.en
> rm ~/corpus/test.tok.sm
> rm ~/corpus/test.true.en
> rm ~/corpus/test.true.sm
> rm -rf ~/working/filtered-test
> rm ~/working/test.out
> rm ~/working/test.translated.en
> rm ~/working/training.out
> rm -rf ~/working/train/corpus
> rm -rf ~/working/train/giza.en-sm
> rm -rf ~/working/train/giza.sm-en
> rm -rf ~/working/train/model



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 76
**********************************************
