Moses-support Digest, Vol 106, Issue 23

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: EMS results - makes sense ? (Vincent Nguyen)
2. Re: how much disk space for the Giga fr-en corpus ?
(Vincent Nguyen)
3. Re: EMS results - makes sense ? (Vincent Nguyen)
4. Re: EMS results - makes sense ? (Rico Sennrich)


----------------------------------------------------------------------

Message: 1
Date: Sun, 9 Aug 2015 23:07:09 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: Barry Haddow <bhaddow@inf.ed.ac.uk>, Hieu Hoang
<hieuhoang@gmail.com>, moses-support <moses-support@mit.edu>
Message-ID: <55C7C0FD.2090406@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed


Still looking at WMT11; in fact, something looks weird to me:
This table suggests that Karlsruhe IT obtained a 30.5 BLEU score for
FR to EN: http://matrix.statmt.org/matrix/systems_list/1669
But reading the paper http://www.statmt.org/wmt11/pdf/WMT45.pdf shows
28.34 as the final score.

I am trying not to focus too much on BLEU scores, but this is my only
reference point for comparing my experiments.


On 04/08/2015 17:28, Barry Haddow wrote:
> Hi Vincent
>
> If you are comparing to the results of WMT11, then you can look at the
> system descriptions to see what the authors did. In fact it's worth
> looking at the WMT14 descriptions (WMT15 will be available next month)
> to see how state-of-the-art systems are built.
>
> For fr-en or en-fr, the first thing to look at is the data. There are
> some large data sets released for WMT and you can get a good gain from
> just crunching more data (monolingual and parallel). Unfortunately
> this takes more resources (disk, cpu etc) so you may run into trouble
> here.
>
> The hierarchical models are much bigger, so yes, you will need more
> disk. For fr-en/en-fr it's probably not worth the extra effort.
>
> cheers - Barry
>
> On 04/08/15 15:58, Vincent Nguyen wrote:
>> thanks for your insights.
>>
>> I am just struck by the BLEU difference between my 26 and the 30 of
>> WMT11, and some results of WMT14 close to 36 or even 39.
>>
>> I am currently trying a hierarchical rule set instead of lexical
>> reordering, wondering if I will get better results, but I get a
>> 'filesystem root low disk space' error message before it crashes.
>> Does this model take more disk space in some way?
>>
>> I will next try to use more corpora, including in-domain data from my
>> internal TMX files.
>>
>> thanks for your answers.
>>
>> On 04/08/2015 16:02, Hieu Hoang wrote:
>>>
>>> On 03/08/2015 13:00, Vincent Nguyen wrote:
>>>> Hi,
>>>>
>>>> Just a heads up on some EMS results, to get your experienced opinions.
>>>>
>>>> Corpus: Europarlv7 + NC2010
>>>> fr => en
>>>> Evaluation NC2011.
>>>>
>>>> 1) IRSTLM is much slower than KenLM for training / tuning.
>>> that sounds right. KenLM is also multithreaded, while IRSTLM can
>>> only be used for single-threaded decoding.
>>>> 2) BLEU results are almost the same (25.7 with IRSTLM, 26.14 with
>>>> KenLM)
>>> true
>>>> 3) Compact mode is faster than on-disk in a short test (77
>>>> segments: 96 seconds vs 126 seconds)
>>> true
>>>> 4) One last thing I do not understand, though:
>>>> For the sake of checking, I replaced NC2011 with NC2010 in the
>>>> evaluation (I know, since NC2010 is part of training, it should not
>>>> be relevant). I got roughly the same BLEU score. I would have
>>>> expected a higher score with a test set included in the training
>>>> corpus.
>>>>
>>>> Does that make sense?
>>>>
>>>>
>>>> Next steps:
>>>> What path should I take to get better scores? I read the 'optimize'
>>>> section of the website, which deals more with speed;
>>>> of course I will apply all of this, but I was interested in tips to
>>>> get more quality if possible.
>>> look into domain adaptation if you have multiple training corpora,
>>> some of which are in-domain and some out-of-domain.
>>>
>>> Other than that, getting a good BLEU score is an open research question.
>>>
>>> Well done on getting this far.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>


------------------------------

Message: 2
Date: Sun, 9 Aug 2015 23:38:08 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] how much disk space for the Giga fr-en
corpus ?
To: moses-support@mit.edu
Message-ID: <55C7C840.4050609@neuf.fr>
Content-Type: text/plain; charset="windows-1252"


So after a couple of days and a few glitches (memory for fast_align and
disk space for the training), my best options were (on an Ubuntu 14.04
system wiped of all previous models / LMs / ...):
Giga release 2 FR-EN parallel corpus
500,000 sentences for fast_align
500GB of disk space
tuning with newstest2013
testing with newstest2013 (I know, not very clever; running again with 2012)
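As an aside, since the disk-space crashes in this thread only show up
hours into a run, a small pre-flight check before launching EMS can save
time. A sketch (the 400 GB threshold is purely illustrative, taken from
the numbers discussed here):

```python
import shutil

def enough_disk(path=".", needed_gb=400):
    """Return True if the filesystem holding `path` has at least
    `needed_gb` GB free. The threshold is illustrative; phrase-table
    extraction on large corpora can need hundreds of GB."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb

if not enough_disk(".", 400):
    print("warning: under 400 GB free; extraction may crash at sort time")
```

Running this at the start of a script is cheaper than discovering the
shortage at sorting time after the extract step.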

BLEU score: 28.84.
A little disappointing, but better than before with the Europarl v7 corpus.

Maybe I will next combine more corpora for training.

I read in a WMT14 paper that some of you combined newstest data sets
from several years for tuning.
Does it help a lot to tune with bigger sets?
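Mechanically, building such a larger tuning set is just a matter of
concatenating the aligned files, in the same order on both sides so
source and target stay line-aligned. A sketch with hypothetical file
names (real newstest sets have roughly 2,000-3,000 segments each):

```python
import os
import tempfile

def combine_dev_sets(years, lang, out_path):
    """Concatenate <year>.<lang> files, in the same year order for every
    language, so the combined source/target files stay line-aligned."""
    with open(out_path, "w", encoding="utf-8") as out:
        for year in years:
            with open(f"{year}.{lang}", encoding="utf-8") as f:
                for line in f:
                    out.write(line)

# Demo with tiny stand-in files in a scratch directory.
os.chdir(tempfile.mkdtemp())
for year, text in [("newstest2010", "phrase one\n"), ("newstest2011", "phrase two\n")]:
    for lang in ("fr", "en"):
        with open(f"{year}.{lang}", "w", encoding="utf-8") as f:
            f.write(text)
for lang in ("fr", "en"):
    combine_dev_sets(["newstest2010", "newstest2011"], lang, f"dev-combined.{lang}")
```

The combined `dev-combined.fr` / `dev-combined.en` pair can then be
pointed to wherever the single tuning set was used before.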

Cheers,
Vincent


On 09/08/2015 13:47, Vincent Nguyen wrote:
>
> I think at 400GB I was not very far off. 500GB was more than enough,
> even without the -sort-compress gzip option.
> Now it's binarizing / compacting, which is also taking very, very long...
> I will update timings when tuning done.
>
> On 08/08/2015 12:06, Hieu Hoang wrote:
>> i don't think anyone's measured it. If you have any measurements,
>> perhaps you can let us know.
>>
>> if you have a fairly recent version of unix sort, you can also add
>> [TRAINING]
>> training-options = "-sort-compress gzip"
>> to reduce disk space requirement.
>>
>> however, I would say you need PLENTY of space. If you just have
>> enough to do the extraction and no more, you're going to have a hard
>> time doing the rest of the experiments.
>>
>>
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>> On 8 August 2015 at 13:55, Vincent Nguyen <vnguyen@neuf.fr
>> <mailto:vnguyen@neuf.fr>> wrote:
>>
>> Hi,
>> I keep adding 100GB to my space; even at 400GB it crashed at sorting
>> time, after extracting the tables...
>> now trying 500GB.
>> Will I need more?
>> is there a rule?
>> cheers,
>> Vincent


------------------------------

Message: 3
Date: Mon, 10 Aug 2015 09:32:36 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: moses-support@mit.edu
Message-ID: <55C85394.8050203@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed


Similarly, reading the WMT14 paper from UEDIN, if I am not mistaken I see:
35.9 in the matrix: http://matrix.statmt.org/systems/show/2106
but 31.76 for the B1 (best) system on page 101 of:
http://www.statmt.org/wmt14/pdf/W14-3309.pdf

Maybe I am not looking at the right information.

On 09/08/2015 23:07, Vincent Nguyen wrote:
> Still looking at WMT11; in fact, something looks weird to me:
> This table suggests that Karlsruhe IT obtained a 30.5 BLEU score for
> FR to EN: http://matrix.statmt.org/matrix/systems_list/1669
> But reading the paper http://www.statmt.org/wmt11/pdf/WMT45.pdf shows
> 28.34 as the final score.
>
> I am trying not to focus too much on BLEU scores, but this is my only
> reference point for comparing my experiments.


------------------------------

Message: 4
Date: Mon, 10 Aug 2015 08:56:36 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: moses-support@mit.edu
Message-ID: <55C85934.4090003@gmx.ch>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Vincent,

the KIT paper reports scores on newstest2010 (and newstest2009), while
the matrix shows scores on newstest2011. The UEDIN WMT14 paper reports
scores on newstest2012, newstest2013, and newstest2014 (it may
admittedly be hard to see which is which: newstest2013 is the default in
that paper). The reason people report results on an older test set is
that system development for a shared task happens without access to the
test set, to avoid overfitting to the task. Time and space permitting,
some experiments are repeated on that year's test set for the system
description (as in table 6 of the UEDIN paper).
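Rico's point, that a BLEU number only means something relative to a
specific test set, can be made concrete with a toy single-reference BLEU
sketch. This is not the mteval / multi-bleu script actually used for the
matrix, just the core idea: modified n-gram precisions combined with a
brevity penalty.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def toy_bleu(hypothesis, reference, max_n=4):
    """Geometric mean of modified 1..max_n-gram precisions times a
    brevity penalty, for a single hypothesis/reference pair."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngram_counts(hyp, n), ngram_counts(ref, n)
        matched = sum((hyp_ng & ref_ng).values())  # counts clipped by reference
        precisions.append(matched / max(sum(hyp_ng.values()), 1))
    if min(precisions) == 0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat"
print(toy_bleu(hyp, "the cat sat on the mat"))  # 1.0: reference matches exactly
print(toy_bleu(hyp, "a dog sat on the mat"))    # much lower, same hypothesis
```

The same hypothesis scores very differently against different
references, which is exactly why a matrix score on newstest2011 is not
comparable to a paper's score on newstest2010.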

best wishes,
Rico

On 10/08/15 08:32, Vincent Nguyen wrote:
> Similarly, reading the WMT14 paper from UEDIN, if I am not mistaken I see:
> 35.9 in the matrix: http://matrix.statmt.org/systems/show/2106
> but 31.76 for the B1 (best) system on page 101 of:
> http://www.statmt.org/wmt14/pdf/W14-3309.pdf
>
> Maybe I am not looking at the right information.



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 106, Issue 23
**********************************************
