Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: EMS results - makes sense ? (Philipp Koehn)
----------------------------------------------------------------------
Message: 1
Date: Thu, 6 Aug 2015 12:16:01 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: Dingyuan Wang <abcdoyle888@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDB=ZnHpyW1uH2h7na89DxxkFrZBf26RK_4AEzoaV0NncQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
this feature was added last month.
You can restrict the number of parallel processes that
EMS runs with a switch, e.g., "--max-active 1" for at
most 1 job at a time.
-phi
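The effect of "--max-active 1" can be illustrated with a small scheduler sketch (Python purely for illustration; EMS itself is Perl, and this is not its code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_jobs(jobs, max_active=1):
    """Run callables with at most max_active of them in flight at once,
    mimicking what the --max-active switch does for EMS pipeline steps
    (an illustrative sketch, not EMS code)."""
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        return list(pool.map(lambda job: job(), jobs))

# Two alignment "jobs" that would otherwise run in parallel now run one
# at a time, which is what keeps peak memory down:
print(run_jobs([lambda: "giza", lambda: "giza-inverse"]))  # ['giza', 'giza-inverse']
```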
On Thu, Aug 6, 2015 at 12:12 PM, Dingyuan Wang <abcdoyle888@gmail.com>
wrote:
> When is the 'fast-align-max-lines' added? That's convenient. I had to
> write a wrapper script before to limit the lines to process.
> Also, can I avoid running the two directions' fast-align/mgiza jobs in
> parallel? I have the memory to run one at a time, but not two. (I also
> wrote a wrapper script to block one.)
>
> > On 7 Aug 2015 at 00:01, "Philipp Koehn" <phi@jhu.edu> wrote:
> >>
> >> Hi,
> >>
> >> if you run into memory problems with fast align, you can
> >> add the following in the [TRAINING] section:
> >>
> >> fast-align-max-lines = 1000000
> >>
> >> This will run fast-align in parts of 1 million sentence pairs.
> >>
> >> -phi
> >>
> >>
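What the fast-align-max-lines splitting does can be sketched as follows (a hedged illustration of the chunking idea; the actual splitting in EMS is done in Perl):

```python
def chunks(pairs, max_lines=1_000_000):
    """Yield successive slices of at most max_lines sentence pairs,
    mirroring how EMS splits the corpus before running fast_align on
    each part (illustrative sketch, not the actual EMS code)."""
    for start in range(0, len(pairs), max_lines):
        yield pairs[start:start + max_lines]

# A 2.5M-pair corpus becomes three parts: 1M + 1M + 0.5M lines.
print([len(part) for part in chunks(range(2_500_000))])  # [1000000, 1000000, 500000]
```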
> >> On Thu, Aug 6, 2015 at 7:28 AM, Barry Haddow <bhaddow@inf.ed.ac.uk>
> wrote:
> >>>
> >>> Hi Vincent
> >>>
> >>> It's a SIGKILL, which probably means it ran out of memory.
> >>>
> >>> I'd recommend fast_align for this data set. Even if you manage to get
> it running with mgiza it will still take a week or so.
> >>>
> >>> Just add
> >>> fast-align-settings = "-d -o -v"
> >>> to the TRAINING section of ems, and make sure that fast_align is in
> your external-bin-dir.
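Put together, the relevant EMS config lines would look roughly like this (paths are illustrative; external-bin-dir usually sits in the [GENERAL] section of the EMS config):

```
[GENERAL]
external-bin-dir = /home/moses/external-bin-dir   # must contain fast_align

[TRAINING]
fast-align-settings = "-d -o -v"
fast-align-max-lines = 1000000
```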
> >>>
> >>> cheers - Barry
> >>>
> >>>
> >>> On 06/08/15 08:40, Vincent Nguyen wrote:
> >>>>
> >>>>
> >>>> so I dropped my hierarchical model since I got an error.
> >>>> Switched back to the "more data" approach by adding the Giga FR-EN source,
> >>>> but now another error pops up when running Giza Inverse:
> >>>>
> >>>> Using SCRIPTS_ROOTDIR: /home/moses/mosesdecoder/scripts
> >>>> Using multi-thread GIZA
> >>>> using gzip
> >>>> (2) running giza @ Wed Aug 5 21:03:56 CEST 2015
> >>>> (2.1a) running snt2cooc fr-en @ Wed Aug 5 21:03:56 CEST 2015
> >>>> Executing: mkdir -p /home/moses/working/training/giza-inverse.7
> >>>> Executing: /home/moses/working/bin/training-tools/mgizapp/snt2cooc
> /home/moses/working/training/giza-inverse.7/fr-en.cooc
> /home/moses/working/training/prepared.7/en.vcb
> /home/moses/working/training/prepared.7/fr.vcb
> /home/moses/working/training/prepared.7/fr-en-int-train.snt
> >>>> line 1000
> >>>> line 2000
> >>>>
> >>>> ...
> >>>> line 6609000
> >>>> line 6610000
> >>>> ERROR: Execution of:
> /home/moses/working/bin/training-tools/mgizapp/snt2cooc
> /home/moses/working/training/giza-inverse.7/fr-en.cooc
> /home/moses/working/training/prepared.7/en.vcb
> /home/moses/working/training/prepared.7/fr.vcb
> /home/moses/working/training/prepared.7/fr-en-int-train.snt
> >>>> died with signal 9, without coredump
> >>>>
> >>>>
> >>>> any clue what signal 9 means ?
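(As Barry says elsewhere in the thread, signal 9 is SIGKILL: the kernel's OOM killer delivers it when a process exhausts memory, and the process gets no chance to clean up or dump core. A minimal POSIX sketch of how a death by signal 9 is reported:)

```python
import signal
import subprocess
import sys

# Launch a child that kills itself with SIGKILL (signal 9), the same
# signal the OOM killer would deliver to snt2cooc when memory runs out.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

# On POSIX, subprocess reports death-by-signal as a negative return code.
print(proc.returncode)          # -9
print(signal.Signals(9).name)   # SIGKILL
```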
> >>>>
> >>>>
> >>>>
> >>>> On 04/08/2015 at 17:28, Barry Haddow wrote:
> >>>>>
> >>>>> Hi Vincent
> >>>>>
> >>>>> If you are comparing to the results of WMT11, then you can look at
> the system descriptions to see what the authors did. In fact it's worth
> looking at the WMT14 descriptions (WMT15 will be available next month) to
> see how state-of-the-art systems are built.
> >>>>>
> >>>>> For fr-en or en-fr, the first thing to look at is the data. There
> are some large data sets released for WMT and you can get a good gain from
> just crunching more data (monolingual and parallel). Unfortunately this
> takes more resources (disk, cpu etc) so you may run into trouble here.
> >>>>>
> >>>>> The hierarchical models are much bigger so yes you will need more
> disk. For fr-en/en-fr it's probably not worth the extra effort,
> >>>>>
> >>>>> cheers - Barry
> >>>>>
> >>>>> On 04/08/15 15:58, Vincent Nguyen wrote:
> >>>>>>
> >>>>>> thanks for your insights.
> >>>>>>
> >>>>>> I am just struck by the BLEU difference between my 26 and the 30 of
> >>>>>> WMT11, and some WMT14 results close to 36 or even 39.
> >>>>>>
> >>>>>> I am currently having trouble with the hierarchical rule set (instead
> >>>>>> of lexical reordering),
> >>>>>> wondering if I will get better results, but I get an error message
> >>>>>> about low disk space on the filesystem root before it crashes.
> >>>>>> Does this model take more disk space in some way?
> >>>>>>
> >>>>>> I will next try to use more corpora, some of them in-domain, together
> with my internal TMX.
> >>>>>>
> >>>>>> thanks for your answers.
> >>>>>>
> >>>>>> On 04/08/2015 at 16:02, Hieu Hoang wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03/08/2015 13:00, Vincent Nguyen wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Just a heads up on some EMS results, to get your experienced
> opinions.
> >>>>>>>>
> >>>>>>>> Corpus: Europarlv7 + NC2010
> >>>>>>>> fr => en
> >>>>>>>> Evaluation NC2011.
> >>>>>>>>
> >>>>>>>> 1) IRSTLM is much slower than KenLM for training / tuning.
> >>>>>>>
> >>>>>>> that sounds right. KenLM is also multithreaded, while IRSTLM can only
> >>>>>>> be used in single-threaded decoding.
> >>>>>>>>
> >>>>>>>> 2) BLEU results are almost the same (25.7 with Irstlm, 26.14 with
> KenLM)
> >>>>>>>
> >>>>>>> true
> >>>>>>>>
> >>>>>>>> 3) Compact Mode is faster than onDisk with a short test (77
> segments 96
> >>>>>>>> seconds, vs 126 seconds)
> >>>>>>>
> >>>>>>> true
> >>>>>>>>
> >>>>>>>> 4) One last thing I do not understand though:
> >>>>>>>> For the sake of checking, I replaced NC2011 with NC2010 in the
> evaluation (I
> >>>>>>>> know that since NC2010 is part of the training data, it is not a
> valid test set)
> >>>>>>>> and I got roughly the same BLEU score. I would have expected a
> higher score
> >>>>>>>> with a test set included in the training corpus.
> >>>>>>>>
> >>>>>>>> makes sense ?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Next steps :
> >>>>>>>> What path should I take to get better scores? I read the
> 'optimize'
> >>>>>>>> section of the website, which deals more with speed,
> >>>>>>>> and of course I will apply all of this, but I was interested in
> tips to
> >>>>>>>> get more quality if possible.
> >>>>>>>
> >>>>>>> look into domain adaptation if you have multiple training corpora,
> >>>>>>> some of which are in-domain and some out-of-domain.
> >>>>>>>
> >>>>>>> Other than that, getting a good BLEU score is an open research
> question.
> >>>>>>>
> >>>>>>> Well done on getting this far
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Moses-support mailing list
> >>>>>>>> Moses-support@mit.edu
> >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>>>>>>
> >>>>>> _______________________________________________
> >>>>>> Moses-support mailing list
> >>>>>> Moses-support@mit.edu
> >>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> The University of Edinburgh is a charitable body, registered in
> >>> Scotland, with registration number SC005336.
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>
>
>
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 106, Issue 17
**********************************************