Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Faster decoding with multiple moses instances
(Marcin Junczys-Dowmunt)
2. Re: Faster decoding with multiple moses instances (Vincent Nguyen)
3. KenLM poison (???)
----------------------------------------------------------------------
Message: 1
Date: Mon, 05 Oct 2015 18:14:03 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5612A1CB.8090406@amu.edu.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Very bad unpruned and with multithreading! :)
Is this with the nonblockpt branch? I am slowly running out of ideas about
what might be causing this. Frequent vector reallocation?
On 05.10.2015 16:48, Hieu Hoang wrote:
> What pt implementation did you use, and had it been pre-pruned so that
> there's a limit on how many target phrases there are for a particular
> source phrase? i.e. don't have 10,000 entries for 'the'.
>
> I've been digging around multithreading in the last few weeks. I've
> noticed that the compact pt is VERY bad at handling unpruned pt.
> Cores                   1    5   10   15   20   25
> Unpruned  compact pt  143   42   32   38   52   62
>           probing pt  245   58   33   25   24   21
> Pruned    compact pt  119   24   15   10   10   10
>           probing pt  117   25   25   10   10   10
>
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
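The scaling in Hieu's table above can be summarized with a few lines of Python. This is only a sketch: the thread does not state the unit, so it assumes each figure is a wall-clock decoding time (lower is better). The names speedups, efficiencies and best_core_count are made up for illustration.

```python
# Figures copied from the table in the quoted message.
# Assumption: each value is a decoding time, so lower is better.
CORES = [1, 5, 10, 15, 20, 25]
TIMES = {
    ("unpruned", "compact"): [143, 42, 32, 38, 52, 62],
    ("unpruned", "probing"): [245, 58, 33, 25, 24, 21],
    ("pruned", "compact"): [119, 24, 15, 10, 10, 10],
    ("pruned", "probing"): [117, 25, 25, 10, 10, 10],
}


def speedups(times):
    """Speedup of each run relative to the single-core run."""
    base = times[0]
    return [base / t for t in times]


def efficiencies(times, cores=CORES):
    """Parallel efficiency: speedup divided by the number of cores."""
    return [s / c for s, c in zip(speedups(times), cores)]


def best_core_count(times, cores=CORES):
    """Smallest core count that reaches the minimum time."""
    return cores[times.index(min(times))]


if __name__ == "__main__":
    for (pruning, pt), times in TIMES.items():
        sp = ", ".join(f"{s:.1f}x" for s in speedups(times))
        print(f"{pruning:8s} {pt:7s} best at {best_core_count(times):2d} cores; speedups: {sp}")
```

Under that assumption, the unpruned compact pt is fastest at 10 cores and gets slower beyond that, which matches the "VERY bad" remark.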
> On 5 October 2015 at 15:15, Michael Denkowski
> <michael.j.denkowski@gmail.com> wrote:
>
> Hi all,
>
> Like some other Moses users, I noticed diminishing returns from
> running Moses with several threads. To work around this, I added
> a script to run multiple single-threaded instances of moses
> instead of one multi-threaded instance. In practice, this sped
> things up by about 2.5x for 16 cpus and using memory mapped models
> still allowed everything to fit into memory.
>
> If anyone else is interested in using this, you can prefix a moses
> command with scripts/generic/multi_moses.py. To use multiple
> instances in mert-moses.pl, specify
> --multi-moses and control the number of parallel instances with
> --decoder-flags='-threads N'.
>
> Below is a benchmark on WMT fr-en data (2M training sentences,
> 400M words mono, suffix array PT, compact reordering, 5-gram
> KenLM) testing default stack decoding vs cube pruning without and
> with the parallelization script (+multi):
>
> ---
> 1cpu sent/sec
> stack 1.04
> cube 2.10
> ---
> 16cpu sent/sec
> stack 7.63
> +multi 12.20
> cube 7.63
> +multi 18.18
> ---
>
> --Michael
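The idea in Michael's message (several single-threaded processes instead of one multi-threaded process) can be sketched in Python as below. This is NOT the code of scripts/generic/multi_moses.py; it is a minimal illustration that assumes the decoder reads one sentence per line on stdin and writes exactly one line per sentence on stdout. All function names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(lines, n):
    """Deal the input lines out to n shards: shard i gets lines i, i+n, i+2n, ..."""
    return [lines[i::n] for i in range(n)]


def merge_round_robin(shards, total):
    """Undo split_round_robin, restoring the original sentence order."""
    merged = [None] * total
    for i, shard in enumerate(shards):
        # Raises ValueError if a worker did not return one line per input line.
        merged[i::len(shards)] = shard
    return merged


def run_instance(cmd, lines):
    """Run one decoder process over its shard and return its output lines."""
    if not lines:
        return []
    proc = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()


def decode_parallel(cmd, lines, instances):
    """Run `instances` copies of cmd in parallel, one shard each."""
    shards = split_round_robin(lines, instances)
    # Threads only wait on child processes here; the real work is done
    # by the separate decoder processes.
    with ThreadPoolExecutor(max_workers=instances) as pool:
        results = list(pool.map(lambda shard: run_instance(cmd, shard), shards))
    return merge_round_robin(results, len(lines))


if __name__ == "__main__":
    # Hypothetical usage: python sketch.py 16 moses -f moses.ini -threads 1 < in > out
    n_instances = int(sys.argv[1])
    decoder_cmd = sys.argv[2:]
    sentences = sys.stdin.read().splitlines()
    for line in decode_parallel(decoder_cmd, sentences, n_instances):
        print(line)
```

Each process loads its own models, so this only stays within memory when the models are memory mapped and shared by the operating system, as the message points out.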
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
Message: 2
Date: Mon, 5 Oct 2015 19:49:37 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5612B831.4080403@neuf.fr>
Content-Type: text/plain; charset="windows-1252"
After many tests, and as I mentioned before, I made these changes in EMS:
score-settings = "--GoodTuring --MinScore 2:0.001"
and
a cube pruning pop limit of 400 (instead of the 5000 in EMS!).
Decoding is much, much faster, with no impact on translation.
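To make the effect of threshold pruning concrete, here is a small Python sketch that filters phrase-table lines by one score. It is an illustration only, not what the Moses scorer does internally. It assumes the usual text format "source ||| target ||| scores" and that the "2" in --MinScore 2:0.001 selects the third score (index 2, counting from 0); the max_per_source cap is an extra, hypothetical knob corresponding to the per-source-phrase limit discussed in this thread.

```python
from collections import defaultdict


def parse_entry(line):
    """Split a 'source ||| target ||| scores' line into its parts."""
    fields = [f.strip() for f in line.split("|||")]
    scores = [float(x) for x in fields[2].split()]
    return fields[0], fields[1], scores


def prune(lines, score_index=2, min_score=0.001, max_per_source=None):
    """Keep entries whose chosen score is at least min_score.

    If max_per_source is given, additionally keep only the best-scoring
    entries for each source phrase.
    """
    kept = defaultdict(list)
    for line in lines:
        source, _target, scores = parse_entry(line)
        if scores[score_index] >= min_score:
            kept[source].append((scores[score_index], line))

    pruned = []
    for entries in kept.values():
        # Stable sort: entries with equal scores keep their input order.
        entries.sort(key=lambda e: e[0], reverse=True)
        if max_per_source is not None:
            entries = entries[:max_per_source]
        pruned.extend(line for _score, line in entries)
    return pruned


if __name__ == "__main__":
    table = [
        "the ||| le ||| 0.5 0.4 0.6 0.5",
        "the ||| la ||| 0.3 0.2 0.3 0.2",
        "the ||| xyz ||| 0.1 0.1 0.00005 0.1",
        "cat ||| chat ||| 0.9 0.8 0.9 0.8",
    ]
    for entry in prune(table, min_score=0.001):
        print(entry)
```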
On 05/10/2015 17:20, Philipp Koehn wrote:
> Hi,
>
> with regard to pruning ---
>
> the example EMS config files have
>
> [TRAINING]
> score-settings = "--GoodTuring --MinScore 2:0.0001"
>
> which carries out threshold pruning during phrase table construction,
> going a good way towards avoiding too many translation options per phrase.
>
> -phi
>
> On Mon, Oct 5, 2015 at 11:08 AM, Barry Haddow <bhaddow@inf.ed.ac.uk>
> wrote:
>
> Hi Hieu
>
> That's exactly why I took to pre-pruning the phrase table, as I
> mentioned on Friday. I had something like 750,000 translations of
> the most common word, and it took half-an-hour to get the first
> sentence translated.
>
> cheers - Barry
>
>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
------------------------------
Message: 3
Date: Mon, 5 Oct 2015 20:49:13 +0300
From: ??? <fountain1128@gmail.com>
Subject: [Moses-support] KenLM poison
To: moses-support@mit.edu
Message-ID: <21FEFF49-F91E-40DB-B62B-EFCBA449F63C@gmail.com>
Content-Type: text/plain; charset=gb2312
Hi,
Yes, you are right. I freed up some space, and it's working now.
The error message could have been clearer anyway.
Thank you, Ken.
Best,
Fangting
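Since the root cause here was a full disk, a quick pre-flight check before starting a long LM build can save a confusing failure. The sketch below uses only the Python standard library; the function names and the 20 GiB figure in the usage line are arbitrary placeholders, not a KenLM recommendation. Both the directory that receives the ARPA file and whatever directory is used for temporary files would need checking.

```python
import shutil


def free_gib(directory):
    """Free space, in GiB, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free / 2**30


def has_free_space(directory, required_bytes):
    """True if the filesystem holding `directory` has at least required_bytes free."""
    return shutil.disk_usage(directory).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical requirement: 20 GiB in the current (output) directory.
    needed = 20 * 2**30
    if has_free_space(".", needed):
        print(f"OK: {free_gib('.'):.1f} GiB free")
    else:
        print(f"Only {free_gib('.'):.1f} GiB free; free up space before training")
```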
>
> Message: 2
> Date: Mon, 5 Oct 2015 15:15:20 +0100
> From: Kenneth Heafield <moses@kheafield.com>
> Subject: Re: [Moses-support] KenLM poison
> To: moses-support@mit.edu
> Message-ID: <561285F8.2030708@kheafield.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi,
>
> I'm still betting it's out of disk space writing the ARPA.
> Multithreaded exception handling is annoying. This is there to prevent
> deadlock.
>
> Kenneth
>
> On 10/05/2015 01:52 PM, ??? wrote:
>> Dear all,
>>
>> I'm building the baseline system, and an error occurred during the
>> last step of the LM training process, as the first attached file shows.
>>
>> I checked another case of "Last input should have been poison", but that
>> one has the more detailed message "no space left on device", while mine
>> has nothing but that sentence.
>>
>> The exact command I use for KenLM is:
>> $MOSES/bin/lmplz -o 3 < ~/es-fi/OpenSubtitles2013.es-fi.true.fi > OpenSubtitles2013.es-fi.arpa.fi
>>
>> As mosesdecoder is installed in the administrator's directory instead of
>> my own, "~/mosesdecoder" is replaced by $MOSES.
>>
>> My corpus (the language pair is Spanish to Finnish) was downloaded from
>> OPUS (http://opus.lingfil.uu.se/OpenSubtitles2013.php) in the Moses format.
>>
>> The download contains three files: OpenSubtitles2013.es-fi.es,
>> OpenSubtitles2013.es-fi.fi, and OpenSubtitles2013.es-fi.ids.
>>
>> The tokenization, truecasing and cleaning were all completed with the
>> "es" and "fi" files. Is it possible that the error has something to do
>> with the "ids" file?
>>
>> Attached are the output of the LM process and the commands I used for
>> corpus preparation.
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 5 Oct 2015 15:24:51 +0100
> From: Hieu Hoang <hieuhoang@gmail.com>
> Subject: Re: [Moses-support] Do debugging in the decoder?
> To: Matthias Huck <mhuck@inf.ed.ac.uk>
> Cc: moses-support <moses-support@mit.edu>
> Message-ID:
> <CAEKMkbi5D3LHVYFD8zfynM0xuXDyBrDge+jgx8nG-y6zGwVkPA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> You can use gdb to put breakpoints on the code and step through it.
>
> I personally use Eclipse+CDT to do my debugging, it's just a front end to
> gdb. You can see this video by Dominik to see how to set up Eclipse with
> moses
> https://vimeo.com/129306919
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 5 October 2015 at 15:13, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:
>
>> Hi Yuqi,
>>
>> I don't know. But maybe something like running a profiler on a
>> small-scale setup and printing the call graph would be more convenient
>> anyway? If you don't just want to try and read the source code right
>> away.
>>
>> Maybe someone else has better suggestions.
>>
>> Cheers,
>> Matthias
>>
>>
>> On Mon, 2015-10-05 at 09:59 +0200, Yuqi Zhang wrote:
>>> Thanks a lot, Matthias and Hieu!
>>>
>>>
>>> I have the debug version in Eclipse already and can compile it
>>> without errors.
>>> I could follow the debugging up to the decoder (translation):
>>>
>>>
>>> pool.Submit(task); // in ExportInterface.cpp
>>>
>>> I didn't find a way to see what happens in the 'translation' task, e.g.
>>> how a source segment looks up its translations in the PT. Is there a way
>>> to find out what happens in the 'translation' task?
>>>
>>> Thanks!
>>>
>>> Best regards,
>>>
>>> Yuqi
>>>
>>>
>>>
>>> 2015-10-05 1:07 GMT+02:00 Hieu Hoang <hieuhoang@gmail.com>:
>>> i think it might be
>>> ./bjam .... variant=debug
>>> not
>>> ./bjam ... --variant=debug
>>>
>>> Also, please git pull. There was a minor compile error when
>>> using this option, which has now been fixed
>>>
>> https://github.com/moses-smt/mosesdecoder/commit/72bef00781de9821f2cff227ca7417939041d4e1
>>>
>>>
>>> On 04/10/2015 23:25, Matthias Huck wrote:
>>> Hi Yuqi,
>>>
>>> You can build a debug compile by calling bjam with:
>>>
>>> --variant=debug
>>>
>>> Cheers,
>>> Matthias
>>>
>>>
>>> On Sun, 2015-10-04 at 23:05 +0200, Yuqi Zhang wrote:
>>> Hello,
>>>
>>>
>>> How can I debug the decoder?
>>>
>>>
>>> Must I turn off the compile-time flag
>>> "WITH_THREADS"?
>>> Can it be turned off? (I tried, but
>>> some header files
>>> related to threads are always included.)
>>> Or is there any other way to let me step
>>> into the decoder?
>>>
>>>
>>> Thanks a lot!
>>>
>>>
>>> Best regards,
>>> Yuqi
>
> End of Moses-support Digest, Vol 108, Issue 10
> **********************************************
------------------------------
End of Moses-support Digest, Vol 108, Issue 13
**********************************************