Today's Topics:
1. Re: Faster decoding with multiple moses instances (Vincent Nguyen)
----------------------------------------------------------------------
Message: 1
Date: Thu, 8 Oct 2015 20:04:48 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5616B040.5030208@neuf.fr>
Content-Type: text/plain; charset="windows-1252"
Michael,
what settings do you use to achieve these results?
if search-algorithm = 1, what cube pruning pop limit?
On 08/10/2015 19:05, Michael Denkowski wrote:
> Hi all,
>
> I extended the multi_moses.py script to support multi-threaded moses
> instances for cases where memory limits the number of decoders that
> can run in parallel. The threads arg now takes the form "--threads
> P:T:E" to run P processes using T threads each and an optional extra
> process running E threads. The script sends input lines to instances
> as they have free threads so all CPUs stay busy for the full decoding run.
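
For concreteness, a hypothetical invocation for a 32-CPU machine (binary and file names are placeholders, and this assumes the script reads source sentences on stdin just like moses itself) might look like:

    scripts/generic/multi_moses.py --threads 3:10:2 moses -f moses.ini \
        < input.txt > output.txt

i.e. three 10-thread instances plus one extra 2-thread instance.
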
>
> I ran some more benchmarks with the CompactPT system, trading off
> between threads and processes:
>
> procs x threads    sent/sec
> 1 x 16             5.46
> 2 x 8              7.58
> 4 x 4              9.71
> 8 x 2              12.50
> 16 x 1             14.08
>
> From the results so far, it's best to use as many instances as will
> fit into memory and evenly distribute CPUs. For example, a system
> with 32 CPUs that could fit 3 copies of moses into memory could use
> "--threads 2:11:10" to run 2 instances with 11 threads each and 1
> instance with 10 threads. The script can be used with mert-moses.pl
> via the --multi-moses flag and --decoder-flags='--threads P:T:E'.
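
A hedged sketch of the corresponding tuning command (the positional arguments are the usual dev source, dev reference, decoder binary, and ini file; all paths are placeholders):

    perl mert-moses.pl dev.fr dev.en ./bin/moses model/moses.ini \
        --multi-moses --decoder-flags='--threads 2:11:10'
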
>
> Best,
> Michael
>
>
> On Tue, Oct 6, 2015 at 4:39 PM, Michael Denkowski
> <michael.j.denkowski@gmail.com> wrote:
>
> Hi Hieu and all,
>
> I just checked in a bug fix for the multi_moses.py script. I
> forgot to override the number of threads for each moses command,
> so if [threads] were specified in the moses.ini, the multi-moses
> runs were cheating by running a bunch of multi-threaded
> instances. If threads were only being specified on the command
> line, the script was correctly stripping the flag so everything
> should be good. I finished a benchmark on my system with an
> unpruned compact PT (with the fixed script) and got the following:
>
> 16 threads 5.38 sent/sec
> 16 procs 13.51 sent/sec
>
> This definitely used a lot more memory though. Based on some very
> rough estimates looking at free system memory, the memory mapped
> suffix array PT went from 2G to 6G with 16 processes while the
> compact PT went from 3G to 37G. For cases where everything fits
> into memory, I've seen significant speedup from multi-process
> decoding.
>
> For cases where things don't fit into memory, the multi-moses
> script could be extended to start as many multi-threaded instances
> as will fit into RAM and farm out sentences in a way that keeps
> all of the CPUs busy. I know Marcin has mentioned using GNU parallel.
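
A rough sketch of that kind of farming out using GNU parallel instead of multi_moses.py (block size, thread count, and paths are placeholders):

    # 3 concurrent multi-threaded decoders, each fed blocks of input lines;
    # -k keeps the output in input order
    parallel --pipe -k -j 3 --block 1M "moses -f moses.ini -threads 11" \
        < input.txt > output.txt

Note that --pipe starts a fresh decoder per block, so this is only attractive when the models are memory mapped and start-up is cheap.
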
>
> Best,
> Michael
>
> On Tue, Oct 6, 2015 at 4:16 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
> I've just run some comparisons between the multithreaded decoder
> and the multi_moses.py script. It's good stuff.
>
> It makes me seriously wonder whether we should abandon
> multi-threading and go all out for the multi-process approach.
>
> There are some advantages to multi-threading - e.g. where model files
> are loaded into memory rather than memory-mapped. But there are
> disadvantages too - it is more difficult to maintain and there is
> about a 10% overhead.
>
> What do people think?
>
> Phrase-based (1, 5, 10, 15, 20, 25, 30 threads/processes):
>
> 32  Baseline (Compact pt)
>     real  4m37.000s  1m15.391s  0m51.217s  0m48.287s  0m50.719s  0m52.027s  0m53.045s
>     user  4m21.544s  5m28.597s  6m38.227s  8m0.975s   8m21.122s  8m3.195s   8m4.663s
>     sys   0m15.451s  0m34.669s  0m53.867s  1m10.515s  1m20.746s  1m24.368s  1m23.677s
>
> 34  (32) + multi_moses
>     real  4m49.474s  1m17.867s  0m43.096s  0m31.999s  0m26.497s  0m26.296s  killed
>     user  4m33.580s  4m40.486s  4m56.749s  5m6.692s   5m43.845s  7m34.617s
>     sys   0m15.957s  0m32.347s  0m51.016s  1m11.106s  1m44.115s  2m21.263s
>
> 38  Baseline (Probing pt)
>     real  4m46.254s  1m16.637s  0m49.711s  0m48.389s  0m49.144s  0m51.676s  0m52.472s
>     user  4m30.596s  5m32.500s  6m23.706s  7m40.791s  7m51.946s  7m52.892s  7m53.569s
>     sys   0m15.624s  0m36.169s  0m49.433s  1m6.812s   1m9.614s   1m13.108s  1m12.644s
>
> 39  (38) + multi_moses
>     real  4m43.882s  1m17.849s  0m34.245s  0m31.318s  0m28.054s  0m24.120s  0m22.520s
>     user  4m29.212s  4m47.693s  5m5.750s   5m33.573s  6m18.847s  7m19.642s  8m38.013s
>     sys   0m15.835s  0m25.398s  0m36.716s  0m41.349s  0m48.494s  1m0.843s   1m13.215s
>
> Hiero:
>
> 3   6/10 baseline
>     real  5m33.011s  1m28.935s  0m59.470s  1m0.315s    0m55.619s   0m57.347s   0m59.191s   1m2.786s
>     user  4m53.187s  6m23.521s  8m17.170s  12m48.303s  14m45.954s  17m58.109s  20m22.891s  21m13.605s
>     sys   0m39.696s  0m51.519s  1m3.788s   1m22.125s   1m58.718s   2m51.249s   4m4.807s    4m37.691s
>
> 4   (3) + multi_moses
>     real  1m27.215s  0m40.495s  0m36.206s  0m28.623s  0m26.631s  0m25.817s  0m25.401s
>     user  5m4.819s   5m42.070s  5m35.132s  6m46.001s  7m38.151s  9m6.500s   10m32.739s
>     sys   0m38.039s  0m45.753s  0m44.117s  0m52.285s  0m56.655s  1m6.749s   1m16.935s
>
> On 05/10/2015 16:05, Michael Denkowski wrote:
>> Hi Philipp,
>>
>> Unfortunately I don't have a precise measurement. If anyone
>> knows of a good way to benchmark a process tree where many
>> processes memory-map the same files, I would be glad to run it.
>>
>> --Michael
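
One possible way to get a less rough number on Linux is to sum the kernel's Pss (proportional set size) figures, which split pages shared between processes - including identical memory-mapped model files - evenly among the processes mapping them. A sketch, assuming the decoder processes can be matched by name:

    # total PSS in MB across all running moses processes
    for pid in $(pgrep -f moses); do
        awk '/^Pss:/ { kb += $2 } END { print kb }' /proc/$pid/smaps
    done | awk '{ total += $1 } END { printf "%.1f MB\n", total / 1024 }'
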
>>
>> On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <phi@jhu.edu> wrote:
>>
>> Hi,
>>
>> great - that will be very useful.
>>
>> Since you just ran the comparison - do you have any
>> numbers on "still allowed everything to fit into memory",
>> i.e., how much more memory is used by running parallel
>> instances?
>>
>> -phi
>>
>> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski
>> <michael.j.denkowski@gmail.com> wrote:
>>
>> Hi all,
>>
>> Like some other Moses users, I noticed diminishing
>> returns from running Moses with several threads. To
>> work around this, I added a script to run multiple
>> single-threaded instances of moses instead of one
>> multi-threaded instance. In practice, this sped
>> things up by about 2.5x for 16 CPUs, and using
>> memory-mapped models still allowed everything to fit
>> into memory.
>>
>> If anyone else is interested in using this, you can
>> prefix a moses command with
>> scripts/generic/multi_moses.py. To use multiple
>> instances in mert-moses.pl,
>> specify --multi-moses and control the number of
>> parallel instances with --decoder-flags='-threads N'.
>>
>> Below is a benchmark on WMT fr-en data (2M training
>> sentences, 400M words mono, suffix array PT, compact
>> reordering, 5-gram KenLM) testing default stack
>> decoding vs cube pruning without and with the
>> parallelization script (+multi):
>>
>> ---
>> 1cpu sent/sec
>> stack 1.04
>> cube 2.10
>> ---
>> 16cpu sent/sec
>> stack 7.63
>> +multi 12.20
>> cube 7.63
>> +multi 18.18
>> ---
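
For reference on the stack vs. cube comparison above, cube pruning is selected with the decoder's search-algorithm setting; a hedged example of the relevant switches (the pop limit shown is only illustrative, not necessarily the value used here):

    # search-algorithm 0 = normal stack decoding, 1 = cube pruning
    moses -f moses.ini -search-algorithm 1 -cube-pruning-pop-limit 1000 \
        < input.txt > output.txt
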
>>
>> --Michael
>>
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu