Moses-support Digest, Vol 108, Issue 22

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Faster decoding with multiple moses instances (Mike Ladwig)
2. Re: Faster decoding with multiple moses instances (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Oct 2015 09:48:33 -0400
From: Mike Ladwig <mdladwig@gmail.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: Moses Support <moses-support@mit.edu>
Message-ID:
<CAB3VaD2JxU=+oB8f3msRUX+nx1ZKywnb_Dm+fwqdMS5gt_VY9g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I knew you would ask for results and hoped to unearth my notes but no luck.
I know I don't have models archived. What I remember:

- Moses command line
- Measured total time to translate test set
- Compared .91 (or whatever the long-term stable version was then) to 2.11
- Big enterprise server (4 x quad core Xeon, 64GB memory)
- Scaling from 1 to 16 threads / processes
- In-memory tables (binary); SRILM with .91, KenLM with 2.11
- Large models, de->en, trained from everything I had - Europarl v5, UN,
etc.
- Optimum was ~9 threads when multi-threaded, ~6 processes when multi-process
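
A sweep of this kind (total wall-clock time for the test set at each thread count) could be scripted roughly as follows. This is only a sketch under assumptions: the binary name `moses`, the config name `moses.ini` and the helper names are placeholders, and it assumes the decoder reads the test set on stdin and accepts `-f` and `-threads`.

```python
import subprocess
import time


def moses_cmd(threads, binary="moses", config="moses.ini"):
    """Build a decoder command line (binary and config names are placeholders)."""
    return [binary, "-f", config, "-threads", str(threads)]


def wall_time(cmd, input_path):
    """Run cmd with input_path on stdin and return elapsed wall-clock seconds."""
    with open(input_path, "rb") as src:
        start = time.perf_counter()
        # Translations are discarded; only the elapsed time is of interest here.
        subprocess.run(cmd, stdin=src, stdout=subprocess.DEVNULL, check=True)
        return time.perf_counter() - start


def sweep(thread_counts, input_path, make_cmd=moses_cmd):
    """Time one run per thread count and return {threads: seconds}."""
    return {n: wall_time(make_cmd(n), input_path) for n in thread_counts}
```

For example, `sweep([1, 2, 4, 8, 16], "test.de")` would time five runs, with `test.de` standing in for whatever test set is used. Passing a different `make_cmd` lets the same loop time a multi-process wrapper instead.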

On Thu, Oct 8, 2015 at 9:15 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:

>
> Mike Ladwig - do you still have the results and/or model files you can
> share? I just did a comparison between v0.91 and current master. Master is
> way better at both multi-process and multi-threaded decoding. The old code
> is similarly afflicted by the problem of not scaling above 10-15 threads.
> I'm surprised that the old code is slower single-threaded, but I'm not
> surprised about multi-threading. We haven't traditionally looked at that
> problem. However, v0.91 does use less memory.
>

------------------------------

Message: 2
Date: Thu, 8 Oct 2015 14:53:00 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: Vito Mandorino <vito.mandorino@linguacustodia.com>, Moses Support
<moses-support@mit.edu>, Mike Ladwig <mdladwig@gmail.com>
Message-ID: <5616753C.5000004@gmail.com>
Content-Type: text/plain; charset="windows-1252"

oh, i forgot to attach results

               1           5          10          15          20          25          30          35

Current master
real   1m50.835s   0m24.373s   0m14.991s   0m12.999s   0m11.012s   0m10.012s   0m10.108s   0m11.226s
user   1m48.409s   1m51.587s    2m6.720s   2m37.313s   2m42.219s   2m54.870s   3m20.491s    4m5.443s
sys     0m2.412s    0m2.663s    0m3.350s    0m5.051s    0m7.094s   0m12.152s   0m17.519s   0m20.036s

v0.91
real   9m48.969s    2m0.393s    1m3.791s   0m48.806s   0m41.872s   0m42.046s   0m39.679s   0m43.448s
user   9m47.835s   9m34.309s   9m33.838s  10m24.816s  11m46.145s  13m41.972s  15m13.436s  15m28.816s
sys     0m1.064s    0m1.473s    0m1.318s    0m2.586s    0m3.890s    0m4.922s    0m7.033s   0m25.970s

Current master (cube pruning 400)
real   0m17.623s    0m5.720s    0m4.512s    0m4.605s    0m4.791s    0m4.965s    0m5.035s    0m4.940s
user   0m14.856s   0m18.203s   0m21.629s   0m26.831s   0m29.115s   0m29.152s   0m27.692s   0m27.632s
sys     0m2.780s    0m3.922s    0m6.138s    0m8.377s   0m10.442s   0m12.397s   0m12.446s   0m10.814s

v0.91 (cube pruning 400)
real   1m30.621s   0m38.222s   0m41.789s   1m17.326s   1m34.733s   1m49.563s   2m15.697s   2m20.562s
user   1m21.425s   2m12.044s   3m19.893s   5m41.552s   6m27.785s   6m25.569s   6m25.127s   5m40.199s
sys     0m9.163s    0m9.914s    0m6.334s   0m18.870s   0m46.127s   2m53.603s   5m33.046s   6m59.326s
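
To read scaling off these timings, the `time` values can be converted to seconds and compared against the first column. The sketch below does that for the "real" row of current master. It assumes the column headings 1 to 35 are thread counts, and the helper names are made up for illustration.

```python
import re

# Matches bash `time` values such as "1m50.835s" or "24.373s".
_TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Convert a bash `time` value such as '1m50.835s' to seconds."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


def scaling(threads, real_times):
    """Return (threads, speedup, efficiency) rows relative to the first entry."""
    secs = [to_seconds(t) for t in real_times]
    base_threads, base = threads[0], secs[0]
    rows = []
    for n, s in zip(threads, secs):
        speedup = base / s
        # Efficiency is speedup divided by the increase in thread count.
        rows.append((n, speedup, speedup / (n / base_threads)))
    return rows


# "real" row for current master, copied from the results in this message.
threads = [1, 5, 10, 15, 20, 25, 30, 35]
master_real = ["1m50.835s", "0m24.373s", "0m14.991s", "0m12.999s",
               "0m11.012s", "0m10.012s", "0m10.108s", "0m11.226s"]

for n, speedup, eff in scaling(threads, master_real):
    print(f"{n:>2}  speedup {speedup:5.2f}x  efficiency {eff:4.0%}")
```

Swapping in the v0.91 row, or the cube pruning rows, gives the corresponding figures for those configurations.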

On 08/10/2015 14:15, Hieu Hoang wrote:
> thanks for all your comments. It may look like we'll keep both
> multi-process and multi-thread for the time being. There may be use
> for both further down the line.
>
> Vito - no-one's written a wrapper to do multi-process, rather than
> multi-thread, with mosesserver. I would think the speed gain would be
> the same.
>
> Mike Ladwig - do you still have the results and/or model files you can
> share? I just did a comparison between v0.91 and current master.
> Master is way better at both multi-process and multi-threaded
> decoding. The old code is similarly afflicted by the problem of not
> scaling above 10-15 threads. I'm surprised that the old code is slower
> single-threaded, but I'm not surprised about multi-threading. We
> haven't traditionally looked at that problem. However, v0.91 does use
> less memory.
>
> On 08/10/2015 09:25, Vito Mandorino wrote:
>> Hi all,
>>
>> what about mosesserver? Do you think the same speed gains would occur?
>>
>> Best,
>> Vito
>>
>> 2015-10-06 22:39 GMT+02:00 Michael Denkowski
>> <michael.j.denkowski@gmail.com>:
>>
>> Hi Hieu and all,
>>
>> I just checked in a bug fix for the multi_moses.py script. I
>> forgot to override the number of threads for each moses command,
>> so if [threads] were specified in the moses.ini, the multi-moses
>> runs were cheating by running a bunch of multi-threaded
>> instances. If threads were only being specified on the command
>> line, the script was correctly stripping the flag so everything
>> should be good. I finished a benchmark on my system with an
>> unpruned compact PT (with the fixed script) and got the following:
>>
>> 16 threads 5.38 sent/sec
>> 16 procs 13.51 sent/sec
>>
>> This definitely used a lot more memory though. Based on some
>> very rough estimates looking at free system memory, the memory
>> mapped suffix array PT went from 2G to 6G with 16 processes while
>> the compact PT went from 3G to 37G. For cases where everything
>> fits into memory, I've seen significant speedup from
>> multi-process decoding.
>>
>> For cases where things don't fit into memory, the multi-moses
>> script could be extended to start as many multi-threaded
>> instances as will fit into ram and farm out sentences in a way
>> that keeps all of the CPUs busy. I know Marcin has mentioned
>> using GNU parallel.
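
The general idea of farming sentences out to several decoder instances can be sketched as below. This is an illustration only, not how multi_moses.py is implemented: the function names are invented, the input is split into contiguous chunks (one per instance) instead of being balanced dynamically, and `cmd` is whatever decoder command reads sentences on stdin and writes one output line per input line.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_chunks(lines, parts):
    """Split lines into at most `parts` contiguous, near-equal chunks."""
    parts = max(1, min(parts, len(lines)))
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def run_chunk(cmd, chunk):
    """Feed one chunk to a decoder process on stdin and return its output lines."""
    result = subprocess.run(cmd, input="".join(chunk), stdout=subprocess.PIPE,
                            encoding="utf-8", check=True)
    return result.stdout.splitlines(keepends=True)


def decode_parallel(cmd, lines, instances):
    """Run up to `instances` copies of cmd, one per chunk, keeping input order."""
    chunks = split_chunks(lines, instances)
    # Threads here only wait on child processes; the decoding itself
    # happens in the separate decoder processes.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        outputs = pool.map(lambda chunk: run_chunk(cmd, chunk), chunks)
        return [line for out in outputs for line in out]
```

With a real decoder, `cmd` might be something like `["moses", "-f", "moses.ini", "-threads", "1"]` (names assumed). Raising the per-instance thread count while lowering `instances` gives the mixed setup suggested for models that do not fit in memory several times over.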
>>
>> Best,
>> Michael
>>
>> On Tue, Oct 6, 2015 at 4:16 PM, Hieu Hoang <hieuhoang@gmail.com>
>> wrote:
>>
>> I've just run some comparison between multithreaded decoder
>> and the multi_moses.py script. It's good stuff.
>>
>> It makes me seriously wonder whether we should abandon
>> multi-threading and go all out for the multi-process approach.
>>
>> There's some advantage to multi-threading - e.g. where model
>> files are loaded into memory rather than memory-mapped. But
>> there are disadvantages too - it's more difficult to maintain and
>> there's about a 10% overhead.
>>
>> What do people think?
>>
>> Phrase-based:
>>
>>                1          5         10         15         20         25         30
>>
>> 32  Baseline (Compact pt)
>> real   4m37.000s  1m15.391s  0m51.217s  0m48.287s  0m50.719s  0m52.027s  0m53.045s
>> user   4m21.544s  5m28.597s  6m38.227s   8m0.975s  8m21.122s   8m3.195s   8m4.663s
>> sys    0m15.451s  0m34.669s  0m53.867s  1m10.515s  1m20.746s  1m24.368s  1m23.677s
>>
>> 34  (32) + multi_moses
>> real   4m49.474s  1m17.867s  0m43.096s  0m31.999s  0m26.497s  0m26.296s     killed
>> user   4m33.580s  4m40.486s  4m56.749s   5m6.692s  5m43.845s  7m34.617s
>> sys    0m15.957s  0m32.347s  0m51.016s  1m11.106s  1m44.115s  2m21.263s
>>
>> 38  Baseline (Probing pt)
>> real   4m46.254s  1m16.637s  0m49.711s  0m48.389s  0m49.144s  0m51.676s  0m52.472s
>> user   4m30.596s  5m32.500s  6m23.706s  7m40.791s  7m51.946s  7m52.892s  7m53.569s
>> sys    0m15.624s  0m36.169s  0m49.433s   1m6.812s   1m9.614s  1m13.108s  1m12.644s
>>
>> 39  (38) + multi moses
>> real   4m43.882s  1m17.849s  0m34.245s  0m31.318s  0m28.054s  0m24.120s  0m22.520s
>> user   4m29.212s  4m47.693s   5m5.750s  5m33.573s  6m18.847s  7m19.642s  8m38.013s
>> sys    0m15.835s  0m25.398s  0m36.716s  0m41.349s  0m48.494s   1m0.843s  1m13.215s
>>
>> Hiero:
>>
>> 3  6/10 baseline
>> real: 5m33.011s, 1m28.935s, 0m59.470s, 1m0.315s, 0m55.619s, 0m57.347s, 0m59.191s, 1m2.786s
>> user: 4m53.187s, 6m23.521s, 8m17.170s, 12m48.303s, 14m45.954s, 17m58.109s, 20m22.891s, 21m13.605s
>> sys:  0m39.696s, 0m51.519s, 1m3.788s, 1m22.125s, 1m58.718s, 2m51.249s, 4m4.807s, 4m37.691s
>>
>> 4  (3) + multi_moses
>> real: 1m27.215s, 0m40.495s, 0m36.206s, 0m28.623s, 0m26.631s, 0m25.817s, 0m25.401s
>> user: 5m4.819s, 5m42.070s, 5m35.132s, 6m46.001s, 7m38.151s, 9m6.500s, 10m32.739s
>> sys:  0m38.039s, 0m45.753s, 0m44.117s, 0m52.285s, 0m56.655s, 1m6.749s, 1m16.935s
>>
>>
>>
>> On 05/10/2015 16:05, Michael Denkowski wrote:
>>> Hi Philipp,
>>>
>>> Unfortunately I don't have a precise measurement. If anyone
>>> knows of a good way to benchmark a process tree with lots of
>>> memory mapping the same files, I would be glad to run it.
>>>
>>> --Michael
>>>
>>> On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <phi@jhu.edu>
>>> wrote:
>>>
>>> Hi,
>>>
>>> great - that will be very useful.
>>>
>>> Since you just ran the comparison - do you have any
>>> numbers on "still allowed everything to fit into
>>> memory", i.e., how much more memory is used by running
>>> parallel instances?
>>>
>>> -phi
>>>
>>> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski
>>> <michael.j.denkowski@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> Like some other Moses users, I noticed diminishing
>>> returns from running Moses with several threads. To
>>> work around this, I added a script to run multiple
>>> single-threaded instances of moses instead of one
>>> multi-threaded instance. In practice, this sped
>>> things up by about 2.5x for 16 cpus and using memory
>>> mapped models still allowed everything to fit into
>>> memory.
>>>
>>> If anyone else is interested in using this, you can
>>> prefix a moses command with
>>> scripts/generic/multi_moses.py. To use multiple
>>> instances in mert-moses.pl,
>>> specify --multi-moses and control the number of
>>> parallel instances with --decoder-flags='-threads N'.
>>>
>>> Below is a benchmark on WMT fr-en data (2M training
>>> sentences, 400M words mono, suffix array PT, compact
>>> reordering, 5-gram KenLM) testing default stack
>>> decoding vs cube pruning without and with the
>>> parallelization script (+multi):
>>>
>>> ---
>>> 1cpu sent/sec
>>> stack 1.04
>>> cube 2.10
>>> ---
>>> 16cpu sent/sec
>>> stack 7.63
>>> +multi 12.20
>>> cube 7.63
>>> +multi 18.18
>>> ---
>>>
>>> --Michael
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>>
>> --
>> M. Vito MANDORINO -- Chief Scientist
>>
>> The Translation Trustee
>>
>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>
>> Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89
>>
>> Email : vito.mandorino@linguacustodia.com
>>
>> Website : www.linguacustodia.com - www.thetranslationtrustee.com
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu

--
Hieu Hoang
http://www.hoang.co.uk/hieu


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 108, Issue 22
**********************************************
