Moses-support Digest, Vol 108, Issue 23

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Faster decoding with multiple moses instances
(Michael Denkowski)


----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Oct 2015 13:05:03 -0400
From: Michael Denkowski <michael.j.denkowski@gmail.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: Moses Support <moses-support@mit.edu>
Message-ID:
<CA+-GegJ=1+grvdZv43OCeUNU4R0BfZyN8gJvQFBPaLz-+Oe-8w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all,

I extended the multi_moses.py script to support multi-threaded moses
instances for cases where memory limits the number of decoders that can run
in parallel. The threads arg now takes the form "--threads P:T:E" to run P
processes using T threads each and an optional extra process running E
threads. The script sends input lines to instances as they have free
threads, so all CPUs stay busy for the full decoding run.
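
In case it helps, here is a rough Python sketch of that scheduling idea -
illustrative only, not the actual multi_moses.py code; parse_threads,
start_instance, multi_decode and decode_line are made-up names, and
decode_line merely stands in for talking to a real moses process:

import queue
import threading

def parse_threads(spec):
    """Parse 'P:T:E': P processes with T threads each, plus an optional
    extra process with E threads. Returns one thread count per instance."""
    parts = [int(x) for x in spec.split(":")]
    procs = parts[0]
    threads = parts[1] if len(parts) > 1 else 1
    extra = parts[2] if len(parts) > 2 else 0
    return [threads] * procs + ([extra] if extra else [])

def decode_line(instance_id, line):
    # Stand-in for handing a line to a real moses instance and reading
    # the translation back (e.g. over a pipe to a moses subprocess).
    return "instance %d: %s" % (instance_id, line)

def start_instance(instance_id, num_threads, jobs, results):
    """One 'instance': each of its free threads pulls the next input line."""
    def worker():
        while True:
            item = jobs.get()
            if item is None:              # poison pill: no more input
                return
            idx, line = item
            results[idx] = decode_line(instance_id, line)
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads

def multi_decode(lines, spec):
    slots = parse_threads(spec)           # e.g. "2:2:1" -> [2, 2, 1]
    jobs = queue.Queue()
    results = [None] * len(lines)
    workers = []
    for i, n in enumerate(slots):
        workers += start_instance(i, n, jobs, results)
    for job in enumerate(lines):          # lines go to whichever thread is free
        jobs.put(job)
    for _ in workers:
        jobs.put(None)
    for t in workers:
        t.join()
    return results                        # output stays in input order

if __name__ == "__main__":
    for out in multi_decode(["hello", "world", "bonjour"], "2:2:1"):
        print(out)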

I ran some more benchmarks with the CompactPT system, trading off between
threads and processes:

procs x threads   sent/sec
1x16 5.46
2x8 7.58
4x4 9.71
8x2 12.50
16x1 14.08

From the results so far, it's best to use as many instances as will fit
into memory and evenly distribute CPUs. For example, a system with 32 CPUs
that could fit 3 copies of moses into memory could use "--threads 2:11:10"
to run 2 instances with 11 threads each and 1 instance with 10 threads.
The script can be used with mert-moses.pl via the --multi-moses flag and
--decoder-flags='--threads P:T:E'.
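
To make that arithmetic concrete, here is a tiny throwaway helper (not part
of the Moses scripts; the name threads_arg is made up) that suggests a split
for a given CPU count and the number of instances that fit into memory,
assuming the extra-process part can be left off when the CPUs divide evenly:

def threads_arg(num_cpus, num_instances):
    """Suggest a '--threads P:T:E' value that spreads num_cpus as evenly as
    possible over the moses instances that fit into memory (sketch only)."""
    base, rem = divmod(num_cpus, num_instances)
    if rem == 0:
        # Assumes the optional extra-process part can simply be omitted.
        return "%d:%d" % (num_instances, base)
    full = num_instances - 1          # P: all but one process get T threads
    t = base + 1                      # try the larger share first
    if num_cpus - full * t < 1:       # would leave the extra process with nothing
        t = base
    return "%d:%d:%d" % (full, t, num_cpus - full * t)

print(threads_arg(32, 3))   # -> "2:11:10", matching the example above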

Best,
Michael


On Tue, Oct 6, 2015 at 4:39 PM, Michael Denkowski <
michael.j.denkowski@gmail.com> wrote:

> Hi Hieu and all,
>
> I just checked in a bug fix for the multi_moses.py script. I forgot to
> override the number of threads for each moses command, so if [threads] were
> specified in the moses.ini, the multi-moses runs were cheating by running a
> bunch of multi-threaded instances. If threads were only being specified on
> the command line, the script was correctly stripping the flag so everything
> should be good. I finished a benchmark on my system with an unpruned
> compact PT (with the fixed script) and got the following:
>
> 16 threads 5.38 sent/sec
> 16 procs 13.51 sent/sec
>
> This definitely used a lot more memory though. Based on some very rough
> estimates looking at free system memory, the memory mapped suffix array PT
> went from 2G to 6G with 16 processes while the compact PT went from 3G to
> 37G. For cases where everything fits into memory, I've seen significant
> speedup from multi-process decoding.
>
> For cases where things don't fit into memory, the multi-moses script could
> be extended to start as many multi-threaded instances as will fit into RAM
> and farm out sentences in a way that keeps all of the CPUs busy. I know
> Marcin has mentioned using GNU parallel.
>
> Best,
> Michael
>
> On Tue, Oct 6, 2015 at 4:16 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> I've just run some comparisons between the multithreaded decoder and the
>> multi_moses.py script. It's good stuff.
>>
>> It makes me seriously wonder whether we should abandon multi-threading
>> and go all out for the multi-process approach.
>>
>> There's some advantage to multi-threading - e.g. where model files are loaded
>> into memory rather than memory mapped. But there are disadvantages too - it's
>> more difficult to maintain and there's about a 10% overhead.
>>
>> What do people think?
>>
>> Phrase-based:
>>
>> (columns: 1, 5, 10, 15, 20, 25, 30 threads/processes)
>>
>> Baseline (Compact pt)
>>   real  4m37.000s  1m15.391s  0m51.217s  0m48.287s  0m50.719s  0m52.027s  0m53.045s
>>   user  4m21.544s  5m28.597s  6m38.227s  8m0.975s   8m21.122s  8m3.195s   8m4.663s
>>   sys   0m15.451s  0m34.669s  0m53.867s  1m10.515s  1m20.746s  1m24.368s  1m23.677s
>>
>> Compact pt + multi_moses
>>   real  4m49.474s  1m17.867s  0m43.096s  0m31.999s  0m26.497s  0m26.296s  killed
>>   user  4m33.580s  4m40.486s  4m56.749s  5m6.692s   5m43.845s  7m34.617s  -
>>   sys   0m15.957s  0m32.347s  0m51.016s  1m11.106s  1m44.115s  2m21.263s  -
>>
>> Baseline (Probing pt)
>>   real  4m46.254s  1m16.637s  0m49.711s  0m48.389s  0m49.144s  0m51.676s  0m52.472s
>>   user  4m30.596s  5m32.500s  6m23.706s  7m40.791s  7m51.946s  7m52.892s  7m53.569s
>>   sys   0m15.624s  0m36.169s  0m49.433s  1m6.812s   1m9.614s   1m13.108s  1m12.644s
>>
>> Probing pt + multi_moses
>>   real  4m43.882s  1m17.849s  0m34.245s  0m31.318s  0m28.054s  0m24.120s  0m22.520s
>>   user  4m29.212s  4m47.693s  5m5.750s   5m33.573s  6m18.847s  7m19.642s  8m38.013s
>>   sys   0m15.835s  0m25.398s  0m36.716s  0m41.349s  0m48.494s  1m0.843s   1m13.215s
>>
>> Hiero:
>> (columns as above, plus one additional run at a higher thread count)
>>
>> Baseline (6/10)
>>   real  5m33.011s  1m28.935s  0m59.470s  1m0.315s    0m55.619s   0m57.347s   0m59.191s   1m2.786s
>>   user  4m53.187s  6m23.521s  8m17.170s  12m48.303s  14m45.954s  17m58.109s  20m22.891s  21m13.605s
>>   sys   0m39.696s  0m51.519s  1m3.788s   1m22.125s   1m58.718s   2m51.249s   4m4.807s    4m37.691s
>>
>> Baseline + multi_moses
>>   real  -          1m27.215s  0m40.495s  0m36.206s   0m28.623s   0m26.631s   0m25.817s   0m25.401s
>>   user  -          5m4.819s   5m42.070s  5m35.132s   6m46.001s   7m38.151s   9m6.500s    10m32.739s
>>   sys   -          0m38.039s  0m45.753s  0m44.117s   0m52.285s   0m56.655s   1m6.749s    1m16.935s
>>
>> On 05/10/2015 16:05, Michael Denkowski wrote:
>>
>> Hi Philipp,
>>
>> Unfortunately I don't have a precise measurement. If anyone knows of a
>> good way to benchmark a process tree with lots of memory mapping the same
>> files, I would be glad to run it.
>>
>> --Michael
>>
>> On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <phi@jhu.edu> wrote:
>>
>>> Hi,
>>>
>>> great - that will be very useful.
>>>
>>> Since you just ran the comparison - do you have any numbers on "still
>>> allowed everything to fit into memory", i.e., how much more memory is used
>>> by running parallel instances?
>>>
>>> -phi
>>>
>>> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski <
>>> michael.j.denkowski@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Like some other Moses users, I noticed diminishing returns from running
>>>> Moses with several threads. To work around this, I added a script to run
>>>> multiple single-threaded instances of moses instead of one multi-threaded
>>>> instance. In practice, this sped things up by about 2.5x for 16 CPUs, and
>>>> using memory-mapped models still allowed everything to fit into memory.
>>>>
>>>> If anyone else is interested in using this, you can prefix a moses
>>>> command with scripts/generic/multi_moses.py. To use multiple instances in
>>>> mert-moses.pl, specify --multi-moses and control the number of
>>>> parallel instances with --decoder-flags='-threads N'.
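>>>>
>>>> For example (the paths here are placeholders, and this assumes the wrapper
>>>> reads the test set on stdin and writes translations to stdout just like
>>>> moses itself), a wrapped run driven from Python might look roughly like:
>>>>
>>>> import subprocess
>>>>
>>>> # Placeholder paths - point these at your own checkout and model.
>>>> MOSES_DIR = "/path/to/mosesdecoder"
>>>> cmd = [
>>>>     MOSES_DIR + "/scripts/generic/multi_moses.py",  # prefix the usual...
>>>>     MOSES_DIR + "/bin/moses",                       # ...moses command
>>>>     "-f", "moses.ini",
>>>>     "-threads", "16",             # becomes 16 single-threaded instances
>>>> ]
>>>> with open("test.in") as fin, open("test.out", "w") as fout:
>>>>     subprocess.check_call(cmd, stdin=fin, stdout=fout)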
>>>>
>>>> Below is a benchmark on WMT fr-en data (2M training sentences, 400M
>>>> words mono, suffix array PT, compact reordering, 5-gram KenLM) testing
>>>> default stack decoding vs cube pruning without and with the parallelization
>>>> script (+multi):
>>>>
>>>> ---
>>>> 1cpu sent/sec
>>>> stack 1.04
>>>> cube 2.10
>>>> ---
>>>> 16cpu sent/sec
>>>> stack 7.63
>>>> +multi 12.20
>>>> cube 7.63
>>>> +multi 18.18
>>>> ---
>>>>
>>>> --Michael
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> --
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151008/ae53693b/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 108, Issue 23
**********************************************
