Moses-support Digest, Vol 108, Issue 15

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Faster decoding with multiple moses instances (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Tue, 6 Oct 2015 21:16:59 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: Michael Denkowski <michael.j.denkowski@gmail.com>, Philipp Koehn
<phi@jhu.edu>
Cc: Moses Support <moses-support@mit.edu>
Message-ID: <56142C3B.6040701@gmail.com>
Content-Type: text/plain; charset="windows-1252"

I've just run some comparisons between the multithreaded decoder and the
multi_moses.py script. It's good stuff.

It makes me seriously wonder whether we should abandon
multi-threading and go all out for the multi-process approach.

There are some advantages to multi-threading - e.g. where model files are
loaded into memory rather than memory-mapped. But there are disadvantages
too - it is more difficult to maintain and there's about a 10% overhead.

What do people think?
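For context, the gist of the multi-process approach can be sketched in a few lines. This is a conceptual illustration only, not the actual multi_moses.py - the decode() stand-in and the process count are invented for the example: shard the input over N single-threaded workers and merge the results back in input order.

```python
from multiprocessing import Pool

def decode(sentence):
    # Stand-in for one single-threaded moses instance translating a line;
    # here it just reverses the token order for illustration.
    return " ".join(reversed(sentence.split()))

def multi_decode(sentences, procs=4):
    # One OS process per worker, like running N independent moses instances.
    with Pool(processes=procs) as pool:
        # Pool.map preserves input order, so the merged output lines
        # come back in the same order as the input.
        return pool.map(decode, sentences)

if __name__ == "__main__":
    print(multi_decode(["il est bon", "c' est vrai"], procs=2))
```

Each worker is a separate process with its own heap, which is why memory-mapped models (shared pages) matter so much more here than for the threaded decoder.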

Phrase-based (real/user/sys times from `time`; columns = 1, 5, 10, 15,
20, 25, 30 threads/processes):

(32) Baseline (Compact pt)
 real  4m37.000s  1m15.391s  0m51.217s  0m48.287s  0m50.719s  0m52.027s  0m53.045s
 user  4m21.544s  5m28.597s  6m38.227s  8m0.975s   8m21.122s  8m3.195s   8m4.663s
 sys   0m15.451s  0m34.669s  0m53.867s  1m10.515s  1m20.746s  1m24.368s  1m23.677s

(34) (32) + multi_moses
 real  4m49.474s  1m17.867s  0m43.096s  0m31.999s  0m26.497s  0m26.296s  killed
 user  4m33.580s  4m40.486s  4m56.749s  5m6.692s   5m43.845s  7m34.617s
 sys   0m15.957s  0m32.347s  0m51.016s  1m11.106s  1m44.115s  2m21.263s

(38) Baseline (Probing pt)
 real  4m46.254s  1m16.637s  0m49.711s  0m48.389s  0m49.144s  0m51.676s  0m52.472s
 user  4m30.596s  5m32.500s  6m23.706s  7m40.791s  7m51.946s  7m52.892s  7m53.569s
 sys   0m15.624s  0m36.169s  0m49.433s  1m6.812s   1m9.614s   1m13.108s  1m12.644s

(39) (38) + multi_moses
 real  4m43.882s  1m17.849s  0m34.245s  0m31.318s  0m28.054s  0m24.120s  0m22.520s
 user  4m29.212s  4m47.693s  5m5.750s   5m33.573s  6m18.847s  7m19.642s  8m38.013s
 sys   0m15.835s  0m25.398s  0m36.716s  0m41.349s  0m48.494s  1m0.843s   1m13.215s


Hiero (same layout; the baseline row carries one extra rightmost column
in the original data):

(3) 6/10 baseline
 real  5m33.011s  1m28.935s  0m59.470s  1m0.315s    0m55.619s   0m57.347s   0m59.191s   1m2.786s
 user  4m53.187s  6m23.521s  8m17.170s  12m48.303s  14m45.954s  17m58.109s  20m22.891s  21m13.605s
 sys   0m39.696s  0m51.519s  1m3.788s   1m22.125s   1m58.718s   2m51.249s   4m4.807s    4m37.691s

(4) (3) + multi_moses
 real  1m27.215s  0m40.495s  0m36.206s  0m28.623s   0m26.631s   0m25.817s   0m25.401s
 user  5m4.819s   5m42.070s  5m35.132s  6m46.001s   7m38.151s   9m6.500s    10m32.739s
 sys   0m38.039s  0m45.753s  0m44.117s  0m52.285s   0m56.655s   1m6.749s    1m16.935s
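To make the tables easier to read, here is a quick helper (my own sketch, not part of Moses) that converts the `time`-style strings above into seconds and computes wall-clock speedups of the multi-process runs over the single-cpu baseline:

```python
import re

def to_seconds(t):
    """Convert a `time`-style string like '4m37.000s' to seconds."""
    m = re.match(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

# Real (wall-clock) times from the phrase-based table above.
baseline_1cpu = to_seconds("4m37.000s")          # (32) Baseline, 1 thread
multi_moses   = ["4m49.474s", "1m17.867s", "0m43.096s",
                 "0m31.999s", "0m26.497s", "0m26.296s"]  # 30-process run was killed

for n, t in zip([1, 5, 10, 15, 20, 25], multi_moses):
    print(f"{n:2d} processes: {baseline_1cpu / to_seconds(t):.1f}x over 1 cpu")
```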


On 05/10/2015 16:05, Michael Denkowski wrote:
> Hi Philipp,
>
> Unfortunately I don't have a precise measurement. If anyone knows of
> a good way to benchmark a process tree with lots of memory mapping the
> same files, I would be glad to run it.
>
> --Michael
>
> On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <phi@jhu.edu> wrote:
>
> Hi,
>
> great - that will be very useful.
>
> Since you just ran the comparison - do you have any numbers on
> "still allowed everything to fit into memory", i.e., how much more
> memory is used by running parallel instances?
>
> -phi
>
> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski
> <michael.j.denkowski@gmail.com> wrote:
>
> Hi all,
>
> Like some other Moses users, I noticed diminishing returns
> from running Moses with several threads. To work around this,
> I added a script to run multiple single-threaded instances of
> moses instead of one multi-threaded instance. In practice,
> this sped things up by about 2.5x for 16 cpus and using memory
> mapped models still allowed everything to fit into memory.
>
> If anyone else is interested in using this, you can prefix a
> moses command with scripts/generic/multi_moses.py. To use
> multiple instances in mert-moses.pl,
> specify --multi-moses and control the number of parallel
> instances with --decoder-flags='-threads N'.
>
> Below is a benchmark on WMT fr-en data (2M training sentences,
> 400M words mono, suffix array PT, compact reordering, 5-gram
> KenLM) testing default stack decoding vs cube pruning without
> and with the parallelization script (+multi):
>
> ---
> 1cpu sent/sec
> stack 1.04
> cube 2.10
> ---
> 16cpu sent/sec
> stack 7.63
> +multi 12.20
> cube 7.63
> +multi 18.18
> ---
>
> --Michael
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Hieu Hoang
http://www.hoang.co.uk/hieu


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 108, Issue 15
**********************************************
