Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Moses vs Moses2 in its multi-threading (Hieu Hoang)
2. Re: Moses vs Moses2 in its multi-threading (Ivan Zapreev)
----------------------------------------------------------------------
Message: 1
Date: Sun, 9 Apr 2017 19:42:16 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses vs Moses2 in its multi-threading
To: Ivan Zapreev <ivan.zapreev@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjrCZWB9H6TbBtXgiER_UfoDSsZGpcPg4yP5Ov9k3DQPQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Ivan
I've finished running my experiments with the vanilla phrase-based
algorithm and memory phrase-table. The results are here:
https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
Summary:
1. Moses2 is about 3 times faster than Moses.
2. Both decoders are 15-16 times faster with 32 threads than with 1
thread (on a server with 16 cores / 32 hyperthreads); see the sketch below.
3. Moses2 with the binary phrase-table is slightly faster than with the
phrase-table loaded into memory.
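
To make points 1 and 2 concrete, here is the speed-up arithmetic as a
minimal Python sketch; the runtimes in it are made up for illustration,
the measured values are in the linked spreadsheet:

# Speed-up and parallel-efficiency arithmetic behind points 1-2 above.
# The runtimes here are hypothetical; the measured values are in the
# linked spreadsheet.
runtimes = {1: 3200.0, 32: 205.0}  # threads -> decoding wall-clock seconds

speedup = runtimes[1] / runtimes[32]   # 3200/205 ~= 15.6x, in the 15-16x range
efficiency = speedup / 16              # relative to the 16 physical cores
print(f"speed-up: {speedup:.1f}x, efficiency vs 16 cores: {efficiency:.2f}")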
I'm happy with the speed of Moses2 and its scalability with respect to the
number of cores. The scalability is in line with that reported on the
website and in the paper.
The original Moses decoder also seems to have similar scalability, contrary
to my previous results. I have some explanations for this, but I'm not too
concerned; it's great that Moses is also good!
This doesn't agree with some of your findings; I've outlined some
possible reasons in my last email.
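
On the load-time point in particular (reason 6 in the quoted email below),
here is a minimal sketch, with made-up timings, of why subtracting the
average load time from the total time is unreliable when loading dominates:

# Why subtracting the average load time is unreliable when loading
# dominates. All timings below are made up for illustration.
import statistics

total = [1510.0, 1495.0, 1522.0]  # load + decode runs, seconds (hypothetical)
load = [1480.0, 1462.0, 1491.0]   # load-only runs, seconds (hypothetical)

decode_est = statistics.mean(total) - statistics.mean(load)  # ~31 s
spread = statistics.stdev(total) + statistics.stdev(load)    # ~28 s of noise
# The run-to-run spread of the long load phase is of the same order as the
# decoding estimate itself, so the subtraction is mostly noise.
print(f"estimated decode: {decode_est:.0f} s, spread: ~{spread:.0f} s")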
* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/
On 4 April 2017 at 11:01, Ivan Zapreev <ivan.zapreev@gmail.com> wrote:
> Dear Hieu,
>
> Thank you for the feedback and the info. I am not sure what you mean by
> "good scalability"; I cannot really visualize plots from raw numbers in my
> head. Sorry.
>
> Using bigger models is indeed always good, but I used the biggest ones
> available.
>
> I did make sure there was no swapping, as I already mentioned.
>
> I did take the average run times, with standard deviations, for loading
> plus decoding and for loading alone.
> The standard deviations show that the measurements were reliable.
>
> The L1 and L2 cache issues do not sound convincing to me. The caches are
> only a few MB in size and the models you work with are gigabytes. There
> will always be cache misses in this setting. The only issue I can think of
> is that if the data is not fully pre-loaded into RAM you get a cold run,
> but not more than that.
>
> I think if you finish the runs and plot the results, we will see a
> clearer picture...
>
> Thanks again!
>
> Kind regards,
>
> Ivan
>
>
> On Tue, Apr 4, 2017 at 11:40 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> Thanks Ivan,
>>
>> I'm running the experiments with my models, using the text-based
>> phrase-table that you used. The experiments are still running; they may
>> take a week to finish.
>>
>> However, preliminary results suggest good scalability with Moses2.
>> https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-AyHyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>> My models are here for you to download and test yourself if you like:
>> http://statmt.org/~s0565741/download/for-ivan/
>>
>> Below are my thoughts on possible reasons for the discrepancies in
>> what we're seeing:
>> 1. You may have parameters in your moses.ini which are vastly
>> different from mine and suboptimal for speed and scalability. We won't
>> know until we compare our two setups.
>> 2. Your phrase-table and mine are vastly different sizes: your
>> phrase-table is 1.3 GB, mine is 40 GB (unzipped). I've also tested a 15 GB
>> phrase-table in our AMTA paper and got good scalability, but I have not
>> tried one as small as yours. There may be phenomena that cause Moses2 to
>> perform badly with small models.
>> 3. You loaded all models into memory; I loaded the phrase-table into
>> memory but had to use binary LM and reordering models. My models are too
>> large to load entirely into RAM (they take up more RAM than the file
>> sizes suggest).
>> 4. You may also be running out of RAM by loading everything into
>> memory, causing disk swapping.
>> 5. Your test set (1788 sentences) is too small. My test set is 800,000
>> sentences (5,847,726 tokens). The decoders rely on CPU caches (L1, L2,
>> etc.) for speed. There are also setup costs for each decoding thread
>> (e.g. creating memory pools in Moses2). If your experiments are over too
>> quickly, you may be measuring the decoder in its 'warm-up lap' rather than
>> when it's running at terminal velocity. Your quickest decoding experiments
>> took 25 sec; my quickest took 200 sec.
>> 6. I think the way you exclude load time is unreliable. You exclude
>> load time by subtracting the average load time from the total time.
>> However, load time is several times longer than decoding time, so any
>> variation in load time will swamp the decoding time. I use the decoder's
>> own debug timing output instead.
>>
>> If you can share your models, we might be able to find out the reason for
>> the difference in our results. I can provide you with ssh/scp access to my
>> server if you need to.
>>
>>
>>
>> * Looking for MT/NLP opportunities *
>> Hieu Hoang
>> http://moses-smt.org/
>>
>>
>> On 4 April 2017 at 09:00, Ivan Zapreev <ivan.zapreev@gmail.com> wrote:
>>
>>> Dear Hieu,
>>>
>>> Please see the answers below.
>>>
>>> Can you clarify a few things for me.
>>>> 1. How many sentences and words were in the test set you used to
>>>> measure decoding speed? Are there many duplicate sentences, i.e. did you
>>>> create a large test set by concatenating the same small test set multiple
>>>> times?
>>>>
>>>
>>> We ran the experiments on the same MT04 Chinese text that we used to
>>> tune the system. The text consists of 1788 unique sentences and 49582
>>> tokens.
>>>
>>>
>>>> 2. Are the model sizes you quoted for the gzipped text files, the
>>>> unzipped files, or the models as loaded into memory?
>>>>
>>>
>>> These are the plain-text models as stored on the hard drive.
>>>
>>>
>>>> 3. Can you please reformat this graph
>>>> https://github.com/ivan-zapreev/Basic-Translation-Infrastructure/blob/master/doc/images/experiments/servers/stats.time.tools.log.png
>>>> as #threads vs. words per second? I.e. don't use a log scale and don't
>>>> use decoding time.
>>>>
>>>
>>> The plot is attached, but it shows the decoding run-times (as in the
>>> link you sent), not words per second. The non-log-scale plot, as you will
>>> see, is hard to read. I also attach the plain data files for moses and
>>> moses2, with the following columns:
>>>
>>> number of threads | average runtime (decoding + model loading) | standard
>>> deviation | average runtime (model loading only) | standard deviation
>>>
>>> --
>>> Best regards,
>>>
>>> Ivan
>>> <http://www.tainichok.ru/>
>>>
>>
>>
>
>
> --
> Best regards,
>
> Ivan
> <http://www.tainichok.ru/>
>
------------------------------
Message: 2
Date: Mon, 10 Apr 2017 11:07:10 +0200
From: Ivan Zapreev <ivan.zapreev@gmail.com>
Subject: Re: [Moses-support] Moses vs Moses2 in its multi-threading
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAOwV4iudskFTDviDUZvVuj6Rr7ehdS5VzajWeAmcmV3Nzp9n5Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear Hieu,
Thank you very much for your e-mail and all the effort you put into
re-running the experiments!
I think, however, that a spreadsheet of values gives a rather obscured view
of the results. I would prefer to see plots of pure decoding times, words
per second, or speed-up as a function of the number of cores.
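
To be concrete, something like the following minimal sketch is what I mean,
assuming the data files I attached earlier (the file names here are
placeholders, and 49582 is the token count of our MT04 test set):

# Sketch of the plot I have in mind, using the data files I attached
# earlier (columns: threads | avg total time | std | avg load time | std).
# The file names are placeholders; 49582 is the MT04 test-set token count.
import matplotlib.pyplot as plt

def read_runs(path):
    rows = [list(map(float, line.split())) for line in open(path) if line.strip()]
    threads = [int(r[0]) for r in rows]
    decode = [r[1] - r[3] for r in rows]  # total minus loading = pure decoding
    return threads, decode

for name in ("moses.dat", "moses2.dat"):  # placeholder file names
    threads, decode = read_runs(name)
    wps = [49582.0 / t for t in decode]   # words per second
    plt.plot(threads, wps, marker="o", label=name)

plt.xlabel("number of threads")
plt.ylabel("words per second")
plt.legend()
plt.show()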
Anyhow, I see that my main concern about the Moses vs Moses2 scalability is
actually confirmed: since both decoders scale equally well with the number
of threads, the speed-up of Moses2 is not so much a matter of better
multi-threading but rather a single-thread decoding speed improvement, and
the original results listed on the website are flawed.
Regarding the reasons you gave in your previous e-mail: not that I find
them important now that I see you have the same results, but I already
pointed out that the cold/hot data issue is not possible, so we can rule it
out. The suggested "parameter mismatch" also sounded strange to me, as all
the parameters I used are listed in the experimental set-up section:
https://github.com/ivan-zapreev/Basic-Translation-Infrastructure#test-set-up-1
So there is nothing else that could be different except for the models and
the texts themselves.
Kind regards,
Dr. Ivan S. Zapreev
--
Best regards,
Ivan
<http://www.tainichok.ru/>
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 126, Issue 12
**********************************************