Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Adding new aligned phrases to the existing phrase table
(sriram)
2. Re: Moses vs Moses2 in its multi-threading (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Mon, 10 Apr 2017 23:48:27 +0530
From: sriram <sriramchaudhury@gmail.com>
Subject: Re: [Moses-support] Adding new aligned phrases to the
existing phrase table
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAA-zSKaiugUUbEseuaiiTqTQj=FQRJC5YYB9+M+JnNC23oPfuQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Hieu,
I got the following pointer to use multiple translation models.
http://www.statmt.org/moses/?n=Advanced.Models#ntoc7
but when I use these options, I get the following error:
Feature name PhraseDictionaryGroup is not registered.
Does this mean I have to re-compile Moses with the "PhraseDictionaryGroup"
option enabled?
Please point me to the correct information.
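For reference, a rough sketch of the relevant moses.ini section for combining two phrase-tables with PhraseDictionaryGroup, based on the Advanced Models page linked above. The paths, feature names, and weights below are placeholders; check the Moses documentation for the exact syntax of your build:

```ini
# Hypothetical moses.ini fragment: two phrase-tables combined via
# PhraseDictionaryGroup (the feature must be registered in your Moses build).
[feature]
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/path/to/main/phrase-table.gz input-factor=0 output-factor=0
PhraseDictionaryMemory name=TranslationModel1 num-features=4 path=/path/to/extra/phrase-table.gz input-factor=0 output-factor=0
PhraseDictionaryGroup name=Group0 members=TranslationModel0,TranslationModel1 num-features=8

[weight]
# One weight per feature of the group (illustrative values only)
Group0= 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
```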
Thanks,
sriram
On Sun, Apr 9, 2017 at 9:51 AM, sriram <sriramchaudhury@gmail.com> wrote:
> Hi Hieu,
>
> Thanks for the suggestion.
>
> In regard to point 2: how can I use multiple phrase tables inside Moses?
>
>
> Regards,
> Sriram
>
> On Fri, Apr 7, 2017 at 5:44 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> there are no tools to do this, but you can write one yourself. You need
>> to make up some scores to give each phrase.
>>
>> The other methods to use your phrases are:
>> 1. Add them to the training data and retrain your model.
>> 2. Create a second phrase-table with just your phrases and get the
>> decoder to use it in addition to the existing phrase-table.
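The "write it yourself" route above can be sketched as follows. This is not an official Moses tool; the helper name, the example phrase pairs, and the uniform scores are made up, as the advice above suggests. The line layout follows the Moses text phrase-table format (`source ||| target ||| scores ||| alignment ||| counts`):

```python
# Sketch of formatting new phrase pairs in the Moses text phrase-table format.
# The four scores correspond to the default phrase-table features
# (inverse/direct phrase probability and lexical weighting); the uniform
# values used here are placeholders, not estimated probabilities.

def format_entry(source, target, scores=(1.0, 1.0, 1.0, 1.0), alignment=""):
    """Build one phrase-table line with made-up scores."""
    score_str = " ".join("%g" % s for s in scores)
    return "%s ||| %s ||| %s ||| %s ||| " % (source, target, score_str, alignment)

# Hypothetical phrase pairs written to a file that can then be merged
# (and re-sorted) with an existing text phrase-table.
new_pairs = [("guten tag", "good day"), ("danke", "thank you")]
with open("extra-phrases.txt", "w", encoding="utf-8") as out:
    for src, tgt in new_pairs:
        out.write(format_entry(src, tgt) + "\n")
```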
>>
>> * Looking for MT/NLP opportunities *
>> Hieu Hoang
>> http://moses-smt.org/
>>
>>
>> On 6 April 2017 at 19:14, sriram <sriramchaudhury@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a collection of well-aligned phrases that I want to add to the
>>> existing phrase table. Is there an existing tool in Moses to do this?
>>>
>>> Thanks,
>>> Sriram
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>
>
>
> --
>
> Open source English-Hindi MT system
> http://anusaaraka.iiit.ac.in/
>
>
--
Open source English-Hindi MT system
http://anusaaraka.iiit.ac.in/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20170410/49aeeb70/attachment-0001.html
------------------------------
Message: 2
Date: Mon, 10 Apr 2017 21:08:08 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses vs Moses2 in its multi-threading
To: Ivan Zapreev <ivan.zapreev@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgKJAd=AMwkiHNvt=TOCW8KvN5=Qv4tMnJxBcMyW4Yuww@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
It's true that for this experiment (non-cube-pruning, loading the
phrase-table into memory) Moses scales as well as Moses2, and that
Moses2 is only 3 times faster, rather than 10 times.
However, the results on the website and in the paper are correct when using
cube-pruning and a binary phrase-table (compact pt for Moses, probing pt for
Moses2). I had never tested your setup until now, and you've never tested
mine, so we were talking at cross purposes.
I will add my recent results to the website to make it more complete.
* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/
On 10 April 2017 at 10:07, Ivan Zapreev <ivan.zapreev@gmail.com> wrote:
> Dear Hieu,
>
> Thank you very much for your e-mail and all the effort you put into
> re-running the experiments!
>
> I think, however, that a spreadsheet of values gives a rather obscured view
> of the results. I would prefer to see plots of pure decoding times,
> words per second, or speed-up per number of cores.
>
> Anyhow, I see that my main concern about Moses vs Moses2 scalability
> is actually justified: the speed-up of Moses2 is not so much due to better
> multi-threading but is more of a single-thread decoding speed improvement,
> and the original results listed on the website are flawed.
>
> Regarding the reasons you gave me in the previous e-mail: not that I find
> it important now that I see you have the same results, but I already
> pointed out that the cold/hot data issue is not possible, so we can rule it
> out. Some other "parameter mismatch" also sounded strange to me, as all the
> parameters I used are listed in the experimental set-up section:
> https://github.com/ivan-zapreev/Basic-Translation-
> Infrastructure#test-set-up-1 So there is nothing else that could be
> different except for the models and the texts themselves.
>
> Kind regards,
>
> Dr. Ivan S. Zapreev
>
> On Sun, Apr 9, 2017 at 8:42 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> Hi Ivan
>>
>> I've finished running my experiments with the vanilla phrase-based
>> algorithm and memory phrase-table. The results are here:
>> https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-Ay
>> HyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>> Summary:
>> 1. Moses2 is about 3 times faster than Moses.
>> 2. Both decoders are 15-16 times faster with 32 threads than with 1
>> thread (on a 16-core/32-hyperthread server).
>> 3. Moses2 with the binary phrase-table is slightly faster than with the
>> pt loaded into memory.
>>
>> I'm happy with the speed of Moses2 and its scalability with respect to
>> the number of cores. The scalability is in line with what is reported on
>> the website and in the paper.
>>
>> The original Moses decoder also seems to have similar scalability,
>> contrary to my previous results. I have some possible explanations for it,
>> but I'm not too concerned; it's great that Moses is also good!
>>
>> This doesn't correlate with some of your findings; I've outlined some
>> possible reasons in my last email.
>>
>>
>> * Looking for MT/NLP opportunities *
>> Hieu Hoang
>> http://moses-smt.org/
>>
>>
>> On 4 April 2017 at 11:01, Ivan Zapreev <ivan.zapreev@gmail.com> wrote:
>>
>>> Dear Hieu,
>>>
>>> Thank you for the feedback and the info. I am not sure what you mean by
>>> "good scalability"; I cannot really visualize the plots from numbers in
>>> my head, sorry.
>>>
>>> Using bigger models is indeed always good, but I used the biggest ones
>>> available.
>>>
>>> I did make sure there was no swapping; I already mentioned that.
>>>
>>> I did take the average run times, with standard deviations, both for
>>> loading plus decoding and for loading alone.
>>> The latter show that the measurements were reliable.
>>>
>>> The L1 and L2 cache issues do not sound convincing to me. The caches are
>>> only a few MB, and the models you work with are gigabytes; there will
>>> always be cache misses in this setting. The only issue I can think of is
>>> that if the data is not fully pre-loaded into RAM, you get a cold run,
>>> but nothing more than that.
>>>
>>> I think if you finish the runs and then plot the results, we could see
>>> a clearer picture...
>>>
>>> Thanks again!
>>>
>>> Kind regards,
>>>
>>> Ivan
>>>
>>>
>>> On Tue, Apr 4, 2017 at 11:40 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>
>>>> Thanks Ivan,
>>>>
>>>> I'm running the experiments with my models, using the text-based
>>>> phrase-table that you used. The experiments are still running; they may
>>>> take a week to finish.
>>>>
>>>> However, preliminary results suggest good scalability with Moses2.
>>>> https://docs.google.com/spreadsheets/d/15S1m-6MXNmxc47UiS-Ay
>>>> HyHCLWtGEdxaAvU_0plsPjo/edit?usp=sharing
>>>> My models are here for you to download and test yourself if you like:
>>>> http://statmt.org/~s0565741/download/for-ivan/
>>>>
>>>> Below are my thoughts on possible reasons for the discrepancies
>>>> in what we're seeing:
>>>> 1. You may have parameters in your moses.ini which are vastly
>>>> different from mine and suboptimal for speed and scalability. We won't
>>>> know until we compare our two setups.
>>>> 2. Your and my phrase-tables are vastly different sizes. Your
>>>> phrase-table is 1.3GB; mine is 40GB (unzipped). I've also tested a 15GB
>>>> phrase-table in our AMTA paper and also got good scalability, but I have
>>>> not tried one as small as yours. There may be a phenomenon that causes
>>>> Moses2 to be bad with small models.
>>>> 3. You loaded all models into memory; I loaded the phrase-table into
>>>> memory but had to use binary LM and reordering models. My models are
>>>> too large to load into RAM (they take up more RAM than the file size
>>>> suggests).
>>>> 4. You may also be running out of RAM by loading everything into
>>>> memory, causing disk swapping.
>>>> 5. Your test set (1788 sentences) is too small. My test set is
>>>> 800,000 sentences (5,847,726 tokens). The decoders rely on CPU caches
>>>> (L1, L2, etc.) for speed. There are also setup costs for each decoding
>>>> thread (e.g. creating memory pools in Moses2). If your experiments are
>>>> over too quickly, you may be measuring the decoder in the 'warm-up lap'
>>>> rather than when it's running at terminal velocity. Your quickest
>>>> decoding experiment took 25 sec; my quickest took 200 sec.
>>>> 6. I think the way you exclude load time is unreliable. You exclude
>>>> load time by subtracting the average load time from the total time.
>>>> However, load time is several times longer than decoding time, so any
>>>> variation in load time will swamp the decoding time. I use the decoder's
>>>> debug timing output instead.
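Point 6 can be illustrated with a small sketch. The numbers below are hypothetical, not taken from either set of experiments; they only show that when loading dominates the total time, run-to-run variation in load time swamps a decoding time estimated by subtraction:

```python
import statistics

# Illustrative numbers only: model loading takes ~300 s and varies run to
# run, while decoding itself takes ~25 s.
load_times  = [300.0, 310.0, 290.0, 305.0]   # loading-only runs
total_times = [325.0, 336.0, 314.0, 331.0]   # loading + decoding runs

avg_load = statistics.mean(load_times)
decode_by_subtraction = [t - avg_load for t in total_times]

# The run-to-run spread of the estimate is comparable to the decoding time
# itself, which is why the decoder's own timing output is more reliable.
print("estimated decode time:", statistics.mean(decode_by_subtraction))
print("spread of the estimate:", statistics.stdev(decode_by_subtraction))
```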
>>>>
>>>> If you can share your models, we might be able to find out the reason
>>>> for the difference in our results. I can provide you with ssh/scp access to
>>>> my server if you need to.
>>>>
>>>>
>>>>
>>>> * Looking for MT/NLP opportunities *
>>>> Hieu Hoang
>>>> http://moses-smt.org/
>>>>
>>>>
>>>> On 4 April 2017 at 09:00, Ivan Zapreev <ivan.zapreev@gmail.com> wrote:
>>>>
>>>>> Dear Hieu,
>>>>>
>>>>> Please see the answers below.
>>>>>
>>>>> Can you clarify a few things for me.
>>>>>> 1. How many sentences, words were in the test set you used to
>>>>>> measure decoding speed? Are there many duplicate sentence - ie. did you
>>>>>> create a large test set by concatenating the same small test set multiple
>>>>>> times?
>>>>>>
>>>>>
>>>>> We ran experiments on the same MT04 Chinese text on which we tuned the
>>>>> system. The text consists of 1788 unique sentences and 49582 tokens.
>>>>>
>>>>>
>>>>>> 2. Are the model sizes you quoted the gzipped text files or
>>>>>> unzipped, or the model size as it is when loaded into memory?
>>>>>>
>>>>>
>>>>> These are the plain-text models as stored on the hard drive.
>>>>>
>>>>>
>>>>>> 3. Can you please reformat this graph
>>>>>> https://github.com/ivan-zapreev/Basic-Translation-Infrastruc
>>>>>> ture/blob/master/doc/images/experiments/servers/stats.time.t
>>>>>> ools.log.png
>>>>>> as #threads v. words per second. Ie. don't use log, don't use
>>>>>> decoding time.
>>>>>>
>>>>>
>>>>> The plot is attached, but this one is not words per second; it
>>>>> shows the decoding run-times (as in the link you sent). The
>>>>> non-log-scale plot, as you will see, is hard to read. I also attach the
>>>>> plain data files for Moses and Moses2 with the column values as follows:
>>>>>
>>>>> number of threads | average runtime decoding + model loading |
>>>>> standard deviation | average runtime model loading | standard deviation
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Ivan
>>>>> <http://www.tainichok.ru/>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Ivan
>>> <http://www.tainichok.ru/>
>>>
>>
>>
>
>
> --
> Best regards,
>
> Ivan
> <http://www.tainichok.ru/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20170410/7658ea84/attachment.html
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 126, Issue 13
**********************************************