Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: In-memory loading of compact phrases (Barry Haddow)
2. Re: In-memory loading of compact phrases (Marcin Junczys-Dowmunt)
3. Re: In-memory loading of compact phrases (Barry Haddow)
----------------------------------------------------------------------
Message: 1
Date: Thu, 12 Mar 2015 10:11:07 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] In-memory loading of compact phrases
To: Jesús González Rubio <jesus.g.rubio@gmail.com>, Marcin
Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <5501663B.6070200@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Jesús
As Marcin points out, when using the compact phrase table you need to
allow Moses time to cache the translation options for the common phrase
pairs. With the gzipped phrase table, it effectively caches the whole
phrase table during loading, but you excluded this 1800+ seconds from
your calculations.
I'm curious why the search time is twice as long for gzipped as opposed
to compact though (3.3s vs 1.6s). Once the translation options are
loaded, they should be doing the same thing shouldn't they? Maybe the
reduced process size with the compact phrase table gives the OS more
space to cache LM pages? I'm not sure how accurate the timings given by
Moses are.
cheers - Barry
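A quick way to check how much of a timing figure is I/O (and therefore at the mercy of the OS page cache) is to record wall-clock and CPU time side by side: time spent blocked on disk shows up only in the wall-clock number. A minimal, Moses-independent Python sketch of the idea:

```python
import os
import time

# Measure wall-clock and CPU time around an I/O-heavy step.  A large
# wall - cpu gap means the process was waiting on the OS (disk reads,
# page faults), which is exactly what a warm page cache hides.
wall0, cpu0 = time.perf_counter(), time.process_time()
with open(os.devnull, "wb") as f:   # stand-in for the real workload
    for _ in range(100):
        f.write(b"x" * 1_000_000)
wall1, cpu1 = time.perf_counter(), time.process_time()
print(f"wall={wall1 - wall0:.3f}s cpu={cpu1 - cpu0:.3f}s")
```

If a decoder reports only wall-clock times, cold-cache and warm-cache runs are not comparable.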
On 11/03/15 19:31, Jesús González Rubio wrote:
>
> 2015-03-11 19:21 GMT+00:00 Marcin Junczys-Dowmunt <junczys@amu.edu.pl
> <mailto:junczys@amu.edu.pl>>:
>
> Maybe someone will correct me, but if I am not wrong, the gzipped
> version already calculates the future score while loading (i.e.
> the phrase is being scored by the language model). The compact
> phrase table cannot do this during loading and has to do it
> on-line, which would explain the slow speed. I suppose your phrase
> table has not been pruned? So, for instance, function words like
> "the" can have hundreds of thousands of counterparts that need to
> be scored by the LM during collection.
>
> That makes sense.
>
> You can limit your phrase table using Barry's prunePhraseTable
> tool. With it you can limit the table to, say, the 20 best phrases
> (corresponding to the ttable limit) and only score these 20
> phrases during collection. That should be orders of magnitude faster.
>
> OK.
>
> Best,
> Marcin
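For anyone curious what this pruning amounts to, here is an illustrative Python stand-in (not Barry's actual prunePhraseTable tool). It assumes the plain-text Moses phrase-table format and, as a further assumption, ranks candidates by the third score field, which is the direct phrase probability in the default feature order:

```python
from collections import defaultdict

def prune_phrase_table(lines, k=20, score_index=2):
    """Keep the k best target phrases per source phrase, ranked by
    one of the ||| -separated translation scores."""
    by_source = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        source, scores = fields[0], fields[2].split()
        by_source[source].append((float(scores[score_index]), line))
    for source, entries in by_source.items():
        entries.sort(key=lambda e: e[0], reverse=True)
        for _, line in entries[:k]:
            yield line

# Example: keep only the 2 best options for each source phrase.
table = [
    "the ||| le ||| 0.4 0.5 0.6 0.5 2.718\n",
    "the ||| la ||| 0.3 0.4 0.5 0.4 2.718\n",
    "the ||| les ||| 0.2 0.3 0.1 0.2 2.718\n",
]
pruned = list(prune_phrase_table(table, k=2))
```

With a cap of k entries per source phrase, the LM only ever scores k candidates during collection, which is where the speed-up comes from.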
>
> On 11.03.2015 at 20:12, Jesús González Rubio wrote:
>> Thanks for the quick response, I will try as you suggest.
>>
>> Nevertheless, my main concern is the time spent collecting
>> options. Is the difference with respect to the gzip'ed tables
>> normal? With the tables cached, shouldn't they be closer?
>>
>> 2015-03-11 18:52 GMT+00:00 Marcin Junczys-Dowmunt
>> <junczys@amu.edu.pl <mailto:junczys@amu.edu.pl>>:
>>
>> Hi,
>> Try measuring the differences again after a full system
>> reboot (a fresh reboot before each measurement) or after purging
>> the OS read/write caches. Your phrase tables are most likely
>> cached, which means they are in fact in memory.
>> Best,
>> Marcin
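That procedure can be scripted. The sketch below assumes Linux (flushing the page cache via /proc/sys/vm/drop_caches requires root, so it is skipped otherwise) and uses `true` as a placeholder for the real moses command line:

```python
import os
import subprocess
import time

def run_timed(label, argv):
    """Run a command, print and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    print(f"{label} {elapsed:.2f}s")
    return elapsed

def drop_os_caches():
    """Linux only, needs root: flush page/dentry/inode caches."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3")

if os.geteuid() == 0:
    drop_os_caches()          # so the cold run really hits the disk

run_timed("cold", ["true"])   # substitute the actual moses invocation
run_timed("warm", ["true"])
```

Without the cache flush, every run after the first is effectively a warm run, which is the situation Marcin suspects here.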
>>
>> On 11.03.2015 at 19:31, Jesús González Rubio wrote:
>>> Hi,
>>>
>>> I'm obtaining some unintuitive timing results when using
>>> compact phrase tables. The average translation time per
>>> sentence is much higher for them in comparison to using
>>> gzip'ed phrase tables. Particularly important is the
>>> difference in time required to collect the options. This
>>> table summarizes the timings (in seconds):
>>>
>>>                 Compact             Gzip'ed
>>>              on-disk  in-memory
>>> Init:           5.9       6.3       1882.8
>>> Per-sentence:
>>> - Collect:      5.9       5.8          0.2
>>> - Search:       1.6       1.6          3.3
>>>
>>> Results in the table were computed using Moses v2.1 with a
>>> single thread (-th 1), but I've seen similar results using
>>> the pre-compiled binary for Moses v3.0. The model comprises
>>> two phrase-tables (~2G and ~3M), two lexicalized reordering
>>> tables (~700M and ~1M) and two language models (~31G and
>>> ~38M). You can see the exact configuration in the attached
>>> moses.ini file.
>>>
>>> Interestingly, there is virtually no difference for the
>>> compact table between the on-disk and in-memory options.
>>> Additionally, timings were higher for the initial sentences
>>> in both cases, which I think should not happen with the
>>> in-memory option.
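This warm-up effect is why benchmark scripts usually discard the first few sentences before averaging; a small sketch of that convention (the per-sentence times below are invented for illustration):

```python
def steady_state_mean(times, warmup=5):
    """Average per-sentence time, ignoring the first `warmup`
    sentences, whose timings are inflated by cache warm-up."""
    body = times[warmup:]
    if not body:
        raise ValueError("need more sentences than the warm-up period")
    return sum(body) / len(body)

# First sentences are slow while caches fill, then times settle.
times = [9.8, 7.1, 6.0, 5.9, 5.9, 1.6, 1.5, 1.6, 1.7, 1.6]
print(steady_state_mean(times))   # steady-state average, well below the raw mean
```

Averaging over all sentences instead would blend the two regimes and make the caching strategies look more similar than they are.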
>>>
>>> Could it be that the in-memory options of the compact tables
>>> (-minpht-memory -minlexr-memory) are not working properly?
>>>
>>> Cheers.
>>> --
>>> Jesús
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> --
> Jesús
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 2
Date: Thu, 12 Mar 2015 12:00:03 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] In-memory loading of compact phrases
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>, Jesús González Rubio
<jesus.g.rubio@gmail.com>
Message-ID: <53a39b1c36dd3a38ed8c9a372d3fef5a@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"
Hi Barry,
I do have another cache used for decompression in my phrase table. Maybe
that's the reason? It's being shared between threads, so I guess it gets
filled up more quickly than the thread-specific caches. In other words:
I am cheating :)
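The effect Marcin describes, a cache shared across threads warming up faster than per-thread caches, is easy to see in a toy model; the key space, thread counts, and distribution here are invented purely for illustration:

```python
import random

# Toy model: several threads look up keys from the same small key
# space.  A shared cache sees every thread's lookups, so one thread's
# miss becomes every other thread's future hit; per-thread caches
# each have to warm up from scratch.
def hit_rate(num_threads, requests, shared):
    random.seed(0)                       # deterministic comparison
    shared_cache = set()
    per_thread = [set() for _ in range(num_threads)]
    hits = total = 0
    for t in range(num_threads):
        cache = shared_cache if shared else per_thread[t]
        for _ in range(requests):
            key = random.randint(1, 50)  # small key space: heavy reuse
            total += 1
            if key in cache:
                hits += 1
            else:
                cache.add(key)
    return hits / total

print(hit_rate(4, 100, shared=True))     # shared cache: higher hit rate
print(hit_rate(4, 100, shared=False))
```

The shared variant's hit rate climbs after the first thread because later threads inherit a warm cache, which matches the "cheating" described above.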
------------------------------
Message: 3
Date: Thu, 12 Mar 2015 11:09:02 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] In-memory loading of compact phrases
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>, Jesús González Rubio
<jesus.g.rubio@gmail.com>
Message-ID: <550173CE.1040203@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=utf-8; format=flowed
Hi Marcin
But the search time that Jesús quotes shouldn't include any translation
option lookup, and therefore shouldn't benefit from phrase table
caching, should it?
cheers - Barry
On 12/03/15 11:00, Marcin Junczys-Dowmunt wrote:
>
> Hi Barry,
>
> I do have another cache used for decompression in my phrase table.
> Maybe that's the reason? It's being shared between threads, so I guess
> it gets filled up more quickly than the thread-specific caches. In
> other words: I am cheating :)
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 101, Issue 40
**********************************************