Moses-support Digest, Vol 108, Issue 26

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Faster decoding with multiple moses instances
(Kenneth Heafield)
2. Re: Faster decoding with multiple moses instances
(Marcin Junczys-Dowmunt)
3. Re: Faster decoding with multiple moses instances
(Kenneth Heafield)
4. Re: Faster decoding with multiple moses instances
(Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Thu, 8 Oct 2015 20:36:06 +0100
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5616C5A6.7040909@kheafield.com>
Content-Type: text/plain; charset=utf-8

There's a ton of object/malloc churn in creating Moses::TargetPhrase
objects, most of which are thrown away. If PhraseDictionaryMemory
(which creates and keeps the objects) scales better than CompactPT,
that's the first thing I'd optimize.
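To make the churn concrete: the costly pattern is allocating a fresh batch of candidate objects per lookup and throwing most of them away after pruning. The sketch below contrasts that with a caller-owned scratch buffer that is cleared rather than reallocated. `Candidate`, `DecodeFresh`, and `DecodeInto` are hypothetical stand-ins for illustration, not the Moses API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Moses::TargetPhrase.
struct Candidate {
  std::string text;
  float score;
};

// Churn-heavy pattern: allocate a fresh vector of candidates on every
// lookup, then discard most of them after pruning.
std::vector<Candidate> DecodeFresh(const std::string& source) {
  std::vector<Candidate> cands;
  for (int i = 0; i < 100; ++i)
    cands.push_back({source + "#" + std::to_string(i), -static_cast<float>(i)});
  cands.resize(5);  // keep only the top few; the rest were wasted allocations
  return cands;
}

// Reuse pattern: the caller owns a scratch buffer that is cleared, not
// freed, so repeated lookups stop hitting malloc once capacity is built up.
void DecodeInto(const std::string& source, std::vector<Candidate>* scratch) {
  scratch->clear();  // keeps the capacity from previous calls
  for (int i = 0; i < 100; ++i)
    scratch->push_back({source + "#" + std::to_string(i), -static_cast<float>(i)});
  scratch->resize(5);
}
```

This is only a sketch of the allocation pattern, not of the decoder itself; the point is that the second form amortizes allocation across lookups the way an in-memory phrase table effectively does.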

On 10/08/2015 08:30 PM, Marcin Junczys-Dowmunt wrote:
> We did quite a bit of experimenting with that; usually there is hardly
> any measurable quality loss until you get below 1000. Good enough for
> deployment systems. It seems, however, that you can get up to a 0.4 BLEU
> increase when going really high (about 5000 and beyond) with larger
> distortion limits. But that's rather uninteresting for commercial
> applications.
>
> On 08.10.2015 at 21:24, Michael Denkowski wrote:
>> Hi Vincent,
>>
>> That definitely helps. I reran everything comparing the original
>> 2000/2000 to your suggestion of 400/400. There isn't much difference
>> for a single multi-threaded instance, but there's about a 30% speedup
>> when using all single-threaded instances:
>>
>>                 pop limit & stack
>> procs/threads   2000     400
>> 1x16            5.46     5.68
>> 2x8             7.58     8.70
>> 4x4             9.71    11.24
>> 8x2            12.50    15.87
>> 16x1           14.08    18.52
>>
>> There wasn't any degradation in BLEU/TER/Meteor, but this is just one
>> data point and a fairly simple system. I would be curious to see how
>> things work out in other users' systems.
>>
>> Best,
>> Michael
>>
>> On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen <vnguyen@neuf.fr
>> <mailto:vnguyen@neuf.fr>> wrote:
>>
>> out of curiosity, what gain do you get with 400 for both stack and
>> cube pruning ?
>>
>>
On 08/10/2015 20:26, Michael Denkowski wrote:
>>
>> Hi Vincent,
>>
>> I'm using cube pruning with the following options for all data
>> points:
>>
>> [search-algorithm]
>> 1
>>
>> [cube-pruning-deterministic-search]
>> true
>>
>> [cube-pruning-pop-limit]
>> 2000
>>
>> [stack]
>> 2000
>>
>> Best,
>> Michael
>>
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 2
Date: Thu, 8 Oct 2015 21:39:44 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5616C680.2070703@amu.edu.pl>
Content-Type: text/plain; charset=utf-8; format=flowed

How is probing-pt avoiding the same problem then?

On 08.10.2015 at 21:36, Kenneth Heafield wrote:
> There's a ton of object/malloc churn in creating Moses::TargetPhrase
> objects, most of which are thrown away. If PhraseDictionaryMemory
> (which creates and keeps the objects) scales better than CompactPT,
> that's the first thing I'd optimize.




------------------------------

Message: 3
Date: Thu, 8 Oct 2015 20:56:32 +0100
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Faster decoding with multiple moses
instances
To: moses-support@mit.edu
Message-ID: <5616CA70.4050404@kheafield.com>
Content-Type: text/plain; charset=utf-8

Good point. I now blame this code from
moses/TranslationModel/CompactPT/TargetPhraseCollectionCache.h

Looks like a case for a concurrent fixed-size hash table. Failing that,
banded locks instead of a single lock? Namely an array of hash tables,
each of which is independently locked.

/** retrieve translations for source phrase from persistent cache **/
void Cache(const Phrase &sourcePhrase, TargetPhraseVectorPtr tpv,
           size_t bitsLeft = 0, size_t maxRank = 0) {
#ifdef WITH_THREADS
  boost::mutex::scoped_lock lock(m_mutex);