Moses-support Digest, Vol 89, Issue 14

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: mixing old and new moses.ini format ? (Hieu Hoang)
2. Re: Help. First request to MosesServer very slow
(Marcin Junczys-Dowmunt)
3. Re: Help. First request to MosesServer very slow (Barry Haddow)
4. Re: Help. First request to MosesServer very slow
(Marcin Junczys-Dowmunt)
5. Re: question about --return-best-dev in mert-moses (Barry Haddow)

----------------------------------------------------------------------

Message: 1
Date: Thu, 6 Mar 2014 17:28:56 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] mixing old and new moses.ini format ?
To: Viktor Pless <viktor.pless@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbj_M1jp98Mh2VajKuNPEGmXFxFuG2Z7EV4fFcaFr65arw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

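-d is shorthand for -weight-d, an old-format weight option. That is where
"weight-d: 0" in your output comes from, and why it clashes with the
new-format weights in your ini file.

If you want no distortion, use
-distortion-limit 0
or
-dl 0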

On 6 March 2014 17:04, Viktor Pless <viktor.pless@gmail.com> wrote:

> Hi, please have a look at my command:
>
> echo 'Quién mató a la llamita blanca?' |
> /home/ubuntu/mosesdecoder/bin/moses -f /home/ubuntu/0301/model/moses.ini -d 0
>
> and my error msg:
> [blah, blah....]
> weight: UnknownWordPenalty0= 1 WordPenalty0= -1 PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2 LexicalReordering0= 0.3
> 0.3 0.3 0.3 0.3 0.3 Distortion0= 0.3 LM0= 0.5
> weight-d: 0
> Exception: moses/Parameter.cpp:336 in bool
> Moses::Parameter::LoadParam(int, char**) threw util::Exception'. Don't mix
> old and new ini file format
>
> (Seems that "weight-d: 0" comes out of nowhere, as I specified distortion
> as 'Distortion0= 0.3'. )
>
> See my ini file attached. Thanks in advance.
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Thu, 06 Mar 2014 18:05:00 +0000
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Help. First request to MosesServer very
slow
To: moses-support@mit.edu
Message-ID: <5318B8CC.8060209@amu.edu.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Marcos,
What you can do, if you want to rule out that the compact data structures
are the issue, is load the tables into memory at start-up (if you have
enough memory).

That would mean adding the following options to your ini file:

[minphr-memory]
1

[minlexr-memory]
1

However your problem looks client-side related. Why should the creation
of client-side requests affect translation time on the side of the
server? I understand the same server instance is running all the time?

Best,
Marcin

On 06.03.2014 17:20, Marcos Fernandez wrote:
> Hi, I am having an issue with MosesServer.
>
> I am using compact phrase and reordering table, and KENLM.
>
> The problem is this (I'll explain with an example):
>
> - I have one file with 20 very short sentences. I split and tokenize
> them and send one XMLRPC request per sentence to MosesServer
> - If I create just one XMLRPC ServerProxy instance and I use it to send
> all the requests through it, all the sentences get translated in approx
> 2.5 sec. The problem is that the first sentence takes almost 2 seconds
> to get translated, while the other 19 are much faster
> - If I create one ServerProxy instance per request, the translation time
> rises to 30 sec (now every sentence takes almost 2 sec)
>
> I don't understand the reason for that delay on the first request. I
> have traced the source of this delay to the function:
>
> GetTargetPhraseCollectionLEGACY(const Phrase& src)
>
> in the file: ...TranslationModel/PhraseDictionary.cpp
>
> It seems that the first request has to look something up
> in the phrase table, while for subsequent requests it can be retrieved
> (most of the time) from a cache.
>
> But, as the sentences in my file are not related one to another in any
> way, the information on this cache can not be sentence-dependent, so why
> wouldn't it be possible for the cache to be preloaded with the
> information needed?
>
> I think that perhaps I have something misconfigured, because I have seen
> other people using the approach of creating one ServerProxy object for
> each XMLRPC request (which would facilitate things a lot for me), so I
> don't think they are experiencing this overhead. Perhaps using the
> compact formats can have something to do with it?
>
> Any help would be much appreciated. I paste below my moses.ini, if that
> helps:
>
> Thanks :)
>
> ### MOSES CONFIG FILE ###
> ###################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> # translation tables: table type (hierarchical(0), textual (0), binary (1)), source-factors, target-factors, number of scores, file
> # OLD FORMAT is still handled for back-compatibility
> # OLD FORMAT translation tables: source-factors, target-factors, number of scores, file
> # OLD FORMAT a binary table type (1) is assumed
> [ttable-file]
> 12 0 0 5 /opt/moses-compiling/modelos/es-en/phrase-model/phrase-table
>
> # no generation models, no generation-file section
>
> # language models: type(srilm/irstlm), factors, order, file
> [lmodel-file]
> 8 0 5 /opt/moses-compiling/modelos/es-en/lm/13-19-03gen_intec_head8m_sb5LM.kenlm
>
>
> # limit on how many phrase translations e for each phrase f are loaded
> # 0 = all elements loaded
> [ttable-limit]
> 10
>
> # distortion (reordering) files
> [distortion-file]
> 0-0 wbe-msd-bidirectional-fe-allff 6 /opt/moses-compiling/modelos/es-en/phrase-model/reordering-table
>
> # distortion (reordering) weight
> [weight-d]
> 0.097107
> 0.150373
> -0.0551767
> -0.0307787
> 0.114613
> 0.214587
> 0.0467398
>
> # language model weights
> [weight-l]
> 0.0442748
>
>
> # translation model weights
> [weight-t]
> 0.00370888
> 0.0425665
> 0.0719956
> 0.0202699
> 0.071147
>
> # no generation models, no weight-generation section
>
> # word penalty
> [weight-w]
> 0.0366626
>
> [distortion-limit]
> 6
>
> [v]
> 0
>
>

------------------------------

Message: 3
Date: Thu, 06 Mar 2014 18:30:02 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Help. First request to MosesServer very
slow
To: Marcos Fernandez <marcos.fernandez.lopez@usc.es>,
moses-support@mit.edu
Message-ID: <5318BEAA.2030709@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Marcos

I think the problem is that the rules (or phrase pairs) are now cached
on a per thread basis. This is good for command-line Moses as it uses a
pool of threads, and having per-thread caches means that there is no
locking on the caches, as there used to be.

mosesserver, afaik, creates a new thread for each connection, so it
can't take advantage of the cache. This is done in the xmlrpc-c library
so we don't have much control over it. If you dig around in the xmlrpc-c
documentation (or code!) you might find a way to control the threading
policy.
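
As a rough illustration of why a thread-per-connection server gets little
benefit from per-thread caches, here is a toy Python model. It is not Moses
code: PerThreadCache, handle_request and the two run_* functions are
hypothetical, and upper-casing a word stands in for an expensive
phrase-table lookup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadCache:
    """Toy cache: each thread gets its own dict, so lookups need no lock."""

    def __init__(self):
        self._local = threading.local()
        self._count_lock = threading.Lock()  # protects only the miss counter
        self.misses = 0

    def lookup(self, phrase):
        cache = getattr(self._local, "cache", None)
        if cache is None:
            cache = self._local.cache = {}  # a new thread starts cold
        if phrase not in cache:
            with self._count_lock:
                self.misses += 1
            cache[phrase] = phrase.upper()  # stand-in for the real lookup
        return cache[phrase]


def handle_request(cache, sentence):
    return [cache.lookup(word) for word in sentence.split()]


def run_thread_per_request(sentences):
    """One new thread per request, as described for mosesserver above."""
    cache = PerThreadCache()
    for sentence in sentences:
        worker = threading.Thread(target=handle_request, args=(cache, sentence))
        worker.start()
        worker.join()
    return cache.misses


def run_single_worker(sentences):
    """One long-lived worker thread serving every request."""
    cache = PerThreadCache()
    with ThreadPoolExecutor(max_workers=1) as pool:
        list(pool.map(lambda s: handle_request(cache, s), sentences))
    return cache.misses
```

With repeated words across sentences, run_thread_per_request reports a miss
for every word of every sentence, while run_single_worker only misses on
the first occurrence of each word.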

I just spoke to Marcin about the problem, and we're not sure if loading
the compact phrase table into memory would help, as you still would need
the higher level cache (in PhraseDictionary). But you could try this anyway.

cheers - Barry


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

------------------------------

Message: 4
Date: Thu, 06 Mar 2014 18:35:48 +0000
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Help. First request to MosesServer very
slow
To: moses-support@mit.edu
Message-ID: <5318C004.5020603@amu.edu.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

If this is indeed the problem then not using the compact phrase table
might result in even longer delays. The compact phrase table has a
low-level cache that is not thread-specific, so xml-rpc probably takes
advantage of that.


------------------------------

Message: 5
Date: Thu, 06 Mar 2014 18:37:24 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] question about --return-best-dev in
mert-moses
To: Jorg Tiedemann <tiedeman@gmail.com>, moses-support
<moses-support@mit.edu>
Message-ID: <5318C064.2020902@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Jörg

In each MERT iteration, the first action is to decode the tuning set and
create an n-best list, using the current weight set. The 1-bests from
this decoding run are the hypotheses which get scored by --return-best-dev.

After that decoding, MERT searches for a weight set that can rerank the
n-best lists to give a better BLEU, and stops when it reaches a local
maximum. This is the BLEU that is reported in the moses.ini file. So it
is a BLEU obtained by decoding with one weight set, and then reranking
with a different weight set. When you redecode using the new weight set
you do not get the same set of translations, since the n-best list is
just a tiny sample of the hypotheses that are considered during
decoding, so there will normally be hypotheses outwith the n-best list
which have a higher model score.
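
The gap between the reranked BLEU and the redecoded BLEU can be seen in a
toy Python example. Everything in it is made up for illustration: four
hypotheses for one sentence, two features, a per-hypothesis "bleu" value
standing in for the real corpus-level metric, and two hand-picked weight
sets. It is a sketch of the effect, not of how mert-moses.pl is
implemented.

```python
def model_score(weights, features):
    """Linear model score: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))


def decode(hypotheses, weights, n=1):
    """Return the n highest-scoring hypotheses under the given weights."""
    ranked = sorted(
        hypotheses,
        key=lambda h: model_score(weights, h["features"]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical search space for a single tuning sentence.
HYPOTHESES = [
    {"name": "A", "features": (1.0, 0.0), "bleu": 0.30},
    {"name": "B", "features": (0.8, 0.5), "bleu": 0.50},
    {"name": "C", "features": (0.1, 1.0), "bleu": 0.20},
    {"name": "D", "features": (0.6, 0.2), "bleu": 0.40},
]
OLD_WEIGHTS = (1.0, 0.0)  # weights used to decode in this iteration
NEW_WEIGHTS = (0.2, 1.0)  # weights found by reranking the n-best list

# Step 1: decode with the current weights and keep a 2-best list.
nbest = decode(HYPOTHESES, OLD_WEIGHTS, n=2)
best_dev = nbest[0]  # the 1-best that --return-best-dev would score

# Step 2: rerank only the n-best list with the new weights.
reranked = decode(nbest, NEW_WEIGHTS)[0]

# Step 3: redecode the full space with the new weights.
redecoded = decode(HYPOTHESES, NEW_WEIGHTS)[0]

print("1-best under old weights:", best_dev["name"], best_dev["bleu"])
print("best after reranking n-best:", reranked["name"], reranked["bleu"])
print("1-best after redecoding:", redecoded["name"], redecoded["bleu"])
```

Here the new weights promote a hypothesis that was never in the n-best
list, so the redecoded output scores lower than the reranked one that the
optimizer saw.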

We haven't generally used --return-best-dev with MERT - does it help?
It's really designed for pro and kbmira.

cheers - Barry

On 06/03/14 11:28, Jorg Tiedemann wrote:
> Hi,
>
> I have a question about the --return-best-dev flag in mert-moses.pl.
> I have run several experiments using this flag and I don't really
> understand how it influences the choice of settings during MERT. In
> many cases, the system will select an early iteration whose BLEU is
> much lower than that of many later iterations. Maybe my confusion
> is related to the BLEU score mentioned in the moses.ini files printed
> after each iteration? Can someone help me? Thanks!
>
>
> Cheers,
> Jörg
>
> Jörg Tiedemann
> tiedeman@gmail.com

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 89, Issue 14
*********************************************
