Moses-support Digest, Vol 105, Issue 13

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: NPLM and BilingualNPLM not working as expected in Moses
(Rico Sennrich)
2. Re: NPLM and BilingualNPLM not working as expected in Moses
(Raj Dabre)


----------------------------------------------------------------------

Message: 1
Date: Mon, 06 Jul 2015 17:18:51 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] NPLM and BilingualNPLM not working as
expected in Moses
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <559AAA6B.8020808@gmx.ch>
Content-Type: text/plain; charset="utf-8"

Hi Raj,

The information you provide is pretty vague, so I'm just making some
wild guesses here:

It could be a user error, for instance an inconsistency between the
training sets used for training BilingualNPLM and the phrase table.
Check that the same version of the corpus (including tokenization,
truecasing, etc.) was used for training, and that you did not mix up
source and target language. Also check that the settings used during
training are consistent with those in the moses.ini file.
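For reference, the decoder side of that check means looking at the feature line in moses.ini. A hypothetical fragment is below; the paths are placeholders, and the exact parameter names and values may differ between Moses versions, so treat this as a sketch, not the definitive syntax:

```ini
; Hypothetical moses.ini fragment -- paths are placeholders and
; parameter names may vary by Moses version. The order, source_window,
; and vocabulary files here must match what was used at training time.
[feature]
BilingualNPLM name=BLM0 order=5 source_window=4 path=/path/to/bilingual.nnlm source_vocab=/path/to/vocab.source target_vocab=/path/to/vocab.target

[weight]
BLM0= 0.1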

It's possible that some of the settings (vocabulary size, number of
training epochs, or similar) are unsuitable for your task. For example,
since you have a relatively small training corpus, you may need more
epochs of training to get good results (use a validation set to see if
model perplexity converges).
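The convergence check described above can be automated once you have per-epoch validation perplexities. A minimal sketch follows; the tolerance and the perplexity numbers are made up for illustration:

```python
# Hedged sketch: decide whether NPLM training has converged from
# per-epoch validation perplexities. The tolerance and the numbers
# below are illustrative, not real measurements.

def has_converged(perplexities, rel_tol=0.005):
    """True if the last epoch improved validation perplexity by less
    than rel_tol relative to the previous epoch."""
    if len(perplexities) < 2:
        return False
    prev, last = perplexities[-2], perplexities[-1]
    return (prev - last) / prev < rel_tol

# Illustrative per-epoch validation perplexities (not real numbers):
ppl = [412.0, 285.3, 231.7, 210.4, 205.9, 205.6]
print(has_converged(ppl))  # -> True (improvement has flattened out)
```

If this still returns False after your planned number of epochs, that is a sign the model wants more training, which matters especially for smaller corpora.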

Please double-check that there were no problems with the
unicode-handling of Japanese/Chinese characters, and that the encoding
of your vocabulary files matches that of the translation model, and the
decoder input. We have never experienced such problems, but they could
arise for some system configurations.
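One quick way to surface such encoding problems is to verify that every file involved (vocabulary files, training corpus, decoder input) decodes cleanly as UTF-8. A minimal sketch, with file paths left to the reader:

```python
# Hedged sketch: flag lines in a corpus or vocabulary file that do not
# decode as UTF-8. Point it at your vocab files, training corpus, and
# decoder input in turn.

def check_utf8(path):
    """Return (line_number, error) pairs for lines that fail to decode
    as UTF-8; an empty list means the file is clean."""
    problems = []
    with open(path, "rb") as f:
        for i, raw in enumerate(f, 1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                problems.append((i, str(err)))
    return problems
```

Running this over all the files should either come back empty everywhere or pinpoint exactly which lines are mis-encoded.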

best wishes,
Rico


On 06.07.2015 16:31, Raj Dabre wrote:
> Hello Rico,
> I trained both monolingual and bilingual LMs.
> Both seemed ineffective.
> As I mentioned before, I am working with Chinese-Japanese and the
> domain is paper abstracts.
> I did check the n-best lists and I saw a significant difference
> between the LM scores when comparing the runs for KenLm and NPLM.
> What could have gone wrong during the training?
> Regards.
>
> On Mon, Jul 6, 2015 at 10:53 PM, Rico Sennrich <rico.sennrich@gmx.ch
> <mailto:rico.sennrich@gmx.ch>> wrote:
>
> Hello Raj,
>
> can you please clarify if you tried to train a monolingual LM
> (NeuralLM), a bilingual LM (BilingualNPLM), or both? Our previous
> experiences with BilingualNPLM are mixed, and we observed
> improvements for some tasks and language pairs, but not for
> others. See for instance:
>
> Alexandra Birch, Matthias Huck, Nadir Durrani, Nikolay Bogoychev
> and Philipp Koehn. 2014. Edinburgh SLT and MT System Description
> for the IWSLT 2014 Evaluation. Proceedings of IWSLT 2014.
>
> To help debugging, you can check the scores in the n-best lists of
> the tuning runs. If the NPLM features give much higher costs than
> KenLM (trained on the same data), this can indicate that something
> went wrong during training.
>
> best wishes,
> Rico
>
>
> On 06.07.2015 14:29, Raj Dabre wrote:
>> Dear all,
>> I have checked out the latest version of moses and nplm and
>> compiled moses successfully with the --with-nplm option.
>> I got a ton of warnings during compilation but in the end it all
>> worked out and all the desired binaries were created. Simply
>> executing the moses binary told me that the BilingualNPLM and
>> NeuralLM features were available.
>>
>> I trained an NPLM model based on the instructions here:
>> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc33
>> The corpus I used was about 600k lines (Chinese-Japanese;
>> target is Japanese).
>>
>> I then integrated the resultant language model (after 10
>> iterations) into the decoding process via moses.ini.
>>
>> I initiated tuning (standard parameters) and I got no errors,
>> which means that the neural language model (NPLM) was recognized
>> and queried appropriately.
>> I also ran tuning without a language model.
>>
>> The strange thing is that the tuning and test BLEU scores for
>> both these cases are almost the same. I checked the weights and
>> saw that the LM was assigned a very low weight.
>>
>> On the other hand, when I used KenLM on the same data, I got
>> comparatively higher BLEU scores.
>>
>> Am I missing something? Am I using the NeuralLM in an incorrect way?
>>
>> Thanks in advance.
>>
>>
>>
>> --
>> Raj Dabre.
>> Doctoral Student,
>> Graduate School of Informatics,
>> Kyoto University.
>> CSE MTech, IITB., 2011-2014
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Raj Dabre.
> Doctoral Student,
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
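The n-best comparison suggested in the thread above can be scripted by pulling the LM feature scores out of Moses' n-best output (`id ||| hypothesis ||| features ||| total`). A minimal sketch over made-up lines, assuming the `Name= value` feature format that newer Moses versions write (feature names like `LM0=` depend on your moses.ini):

```python
# Hedged sketch: extract one feature's scores from Moses n-best lines
# so KenLM and NPLM runs can be compared side by side. The sample
# lines and feature names below are made up for illustration.

def lm_scores(nbest_lines, feature="LM0="):
    """Extract the value following the given feature label from each
    n-best entry (format: id ||| hypothesis ||| features ||| total)."""
    scores = []
    for line in nbest_lines:
        fields = [f.strip() for f in line.split("|||")]
        toks = fields[2].split()
        # the score follows its 'Name=' label in the feature field
        idx = toks.index(feature)
        scores.append(float(toks[idx + 1]))
    return scores

sample = [
    "0 ||| a translation ||| Distortion0= 0 LM0= -45.2 TM0= -3.1 ||| -12.7",
    "0 ||| another one ||| Distortion0= -2 LM0= -51.8 TM0= -2.9 ||| -13.4",
]
print(lm_scores(sample))  # -> [-45.2, -51.8]
```

If the NPLM feature's scores are systematically far worse than KenLM's on the same hypotheses, that points at a training problem rather than a tuning one.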

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150706/693daffe/attachment-0001.htm

------------------------------

Message: 2
Date: Tue, 7 Jul 2015 12:43:03 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] NPLM and BilingualNPLM not working as
expected in Moses
To: Rico Sennrich <rico.sennrich@gmx.ch>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAB3gfjC3PfArzS1P1d0yFzcefQxQt+0qJGNAToNVMNmV41Sc2w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello Rico,
Now that you mention it, I also performed an additional test.
I took a translation and obtained the perplexity score by querying
KenLM and NPLM from the command line. In this case, the difference
between the scores was not that large.
It might be an encoding issue.
I will check again and let you know.

However, the data I am using to train the LMs (KenLM, NPLM, and
BilingualNPLM) is the same data I am using to train the translation
model. I should also mention that I did no tokenization etc. before
training the LMs and the TM.
Thanks for your replies.
Regards.
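When comparing command-line queries like this, it helps to compute perplexity the same way for both models. KenLM's query tool reports log10 probabilities; assuming NPLM's output is also log10 (worth verifying for your build), the conversion is the standard one. A small worked example with made-up numbers:

```python
# Hedged sketch: perplexity from a total log10 probability, so KenLM
# and NPLM query output can be compared on an equal footing. The
# sentence length and log-prob below are made up for illustration.

def perplexity(total_log10_prob, num_words):
    """Perplexity over num_words tokens (include </s> if the tool
    scores it): 10 ** (-logprob / N)."""
    return 10.0 ** (-total_log10_prob / num_words)

# Made-up numbers: a 20-token sentence with total log10 prob -42.0
print(perplexity(-42.0, 20))  # -> 10 ** 2.1, about 125.89
```

Mismatched token counts (e.g. one tool counting `</s>` and the other not) can make two healthy models look very different, so check that before concluding anything from the comparison.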



--
Raj Dabre.
Doctoral Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150707/da1e1270/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 105, Issue 13
**********************************************
