Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: (no subject) (Rico Sennrich)
----------------------------------------------------------------------
Message: 1
Date: Mon, 5 Oct 2015 22:15:42 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] (no subject)
To: Sanjanashree Palanivel <sanjanashree@gmail.com>,
"moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <5612E87E.2010404@gmx.ch>
Content-Type: text/plain; charset="utf-8"
Hello Sanjanasri,
Basically, you can forget all results that you obtained without tuning.
They are not a meaningful indicator of the quality of NPLM. If you add a
new language model, the weight of the other language models, translation
models etc. needs to be balanced accordingly, and that is what tuning does.
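As a toy illustration of this point (made-up scores and weights, not Moses code): a decoder picks the hypothesis with the highest weighted sum of feature scores, so bolting on a new feature with an arbitrary weight can flip the ranking even when nothing else changed.

```python
def best_hypothesis(hypotheses, weights):
    """Return the hypothesis whose weighted feature sum is highest."""
    return max(hypotheses, key=lambda h: sum(w * h["features"][name]
                                             for name, w in weights.items()))

# Two candidate translations with invented log-prob scores:
# TM = translation model, LM0 = original LM, LM1 = newly added LM.
hyps = [
    {"text": "good translation", "features": {"TM": -1.0, "LM0": -2.0, "LM1": -9.0}},
    {"text": "fluent but wrong", "features": {"TM": -4.0, "LM0": -1.0, "LM1": -1.0}},
]

# With the tuned two-feature weights, the adequate hypothesis wins.
old_weights = {"TM": 0.6, "LM0": 0.4, "LM1": 0.0}
print(best_hypothesis(hyps, old_weights)["text"])  # -> good translation

# Adding LM1 with an untuned weight flips the decision.
untuned = {"TM": 0.6, "LM0": 0.4, "LM1": 0.5}
print(best_hypothesis(hyps, untuned)["text"])      # -> fluent but wrong
```

In a real Moses pipeline, rebalancing these weights on a development set is exactly what MERT (or PRO/MIRA) does.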
If you do tuning every time you add/remove/change a model, you can
modify your moses.ini any way you want, and you do not need to specify
all models during training. But again, don't draw conclusions without
tuning for each modified moses.ini. I know this will slow down your
experiments, but that's the only way to produce new knowledge.
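For illustration, a hypothetical moses.ini fragment combining a KenLM model from training with an NPLM model added afterwards (paths, orders, and weight values below are placeholders, not a recommended setup):

```ini
[feature]
KENLM name=LM0 factor=0 order=5 path=/path/to/kenlm.bin
NeuralLM name=LM1 factor=0 order=5 path=/path/to/nplm.model

[weight]
LM0= 0.5
LM1= 0.5
```

After any such edit, re-tune the weights before comparing BLEU scores.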
setting num_hidden to zero is a bit of a hack, because NPLM normally has
two hidden layers (and num_hidden gives the size of the first hidden
layer). If we want to use NPLM in decoding, we only want one hidden
layer for speed, hence num_hidden=0 [output_embedding_dimension is the
size of the remaining hidden layer]. The vocabulary size thing was a
guess. I don't know what the best vocabulary size would be for you, but
I tend to recommend smaller vocabularies if you have less data.
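Since perplexity comparisons come up repeatedly in this thread, here is a minimal sketch of how perplexity relates to per-token log-probabilities (illustrative code, not part of KenLM or NPLM):

```python
import math

def perplexity(log_probs):
    """Perplexity from natural-log token probabilities:
    exp of the average negative log-probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that assigns every token probability 1/100 is "as confused
# as" a uniform 100-way choice per word, so its perplexity is 100.
uniform = [math.log(1.0 / 100)] * 20
print(round(perplexity(uniform), 6))  # -> 100.0
```

This is why a test-set perplexity near 3 is suspicious for a real language model: it implies the model is almost certain of nearly every word, which usually means the test data overlaps the training data.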
best wishes,
Rico
On 05/10/15 17:14, Sanjanashree Palanivel wrote:
> Dear Rico,
>
> Thanks a lot. I will increase the number of hidden layers;
> currently the parameter num_hidden is fixed at zero. I did not tune
> the system, just working with the baseline. I will do tuning too. But why
> a smaller vocabulary size, what is the reason? It could be my mistake:
> the test set may not be a disjoint set, I guess. I will check. I have two
> more doubts. Please clarify.
>
>
> 1) Can I use two or more LMs (specifically NPLMs with different
> vocabulary sizes) in the moses.ini file, or should I use only the LMs
> used in the training phase? E.g., can I add SRILM or RANDLM directly to the
> moses.ini file without specifying it in the training phase?
>
> 2) I did a little tinkering in the moses.ini file for the English-Hindi MT
> system to understand the influence of the LM. I trained the system with
> KENLM of order 3, but in the *.ini file I replaced the LM with a 5-order
> KENLM binary file. Let me name this *.ini file dummy 5-KENLM. The
> BLEU score of dummy 5-KENLM (14.72) is higher than that of the 3-order
> KENLM (12.53), though it is lower than the model
> actually trained with 5-KENLM (17.43). I understand that what I modified
> in the *.ini does not make sense, but I wish to know why the score is
> higher for dummy 5-KENLM than 3-KENLM.
>
>
>
> On Mon, Oct 5, 2015 at 7:07 PM, Rico Sennrich
> <rico.sennrich@gmx.ch> wrote:
>
> Hi Sanjanasri,
>
> 1) your corpus is very small, and you may have to use more
> iterations of NPLM training and smaller vocabulary sizes. Just to
> double-check, are you tuning your systems? MERT (or PRO or MIRA)
> should normally ensure that adding a model doesn't make BLEU go down.
>
> 2) I'm not sure which perplexity is for which model, but lower
> perplexity is better, so this makes sense.
>
> 3) a perplexity of 3 is *extremely* low. Do you have overlap
> between your test set and your training set? This would be an
> unrealistic test setting, and would explain why KenLM does so much
> better (because backoff n-gram models are good at memorizing things).
>
> best wishes,
> Rico
>
>
>
> On 05.10.2015 09:27, Sanjanashree Palanivel wrote:
>> Dear Rico,
>>
>> I tried using KENLM and NPLM for three language
>> pairs and came across a series of questions. I am listing them
>> one by one. It would be great if you could guide me.
>>
>>
>> 1) I did testing for NPLM with different vocabulary sizes and
>> training epochs, but the BLEU score I obtained from NPLM
>> combined with KENLM is smaller than the one trained with
>> KENLM alone. In all three language pairs I see a consistent
>> difference of about three.
>>
>> Eg: English to Hindi (KENLM-17.43, NPLM+KENLM-14.27)
>> Tamil to Hindi (KENLM-16.66,NPLM+KENLM-13.53)
>> Marathi to Hindi (KENLM-29.42,NPLM+KENLM-25.76)
>>
>> The sentence count is 103502 and the unigram count is 89919. I set the
>> vocabulary size to 89000, 89700, and 89850 with validation sizes
>> 200, 200, and 100 respectively, and with different learning rates and
>> epochs. However, the BLEU score of NPLM+KENLM is still lower.
>>
>>
>> 2) The model with perplexity of about 385 has a
>> higher BLEU score than the one with perplexity around 564. Is this
>> the right model? I mean, the model with lower perplexity seems to give
>> a better BLEU score. Where am I going wrong?
>>
>>
>> 3) I used the query script for the KENLM model and found a perplexity of
>> 3.4xx. The KENLM model alone in the decoding phase gives a
>> BLEU of 16.66 for English to Hindi MT, but when combined with
>> NPLM I get only 13.53.
>>
>> On Sun, Sep 20, 2015 at 8:07 PM, Sanjanashree Palanivel
>> <sanjanashree@gmail.com> wrote:
>>
>> Dear Rico,
>>
>> Thanks a lot for your excellent guidance.
>>
>> On Sat, Sep 19, 2015 at 9:10 PM, Rico Sennrich
>> <rico.sennrich@gmx.ch> wrote:
>>
>> Hi Sanjanasri,
>>
>> we have seen improvements in BLEU from having both KENLM
>> and NPLM in our system. Things can go wrong during
>> training though (e.g. a bad choice of hyperparameters
>> (vocabulary size, number of training epochs)). I
>> recommend using a development set during NPLM training,
>> and comparing perplexity scores with those obtained from
>> KENLM.
>>
>> maybe somebody else can help you with the phrase table
>> normalization. NPLM doesn't have binarization.
>>
>> best wishes,
>> Rico
>>
>>
>> On 19/09/15 08:11, Sanjanashree Palanivel wrote:
>>> Dear Rico,
>>>
>>> I made the necessary changes and trained the
>>> language model successfully. The NPLM language model
>>> gives me a lower BLEU score compared to KENLM, but
>>> when I use the two models together the accuracy is greater than
>>> with NPLM alone, though still lower than KENLM. I am
>>> just trying to tune it by changing the parameters. So
>>> far the accuracy is improving but is not close to the
>>> KENLM accuracy. Is it worth doing, because it is taking
>>> quite a long time to train?
>>>
>>> I also tried to binarize the phrase table following
>>> http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc3, and
>>> compilation with Moses completed successfully. But when I run
>>> processPhraseTableMin -threads 3 -in train/model/phrase-table.gz
>>> -nscores 4 -out binarised-model/phrase-table
>>> I get a segmentation fault. I don't know what is wrong. Does it have something to do with threads?
>>> Also, how do I binarize the NPLM model?
>>>
>>> On Fri, Sep 18, 2015 at 11:27 AM, Sanjanashree Palanivel
>>> <sanjanashree@gmail.com> wrote:
>>>
>>> Dear Rico,
>>>
>>> Thanks a lot. Will do the necessary
>>> changes
>>>
>>>
>>> On Thu, Sep 17, 2015 at 1:54 PM, Rico Sennrich
>>> <rico.sennrich@gmx.ch> wrote:
>>>
>>> Hi Sanjanasri,
>>>
>>> if you first compiled Moses without the option
>>> '--with-nplm' and then added the option later,
>>> the build system isn't smart enough to know
>>> which files it needs to recompile. If you change
>>> one of the compile options, use the option '-a'
>>> to force recompilation from scratch.
>>>
>>> best wishes,
>>> Rico
>>>
>>>
>>>
>>>
>>> On 16/09/15 06:30, Sanjanashree Palanivel wrote:
>>>> Dear Rico,
>>>>
>>>>
>>>> I did the following steps
>>>>
>>>>
>>>> 1. Installed NPLM and trained a language model
>>>> 2. I compiled it with Moses with the
>>>> command ./bjam --with-nplm=path/to/nplm
>>>>
>>>> ./bjam
>>>> --with-nplm=/home/sanjana/Documents/SMT/NPLM/nplm
>>>> Tip: install tcmalloc for faster threading.
>>>> See BUILD-INSTRUCTIONS.txt for more
>>>> information.
>>>> warning: No toolsets are configured.
>>>> warning: Configuring default toolset "gcc".
>>>> warning: If the default is wrong, your
>>>> build may not work correctly.
>>>> warning: Use the "toolset=xxxxx" option to
>>>> override our guess.
>>>> warning: For more configuration options,
>>>> please consult
>>>> warning:
>>>> http://boost.org/boost-build2/doc/html/bbv2/advanced/configuration.html
>>>> NOT BUILDING MOSES SERVER!
>>>> Performing configuration checks
>>>>
>>>> - Shared Boost : yes (cached)
>>>> - Static Boost : yes (cached)
>>>> ...patience...
>>>> ...patience...
>>>> ...found 4823 targets...
>>>> SUCCESS
>>>>
>>>> 3. I added the following lines to the
>>>> moses.ini file
>>>>
>>>> NeuralLM factor=0 name=LM1 order=5
>>>> path=/path/to/nplmmodel
>>>> LM1= 0.5
>>>>
>>>> Then I did testing and ended up with the error.
>>>>
>>>>
>>>> On Tue, Sep 15, 2015 at 8:43 PM, Rico Sennrich
>>>> <rico.sennrich@gmx.ch> wrote:
>>>>
>>>> Hi Sanjanasri,
>>>>
>>>> this error occurs when Moses was compiled
>>>> without the option '--with-nplm'.
>>>>
>>>> best wishes,
>>>> Rico
>>>>
>>>>
>>>>
>>>> On 15.09.2015 15:08,
>>>> Sanjanashree Palanivel wrote:
>>>>> Dear Rico,
>>>>>
>>>>> I updated moses and NPLM has
>>>>> been compiled succesfully with moses.
>>>>> However, when I perform decoding I am
>>>>> getting an error.
>>>>>
>>>>> Defined parameters (per moses.ini or
>>>>> switch):
>>>>> config:
>>>>> /home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/moses.ini
>>>>>
>>>>> distortion-limit: 6
>>>>> feature: UnknownWordPenalty
>>>>> WordPenalty PhrasePenalty
>>>>> PhraseDictionaryMemory
>>>>> name=TranslationModel0 num-features=4
>>>>> path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz
>>>>> input-factor=0 output-factor=0
>>>>> Distortion KENLM lazyken=0 name=LM0
>>>>> factor=0
>>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin
>>>>> order=3 NeuralLM factor=0 name=LM1
>>>>> order=3
>>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
>>>>>
>>>>> input-factors: 0
>>>>> mapping: 0 T 0
>>>>> weight: Distortion0= 0.136328 LM0=
>>>>> 0.135599 LM1= 0.5 WordPenalty0=
>>>>> -0.488892 PhrasePenalty0= 0.0826147
>>>>> TranslationModel0= 0.0104273 0.0663914
>>>>> 0.0254094 0.0543384
>>>>> UnknownWordPenalty0= 1
>>>>> line=UnknownWordPenalty
>>>>> FeatureFunction: UnknownWordPenalty0
>>>>> start: 0 end: 0
>>>>> line=WordPenalty
>>>>> FeatureFunction: WordPenalty0 start: 1
>>>>> end: 1
>>>>> line=PhrasePenalty
>>>>> FeatureFunction: PhrasePenalty0 start:
>>>>> 2 end: 2
>>>>> line=PhraseDictionaryMemory
>>>>> name=TranslationModel0 num-features=4
>>>>> path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz
>>>>> input-factor=0 output-factor=0
>>>>> FeatureFunction: TranslationModel0
>>>>> start: 3 end: 6
>>>>> line=Distortion
>>>>> FeatureFunction: Distortion0 start: 7
>>>>> end: 7
>>>>> line=KENLM lazyken=0 name=LM0 factor=0
>>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin
>>>>> order=3
>>>>> FeatureFunction: LM0 start: 8 end: 8
>>>>> line=NeuralLM factor=0 name=LM1
>>>>> order=3
>>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
>>>>> Exception: moses/FF/Factory.cpp:349 in
>>>>> void
>>>>> Moses::FeatureRegistry::Construct(const string&,
>>>>> const string&) threw
>>>>> UnknownFeatureException because `i ==
>>>>> registry_.end()'.
>>>>> Feature name NeuralLM is not registered.
>>>>>
>>>>>
>>>>> I added following 2 lines in my moses file
>>>>>
>>>>> NeuralLM factor=0 name=LM1 order=5
>>>>> path=/path/to/nplmmodel
>>>>> LM1= 0.5
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 15, 2015 at 5:06 PM,
>>>>> Sanjanashree Palanivel
>>>>> <sanjanashree@gmail.com> wrote:
>>>>>
>>>>> Thank you for your earnest response. I
>>>>> will update moses and I will try
>>>>>
>>>>> On Tue, Sep 15, 2015 at 4:22 PM, Rico
>>>>> Sennrich <rico.sennrich@gmx.ch> wrote:
>>>>>
>>>>> Hello Sanjanasri,
>>>>>
>>>>> this looks like a version mismatch
>>>>> between Moses and NPLM.
>>>>> Specifically, you're using an
>>>>> older Moses commit that is only
>>>>> compatible with nplm 0.2 (or
>>>>> specifically, Kenneth's fork at
>>>>> https://github.com/kpu/nplm ).
>>>>>
>>>>> If you use the latest Moses
>>>>> version from
>>>>> https://github.com/moses-smt/mosesdecoder
>>>>> , and the latest nplm version from
>>>>> https://github.com/moses-smt/nplm
>>>>> , it should work.
>>>>>
>>>>> best wishes,
>>>>> Rico
>>>>>
>>>>>
>>>>> On 15.09.2015 08:24,
>>>>> Sanjanashree Palanivel wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I tried building a language model
>>>>>> using NPLM. The language model was
>>>>>> built successfully, but when I
>>>>>> tried to compile NPLM with Moses
>>>>>> using "./bjam
>>>>>> --with-nplm=path/to/nplm" I am
>>>>>> getting an error. I am using
>>>>>> Boost 1.55. I am attaching the
>>>>>> log file for reference. I don't
>>>>>> know where I went wrong. Any help
>>>>>> would be appreciated.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks and regards,
>>>>>>
>>>>>> Sanjanasri J.P
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> Moses-support@mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks and regards,
>>>>>
>>>>> Sanjanasri J.P
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks and regards,
>>>>>
>>>>> Sanjanasri J.P
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and regards,
>>>>
>>>> Sanjanasri J.P
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Thanks and regards,
>>>
>>> Sanjanasri J.P
>>>
>>>
>>>
>>>
>>> --
>>> Thanks and regards,
>>>
>>> Sanjanasri J.P
>>
>>
>>
>>
>>
>>
>> --
>> Thanks and regards,
>>
>> Sanjanasri J.P
>>
>>
>>
>>
>> --
>> Thanks and regards,
>>
>> Sanjanasri J.P
>
>
>
>
> --
> Thanks and regards,
>
> Sanjanasri J.P
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151005/696f7009/attachment.html
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 108, Issue 14
**********************************************