Moses-support Digest, Vol 108, Issue 9

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. KenLM poison (fountain1128@gmail.com)
2. Re: (no subject) (Rico Sennrich)


----------------------------------------------------------------------

Message: 1
Date: Mon, 5 Oct 2015 15:52:04 +0300
From: fountain1128@gmail.com
Subject: [Moses-support] KenLM poison
To: moses-support@mit.edu
Message-ID: <5519FC67-FE07-4BA1-927F-DC0691EAC747@gmail.com>
Content-Type: text/plain; charset="gb2312"

Dear all,

I'm building the baseline system, and an error occurred during the last step of the LM training process, as the first attached file shows.

I checked another reported case of "Last input should have been poison", but that one had the more detailed message "no space left on device", while mine shows nothing but that sentence.

The exact command I use for KenLM is:
$MOSES/bin/lmplz -o 3 < ~/es-fi/OpenSubtitles2013.es-fi.true.fi > OpenSubtitles2013.es-fi.arpa.fi

As mosesdecoder is installed in the administrator's directory instead of my own, "~/mosesdecoder" is replaced by $MOSES.
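
Since the only other report of this error I found mentioned disk space, a variant of the same command that makes the sort's memory budget and temporary directory explicit may help rule that out (a sketch; -S and -T are documented lmplz options, and /big/tmp is a placeholder for a partition with free space):

$MOSES/bin/lmplz -o 3 -S 40% -T /big/tmp < ~/es-fi/OpenSubtitles2013.es-fi.true.fi > OpenSubtitles2013.es-fi.arpa.fi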

My corpus (the language pair is Spanish-Finnish) was downloaded from OPUS (http://opus.lingfil.uu.se/OpenSubtitles2013.php) in the Moses format.

The download contains three files: OpenSubtitles2013.es-fi.es, OpenSubtitles2013.es-fi.fi, and OpenSubtitles2013.es-fi.ids.

Tokenization, truecasing and cleaning were all completed on the "es" and "fi" files. Could the error have something to do with the "ids" file?

Attached are the output of the LM process and the command I used for corpus preparation.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: KenLM-output.txt
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151005/36b9b9ff/attachment-0002.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: CPcommand.txt
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151005/36b9b9ff/attachment-0003.txt

------------------------------

Message: 2
Date: Mon, 5 Oct 2015 14:37:04 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] (no subject)
To: Sanjanashree Palanivel <sanjanashree@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <56127D00.5070705@gmx.ch>
Content-Type: text/plain; charset="utf-8"

Hi Sanjanasri,

1) your corpus is very small, and you may have to use more iterations of
NPLM training and smaller vocabulary sizes. Just to double-check, are
you tuning your systems? MERT (or PRO or MIRA) should normally ensure
that adding a model doesn't make BLEU go down.
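
For example, a typical tuning run looks like this (a sketch; paths and file names are placeholders, dev.src/dev.ref are the source and reference sides of a held-out tuning set, and mert-moses.pl ships with Moses):

  $MOSES/scripts/training/mert-moses.pl dev.src dev.ref \
      $MOSES/bin/moses model/moses.ini --mertdir $MOSES/bin \
      --working-dir mert-work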

2) I'm not sure which perplexity is for which model, but lower
perplexity is better, so this makes sense.
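
For example, the KenLM side can be checked on a held-out file with the query tool (a sketch; query is built alongside Moses/KenLM, and test.true.hi is a placeholder for a tokenized, truecased test file):

  $MOSES/bin/query your-lm.arpa < test.true.hi

It reports log-probabilities and perplexity, which you can compare against the perplexity printed during NPLM training on the same data.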

3) a perplexity of 3 is *extremely* low. Do you have overlap between
your test set and your training set? This would be an unrealistic test
setting, and would explain why KenLM does so much better (because
backoff n-gram models are good at memorizing things).
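
A quick way to count exact sentence overlap (a sketch using bash process substitution; train.hi and test.hi are placeholders for the target-side files):

  comm -12 <(sort -u train.hi) <(sort -u test.hi) | wc -l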

best wishes,
Rico


On 05.10.2015 09:27, Sanjanashree Palanivel wrote:
> Dear Rico,
>
> I tried using KenLM and NPLM for three language
> pairs, and I came across a series of questions. I am listing them
> one by one. It would be great if you could guide me.
>
>
> 1) I did testing for NPLM with different vocabulary sizes and training
> epochs. But the BLEU score I get from NPLM combined with KenLM is
> lower than the one I get with KenLM alone. In all three language
> pairs I see a consistent difference of about three points.
>
> Eg: English to Hindi (KENLM-17.43, NPLM+KENLM-14.27)
> Tamil to Hindi (KENLM-16.66,NPLM+KENLM-13.53)
> Marathi to Hindi (KENLM-29.42,NPLM+KENLM-25.76)
>
> The sentence count is 103502 and the unigram count is 89919. I tried
> vocabulary sizes of 89000, 89700 and 89850 with validation sizes of
> 200, 200 and 100 respectively, and with different learning rates and
> epochs. However, the BLEU score of NPLM plus KenLM is still lower.
>
>
> 2) The model with a perplexity of about 385 has a higher BLEU score
> than the one with a perplexity of around 564. Is this the right model?
> I mean, the model with lower perplexity seems to give a better BLEU
> score. Where am I going wrong?
>
>
> 3) I used the query script on the KenLM model and found a perplexity
> of 3.4xx. The BLEU score of KenLM alone in the decoding phase is
> 16.66 for English-to-Hindi MT, but when combined with NPLM I get
> only 13.53.
>
> On Sun, Sep 20, 2015 at 8:07 PM, Sanjanashree Palanivel
> <sanjanashree@gmail.com> wrote:
>
> Dear Rico,
>
> Thanks a lot for your excellent guidance.
>
> On Sat, Sep 19, 2015 at 9:10 PM, Rico Sennrich
> <rico.sennrich@gmx.ch> wrote:
>
> Hi Sanjanasri,
>
> we have seen improvements in BLEU from having both KENLM and
> NPLM in our system. Things can go wrong during training though
> (e.g. a bad choice of hyperparameters (vocabulary size, number
> of training epochs)). I recommend using a development set
> during NPLM training, and comparing perplexity scores with
> those obtained from KENLM.
>
> maybe somebody else can help you with the phrase table
> binarization. NPLM does not have binarization.
>
> best wishes,
> Rico
>
>
> On 19/09/15 08:11, Sanjanashree Palanivel wrote:
>> Dear Rico,
>>
>> I made the necessary changes and trained the language
>> model successfully. The NPLM language model gives me a lower
>> BLEU score compared to KenLM. When I use the two models
>> together, accuracy is higher than with NPLM alone but lower
>> than with KenLM. I am trying to tune it by changing the
>> parameters. So far the accuracy keeps improving but is not
>> close to KenLM's. Is it worth pursuing, given that training
>> takes quite a long time?
>>
>> I also tried to binarize the phrase table following
>> http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc3, and
>> compilation with Moses completed successfully. But when I run
>> processPhraseTableMin -threads 3 -in train/model/phrase-table.gz
>> -nscores 4 -out binarised-model/phrase-table
>> I get a segmentation fault. I don't know what is wrong. Does it have something to do with threads?
>> Also, how do I binarize the NPLM model?
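>>
>> For reference, the moses.ini entry that goes with a table built by processPhraseTableMin would be along these lines (a sketch; PhraseDictionaryCompact is the compact phrase table feature, and the path is a placeholder):
>>
>> PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=binarised-model/phrase-table input-factor=0 output-factor=0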
>>
>> On Fri, Sep 18, 2015 at 11:27 AM, Sanjanashree Palanivel
>> <sanjanashree@gmail.com> wrote:
>>
>> Dear Rico,
>>
>> Thanks a lot. Will do the necessary changes
>>
>>
>> On Thu, Sep 17, 2015 at 1:54 PM, Rico Sennrich
>> <rico.sennrich@gmx.ch> wrote:
>>
>> Hi Sanjanasri,
>>
>> if you first compiled Moses without the option
>> '--with-nplm' and then added the option later, the
>> build system isn't smart enough to know which files
>> it needs to recompile. If you change one of the
>> compile options, use the option '-a' to force
>> recompilation from scratch.
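>>
>> For example (a sketch; the NPLM path is the one from your earlier message):
>>
>> ./bjam --with-nplm=/home/sanjana/Documents/SMT/NPLM/nplm -a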
>>
>> best wishes,
>> Rico
>>
>>
>>
>>
>> On 16/09/15 06:30, Sanjanashree Palanivel wrote:
>>> Dear Rico,
>>>
>>>
>>> I did the following steps
>>>
>>>
>>> 1. Installed NPLM and trained a language model.
>>> 2. Compiled Moses with NPLM support using the command
>>> ./bjam --with-nplm=path/to/nplm
>>>
>>> ./bjam
>>> --with-nplm=/home/sanjana/Documents/SMT/NPLM/nplm
>>> Tip: install tcmalloc for faster threading. See
>>> BUILD-INSTRUCTIONS.txt for more information.
>>> warning: No toolsets are configured.
>>> warning: Configuring default toolset "gcc".
>>> warning: If the default is wrong, your build may
>>> not work correctly.
>>> warning: Use the "toolset=xxxxx" option to
>>> override our guess.
>>> warning: For more configuration options, please
>>> consult
>>> warning:
>>> http://boost.org/boost-build2/doc/html/bbv2/advanced/configuration.html
>>> NOT BUILDING MOSES SERVER!
>>> Performing configuration checks
>>>
>>> - Shared Boost : yes (cached)
>>> - Static Boost : yes (cached)
>>> ...patience...
>>> ...patience...
>>> ...found 4823 targets...
>>> SUCCESS
>>>
>>> 3. I added the following lines to the
>>> moses.ini file:
>>>
>>> NeuralLM factor=0 name=LM1 order=5
>>> path=/path/to/nplmmodel
>>> LM1= 0.5
>>>
>>> Then I ran testing and ended up with the error.
>>>
>>>
>>> On Tue, Sep 15, 2015 at 8:43 PM, Rico Sennrich
>>> <rico.sennrich@gmx.ch> wrote:
>>>
>>> Hi Sanjanasri,
>>>
>>> this error occurs when Moses was compiled
>>> without the option '--with-nplm'.
>>>
>>> best wishes,
>>> Rico
>>>
>>>
>>>
>>> On 15.09.2015 15:08,
>>> Sanjanashree Palanivel wrote:
>>>> Dear Rico,
>>>>
>>>> I updated moses and NPLM has been
>>>> compiled succesfully with moses. However, when
>>>> I perform decoding I am getting an error.
>>>>
>>>> Defined parameters (per moses.ini or switch):
>>>> config:
>>>> /home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/moses.ini
>>>>
>>>> distortion-limit: 6
>>>> feature: UnknownWordPenalty WordPenalty
>>>> PhrasePenalty PhraseDictionaryMemory
>>>> name=TranslationModel0 num-features=4
>>>> path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz
>>>> input-factor=0 output-factor=0 Distortion
>>>> KENLM lazyken=0 name=LM0 factor=0
>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin
>>>> order=3 NeuralLM factor=0 name=LM1 order=3
>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
>>>>
>>>> input-factors: 0
>>>> mapping: 0 T 0
>>>> weight: Distortion0= 0.136328 LM0=
>>>> 0.135599 LM1= 0.5 WordPenalty0= -0.488892
>>>> PhrasePenalty0= 0.0826147
>>>> TranslationModel0= 0.0104273 0.0663914
>>>> 0.0254094 0.0543384 UnknownWordPenalty0= 1
>>>> line=UnknownWordPenalty
>>>> FeatureFunction: UnknownWordPenalty0 start:
>>>> 0 end: 0
>>>> line=WordPenalty
>>>> FeatureFunction: WordPenalty0 start: 1 end: 1
>>>> line=PhrasePenalty
>>>> FeatureFunction: PhrasePenalty0 start: 2 end: 2
>>>> line=PhraseDictionaryMemory
>>>> name=TranslationModel0 num-features=4
>>>> path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz
>>>> input-factor=0 output-factor=0
>>>> FeatureFunction: TranslationModel0 start: 3
>>>> end: 6
>>>> line=Distortion
>>>> FeatureFunction: Distortion0 start: 7 end: 7
>>>> line=KENLM lazyken=0 name=LM0 factor=0
>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin
>>>> order=3
>>>> FeatureFunction: LM0 start: 8 end: 8
>>>> line=NeuralLM factor=0 name=LM1 order=3
>>>> path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
>>>> Exception: moses/FF/Factory.cpp:349 in void
>>>> Moses::FeatureRegistry::Construct(const
>>>> string&, const string&) threw
>>>> UnknownFeatureException because `i ==
>>>> registry_.end()'.
>>>> Feature name NeuralLM is not registered.
>>>>
>>>>
>>>> I added the following 2 lines to my moses.ini file:
>>>>
>>>> NeuralLM factor=0 name=LM1 order=5
>>>> path=/path/to/nplmmodel
>>>> LM1= 0.5
>>>>
>>>>
>>>>
>>>> On Tue, Sep 15, 2015 at 5:06 PM, Sanjanashree
>>>> Palanivel <sanjanashree@gmail.com> wrote:
>>>>
>>>> Thank you for your earnest response. I will
>>>> update Moses and try again.
>>>>
>>>> On Tue, Sep 15, 2015 at 4:22 PM, Rico
>>>> Sennrich <rico.sennrich@gmx.ch> wrote:
>>>>
>>>> Hello Sanjanasri,
>>>>
>>>> this looks like a version mismatch
>>>> between Moses and NPLM. Specifically,
>>>> you're using an older Moses commit that
>>>> is only compatible with nplm 0.2 (or
>>>> specifically, Kenneth's fork at
>>>> https://github.com/kpu/nplm ).
>>>>
>>>> If you use the latest Moses version
>>>> from
>>>> https://github.com/moses-smt/mosesdecoder
>>>> , and the latest nplm version from
>>>> https://github.com/moses-smt/nplm , it
>>>> should work.
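>>>>
>>>> For example, fresh checkouts of both (a sketch, assuming git is installed):
>>>>
>>>> git clone https://github.com/moses-smt/mosesdecoder
>>>> git clone https://github.com/moses-smt/nplm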
>>>>
>>>> best wishes,
>>>> Rico
>>>>
>>>>
>>>> On 15.09.2015 08:24, Sanjanashree
>>>> Palanivel wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I tried building a language model using
>>>>> NPLM. The language model was built
>>>>> successfully, but when I tried to
>>>>> compile NPLM with Moses using "./bjam
>>>>> --with-nplm=path/to/nplm" I got
>>>>> an error. I am using Boost 1.55. I am
>>>>> attaching the log file for reference.
>>>>> I don't know where I went wrong. Any
>>>>> help would be appreciated.
>>>>>
>>>>>
>>>>> --
>>>>> Thanks and regards,
>>>>>
>>>>> Sanjanasri J.P
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and regards,
>>>>
>>>> Sanjanasri J.P
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and regards,
>>>>
>>>> Sanjanasri J.P
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>>
>>> --
>>> Thanks and regards,
>>>
>>> Sanjanasri J.P
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>> --
>> Thanks and regards,
>>
>> Sanjanasri J.P
>>
>>
>>
>>
>> --
>> Thanks and regards,
>>
>> Sanjanasri J.P
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Thanks and regards,
>
> Sanjanasri J.P
>
>
>
>
> --
> Thanks and regards,
>
> Sanjanasri J.P


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 108, Issue 9
*********************************************
