Moses-support Digest, Vol 91, Issue 45

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Configuring LMs (Matthias Huck)
2. About adding features (Jianri Li)
3. Re: Configuring LMs (Lars Bungum)
4. Re: Configuring LMs (Matthias Huck)


----------------------------------------------------------------------

Message: 1
Date: Wed, 28 May 2014 01:08:59 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Configuring LMs
To: Lars Bungum <lars.bungum@idi.ntnu.no>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <1401235739.1432.4.camel@hucklap.site>
Content-Type: text/plain; charset="UTF-8"

Hi Lars,

The instructions you're looking for are here:
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel

You can also create a KenLM binary file instead and use it in the
decoder with the KENLM line in the [feature] section of your moses.ini.

$ kenlm/build_binary filename.arpa filename.binary
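Before pointing the KENLM line at a file, it helps to check which format the file is in. The exception in the quoted message below shows KenLM's ARPA reader expecting `\data\` as the first non-empty line and finding SRILM's binary header instead. The Python sketch below reads only the start of an uncompressed file and looks for those two markers, both taken from that error. It is a hypothetical helper (`sniff_lm_format` is not part of Moses or KenLM). It does not recognise KenLM binaries or gzipped ARPA files and reports them as "unknown".

```python
# Markers taken from the exception quoted in this thread: KenLM's ARPA reader
# wants "\data\" as the first non-empty line, but the SRILM binary written by
# `ngram -write-bin-lm` started with "SRILM_BINARY_NGRAM_002".
ARPA_HEADER = b"\\data\\"
SRILM_BINARY_PREFIX = b"SRILM_BINARY_NGRAM"


def sniff_lm_format(path: str) -> str:
    """Guess an LM file's format from its first non-empty line (hypothetical helper).

    Returns "arpa", "srilm-binary", or "unknown". Gzipped ARPA files and
    KenLM binaries are not recognised and come back as "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(4096)  # the header is all we need; binaries can be huge
    for raw in head.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip leading blank lines
        if line == ARPA_HEADER:
            return "arpa"
        if line.startswith(SRILM_BINARY_PREFIX):
            return "srilm-binary"
        return "unknown"  # first non-empty line matched neither marker
    return "unknown"  # empty file, or only whitespace in the first 4 KiB


if __name__ == "__main__":
    import sys

    for lm_path in sys.argv[1:]:
        print(f"{lm_path}: {sniff_lm_format(lm_path)}")
```

If it reports "srilm-binary", point the KENLM line at the ARPA text file instead, or convert the ARPA file with build_binary.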

Cheers,
Matthias


On Tue, 2014-05-27 at 11:57 +0200, Lars Bungum wrote:
> Hi,
>
> I am a bit confused on how to configure the LM features correctly.
>
> In my moses.ini this feature line is provided from running the script
> train-model.perl with the lm parameters 0:3:$LMPATH (otherwise standard
> parameters from the baseline system instructions). I built the LM with
> SRILM. With the text model I receive the following error message when
> decoding:
>
> The ARPA file is missing <unk>. Substituting log10 probability
> -100.000
>
> but it otherwise works. However, when I compiled the LM with the command:
>
> ngram -order 3 -lm en-de.kn5.lm -write-bin-lm en-de.kn5.lm.bin
>
> I receive the error message:
>
> Reading $PATH/en-de.kn5.lm.bin
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************************
> Exception: lm/read_arpa.cc:65 in void
> lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&)
> threw FormatLoadException'.
> first non-empty line was "SRILM_BINARY_NGRAM_002" not \data\. Byte: 23
>
> This led me to try to figure out why, so I looked in my moses.ini
> file. Here the LM is configured with this line:
>
> KENLM lazyken=0 name=LM0 factor=0 path=$PATH/en-de.kn5.lm.bin order=3
>
> and this is where I got stuck. Why is this feature named
> KENLM? And how do I know how to configure it? Did I make a mistake in
> running train-model somehow? I guess intuitively I should configure it
> with a line that is called SRILM that knows how to read this binary
> format, but I was not able to find out how to do that.
>
> Thanks
> //LB
>



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
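The first warning in Lars's message (`The ARPA file is missing <unk>. Substituting log10 probability -100.000`) means the unigram section of the ARPA file has no `<unk>` entry. KenLM then uses -100 as the log10 probability of `<unk>`, that is, of any out-of-vocabulary word. The sketch below checks an uncompressed ARPA file for that entry by scanning its `\1-grams:` section. `arpa_has_unk` is a hypothetical helper for illustration, not a Moses or KenLM tool.

```python
def arpa_has_unk(path: str) -> bool:
    """Return True if the \\1-grams: section of an uncompressed ARPA file lists <unk>.

    Hypothetical helper for illustration; it is not part of Moses or KenLM.
    """
    in_unigrams = False
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if line == "\\1-grams:":
                in_unigrams = True  # unigram entries start on the next line
                continue
            if not in_unigrams:
                continue  # still in the \data\ header
            if line.startswith("\\"):
                return False  # reached \2-grams: or \end\ without seeing <unk>
            # ARPA unigram lines: log10prob <tab> word [<tab> backoff]
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "<unk>":
                return True
    return False
```

Run on the ARPA file that triggered the warning, it would be expected to return False.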



------------------------------

Message: 2
Date: Wed, 28 May 2014 16:40:17 +0900
From: Jianri Li <skywalker@postech.ac.kr>
Subject: [Moses-support] About adding features
To: moses-support@mit.edu
Message-ID: <1401262817086.51880.postech@postech.ac.kr>
Content-Type: text/plain; charset="us-ascii"

An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140528/c507fb62/attachment-0001.htm

------------------------------

Message: 3
Date: Wed, 28 May 2014 11:41:19 +0200
From: Lars Bungum <lars.bungum@idi.ntnu.no>
Subject: Re: [Moses-support] Configuring LMs
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <5385AF3F.4040802@idi.ntnu.no>
Content-Type: text/plain; charset="utf-8"

Hi,

thanks for the pointer, good information!

But the instructions for building the baseline system at
http://www.statmt.org/moses/?n=Moses.Baseline build the translation
model with these parameters:

nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir train \
-corpus ~/corpus/news-commentary-v8.fr-en.clean \
-f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
-lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 \
-external-bin-dir ~/mosesdecoder/tools >& training.out &

From what I can see in the train-model.perl script, the type 8 at the end
of the -lm string asks for KENLM, but the LM in the baseline example is
built with IRSTLM. Still, the KENLM feature works with that IRSTLM-built
LM (but not with my SRILM binary, as I discovered). Are these options to
train-model.perl documented somewhere?

One more thing: the page you referred to reads:

To use these language models, they have to be compiled with the proper
option:

* --with-srilm=<root dir of the SRILM toolkit>
* --with-irstlm=<root dir of the IRSTLM toolkit>
* --with-randlm=<root dir of the RandLM toolkit>
* --with-dalm=<root dir of the DALM toolkit>

I take it that "they" here actually means that Moses has to be built
with these options. Is this because Moses uses the LM toolkits' libraries
directly instead of calling their command-line tools when decoding?

//LB


------------------------------

Message: 4
Date: Wed, 28 May 2014 15:04:37 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Configuring LMs
To: Lars Bungum <lars.bungum@idi.ntnu.no>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <1401285877.2309.1777.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi Lars,

The format for the -lm parameter string in train-model.perl is:
factor:order:filename:type

"SRILM" if type == 0
"IRSTLM" if type == 1
"KENLM lazyken=0" if type == 8
"KENLM lazyken=1" if type == 9

The type is set to 0 if you omit it in the parameter string.
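The mapping above can be sketched in Python as a small function that turns an -lm string into a [feature] line. It is an illustration, not train-model.perl's actual code. `lm_feature_line` is a hypothetical name, and the layout of the output line is copied from the KENLM line quoted in this thread (`KENLM lazyken=0 name=LM0 factor=0 path=... order=3`). It assumes the SRILM and IRSTLM lines follow the same `name=`/`factor=`/`path=`/`order=` pattern and that the filename contains no colon.

```python
# Type codes as listed in this thread for the fourth field of train-model.perl's
# -lm option; any other type is rejected by this sketch.
LM_TYPES = {
    0: "SRILM",
    1: "IRSTLM",
    8: "KENLM lazyken=0",
    9: "KENLM lazyken=1",
}


def lm_feature_line(spec: str, index: int = 0) -> str:
    """Turn "factor:order:filename[:type]" into a moses.ini [feature] line.

    Hypothetical sketch: assumes the filename contains no ':' and that every
    LM type uses the same name=/factor=/path=/order= layout as the KENLM line.
    """
    parts = spec.split(":")
    if len(parts) == 3:
        parts.append("0")  # the type defaults to 0 (SRILM) when omitted
    if len(parts) != 4:
        raise ValueError(f"expected factor:order:filename[:type], got {spec!r}")
    factor, order, filename, lm_type = parts
    feature = LM_TYPES.get(int(lm_type))
    if feature is None:
        raise ValueError(f"LM type {lm_type} is not one of {sorted(LM_TYPES)}")
    return f"{feature} name=LM{index} factor={factor} path={filename} order={order}"


if __name__ == "__main__":
    # The -lm string from the baseline instructions quoted in this thread.
    print(lm_feature_line("0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8"))
```

For the baseline string this prints `KENLM lazyken=0 name=LM0 factor=0 path=$HOME/lm/news-commentary-v8.fr-en.blm.en order=3`. Omitting the `:8` would yield an SRILM line instead, which is why the type field matters.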

If you want to use SRILM, IRSTLM, RandLM, or DALM for language model
scoring during decoding, you need to compile Moses with the respective
libraries.

http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters
http://www.statmt.org/moses/manual/manual.pdf

Cheers,
Matthias








------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 91, Issue 45
*********************************************
