Moses-support Digest, Vol 97, Issue 83

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Delvin et al 2014 (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Wed, 26 Nov 2014 21:09:04 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Delvin et al 2014
To: moses-support@mit.edu
Message-ID: <5475DF00.5030402@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"

Thanks Nikolay! This is a great start. I have a few clarification
questions.

1) Does this replace traditional language models like KenLM, or run
independently of them? I.e., when compiling we can use --with-kenlm,
--with-irstlm, --with-randlm and --with-srilm together. Are --with-oxlm
and --with-nplm added to the stack, or are they exclusive?

2) It looks like your branch of nplm is thread-safe. Is oxlm also
thread-safe?

3) You say, "To run it in moses as a feature function..." Does that mean
compiling with your above option(s) creates a new runtime binary
"BilingualNPLM" that replaces the moses binary, much like moseschart and
mosesserver? Or, does BilingualNPLM run in a separate process that the
Moses binary accesses during runtime?

4) How large do these LM files become? Are they comparable to
traditional ARPA files, larger or smaller? Also, are they binarized with
mmap reads or do they have to load into RAM?

Thanks,
Tom




On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:
> Fix formatting...
>
> Hey,
>
> BilingualLM is implemented and as of last week resides within moses
> master:
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>
> To compile it you need a neural network backend. Currently two are
> supported: OxLM and NPLM. Adding a new backend is relatively easy: you
> need to implement the interface shown here:
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>
> To compile with the OxLM backend, build Moses with the switch
> --with-oxlm=/path/to/oxlm
> To compile with the NPLM backend, build Moses with the switch
> --with-nplm=/path/to/nplm (you need this fork of NPLM:
> https://github.com/rsennrich/nplm)
>
> Unfortunately documentation is not yet available, so here's a short
> summary of how to train a model and use it with the NPLM backend:
> Use the extract training script to prepare aligned bilingual corpus:
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/extract_training.py
>
> You need the following options:
>
> "-e", "--target-language", type="string", dest="target_language")
> //Mandatory, for example es "-f", "--source-language", type="string",
> dest="source_language") //Mandatory, for example en "-c", "--corpus",
> type="string", dest="corpus_stem") // path/to/corpus In the directory
> you have specified there should be files corpus.sourcelang and
> corpus.targetlang "-t", "--tagged-corpus", type="string",
> dest="tagged_stem") //Optional for backoff to pos tag "-a", "--align",
> type="string", dest="align_file") //Mandatory alignemtn file "-w",
> "--working-dir", type="string", dest="working_dir") //Output directory
> of the model "-n", "--target-context", type="int", dest="n") / "-m",
> "--source-context", type="int", dest="m") //The actual context size is
> 2*m + 1, this is the number of words on both left and right "-s",
> "--prune-source-vocab", type="int", dest="sprune") //cutoff vocabulary
> threshold "-p", "--prune-target-vocab", type="int", dest="tprune")
> //cutoff vocabulary threshold
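The options above can be put together into a concrete command line. Below is a minimal, hypothetical helper that assembles such an invocation; the corpus name, alignment file and cutoff values are placeholders, and the real script may accept further options:

```python
# Hypothetical helper assembling an extract_training.py command line
# from the options listed above. All paths/values are placeholders.
def extract_training_cmd(target_lang, source_lang, corpus_stem, align_file,
                         working_dir, target_context, source_context,
                         sprune, tprune, tagged_stem=None):
    cmd = ["python", "extract_training.py",
           "-e", target_lang, "-f", source_lang,
           "-c", corpus_stem, "-a", align_file,
           "-w", working_dir,
           "-n", str(target_context), "-m", str(source_context),
           "-s", str(sprune), "-p", str(tprune)]
    if tagged_stem is not None:  # optional POS-tag backoff corpus
        cmd += ["-t", tagged_stem]
    return " ".join(cmd)

# Example: English-to-Spanish corpus with 16k-word vocabulary cutoffs.
print(extract_training_cmd("es", "en", "corpus",
                           "aligned.grow-diag-final-and",
                           "working_dir", 5, 4, 16000, 16000))
```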
>
> Then, use the training script to train the model:
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/bilingual-lm/train_nplm.py
>
> Example execution is:
>
> train_nplm.py -w de-en-500250source/ -r de-en150nopos-source750 -n 16
> -d 0 --nplm-home=/home/abmayne/code/deepathon/nplm_one_layer/ -c
> corpus.1.word -i 750 -o 750
>
> where -i and -o are the input and output embedding sizes,
> -n is the total ngram size,
> -d is the number of hidden layers,
> -w and -c are the same as the extract_training options, and
> -r is the output directory of the model.
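The example invocation above can be captured in a small builder function. This is a hypothetical sketch, not part of Moses; the directory names mirror the example and the --nplm-home path is a placeholder:

```python
# Hypothetical builder for the train_nplm.py command shown above.
def train_nplm_cmd(working_dir, output_dir, ngram_size, hidden_layers,
                   nplm_home, corpus, input_embeddings, output_embeddings):
    return ("train_nplm.py"
            f" -w {working_dir} -r {output_dir}"
            f" -n {ngram_size} -d {hidden_layers}"
            f" --nplm-home={nplm_home} -c {corpus}"
            f" -i {input_embeddings} -o {output_embeddings}")

# Reproduces the example execution from the message above.
print(train_nplm_cmd("de-en-500250source/", "de-en150nopos-source750",
                     16, 0, "/path/to/nplm", "corpus.1.word", 750, 750))
```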
>
> Consult the Python script for a more detailed description of the options.
>
> Once that has finished, the output directory should contain a trained
> bilingual neural network language model.
>
> To run it in Moses as a feature function, add the following line to your moses.ini:
>
> BilingualNPLM
> filepath=/mnt/gna0/nbogoych/new_nplm_german/de-en150nopos/train.10k.model.nplm.10
> target_ngrams=4 source_ngrams=9
> source_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.source
> target_vocab=/mnt/gna0/nbogoych/new_nplm_german/de-enIWSLTnopos/vocab.targe
>
> The source and target vocabularies are located in the working directory
> used to prepare the neural network language model.
> target_ngrams doesn't include the predicted word (so target_ngrams = 4
> means 1 predicted word and 4 target context words).
> The total order of the model is target_ngrams + source_ngrams + 1.
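The context-window arithmetic described above can be checked with a quick sketch (my reading of the description, not an official Moses helper):

```python
# target_ngrams preceding target words, plus source_ngrams source-window
# words, plus the predicted word itself, give the total model order.
def model_order(target_ngrams, source_ngrams):
    return target_ngrams + source_ngrams + 1

# The feature line above uses target_ngrams=4 and source_ngrams=9,
# i.e. a 14-gram model overall.
print(model_order(4, 9))  # -> 14
```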
>
> I will write proper documentation in the coming weeks. If you have any
> problems running it, please contact me.
>
> Cheers,
>
> Nick
>
>
> On Wed, Nov 26, 2014 at 1:02 PM, Nikolay Bogoychev <nheart@gmail.com> wrote:
>
> On Wed, Nov 26, 2014 at 11:53 AM, Tom Hoar
> <tahoar@precisiontranslationtools.com> wrote:
>
> Hieu,
>
> Sorry I missed you in Vancouver. I just reviewed your slide
> deck from the MosesCore TAUS Round Table in Vancouver
> (taus-moses-industry-roundtable-2014-changes-in-moses-hieu-hoang-university-of-edinburgh).
>
>
> In particular, I'm interested in the "Bilingual Language
> Models" that "replicate Delvin et al, 2014". A search on
> statmt.org/moses doesn't show any hits for "delvin". So, A) is
> the code finished? If so, B) are there any instructions on how to
> enable/use this feature? If not, C) what kind of help do you need
> to test the code for release?
>
> --
>
> Best regards,
> Tom Hoar
> Managing Director
> *Precision Translation Tools Co., Ltd.*
> Bangkok, Thailand
> Web: www.precisiontranslationtools.com
> Mobile: +66 87 345-1875
> Skype: tahoar
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141126/4a552f6f/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 97, Issue 83
*********************************************
