Moses-support Digest, Vol 82, Issue 42

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Decoding with word lattice (Hieu Hoang)
2. Re: Error when attempting to translate: fails with "StrayFactorException" (Hieu Hoang)
3. Error with processPhaseTableMin (João Graça)
4. Fwd: Re: Error with processPhaseTableMin (Marcin Junczys-Dowmunt)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Aug 2013 16:03:01 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Decoding with word lattice
To: moses-support@mit.edu
Message-ID: <521CBFA5.2070600@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Do you mean your lattice has only one path, i.e. it is a sentence encoded
as a lattice?

Did you retune your model?

I am surprised, but no one has done that experiment. If you can make the
model files available, I can debug it and let you know why there is a
difference.
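
To make the question concrete: a one-path lattice is just the sentence written out node by node. Below is a minimal sketch, assuming the input uses Moses's PLF notation, where each edge is a (word, score, distance-to-next-node) triple. The helper name linear_plf, the uniform score of 1.0, and the backslash escaping of quotes are illustrative assumptions, not something taken from this thread.

```python
def linear_plf(tokens, score=1.0):
    """Encode a token list as a single-path lattice in PLF-style notation."""
    nodes = []
    for tok in tokens:
        # Escape backslashes and single quotes so the token stays a valid
        # quoted string (assumed escaping convention).
        safe = tok.replace("\\", "\\\\").replace("'", "\\'")
        # Exactly one outgoing edge per node: (word, score, distance to next node).
        nodes.append("(('%s',%s,1),)," % (safe, score))
    return "(" + "".join(nodes) + ")"


if __name__ == "__main__":
    print(linear_plf("das ist ein test".split()))
```

Every node has a single edge with distance 1, so the decoder has only one path to follow. In principle this should behave like the plain sentence, which is why the reported difference is surprising.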

On 26/08/2013 14:42, Wei Qiu wrote:
> Hi,
>
> I'm now trying to use word lattice decoding.
> I observed a performance drop (around 5% in BLEU) by simply
> concatenating tokens in the source language into a linear *word
> lattice*. I thought it would give almost identical results.
>
> Since I'm using this technique for some artificial language pairs, I'm
> wondering whether this is normal. Do you also see this problem for
> natural language pairs?
>
> If possible, how can I improve the word lattice decoding result?
>
> Thanks in advance!
>
> Best,
> Wei Qiu
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

------------------------------

Message: 2
Date: Tue, 27 Aug 2013 16:07:19 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Error when attempting to translate: fails with "StrayFactorException"
To: moses-support@mit.edu
Message-ID: <521CC0A7.2010601@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Did you escape your training and input data? There must not be any |
characters in your data unless you are using factored models.

The Moses tokenizer script does this, as does the dedicated escape script:
scripts/tokenizer/tokenizer.perl
scripts/tokenizer/escape-special-chars.perl
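
Here is a rough sketch of the kind of check and escaping meant here. The replacement table is written from memory of what escape-special-chars.perl does and should be treated as an assumption. escape_line and stray_pipes are hypothetical helper names. For real data, run the Perl scripts listed above.

```python
import re

# Order matters: "&" is escaped first so the entities added afterwards
# are not escaped a second time.
_ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),   # factor separator
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_line(line):
    """Normalise whitespace and replace characters that are special to Moses."""
    line = re.sub(r"\s+", " ", line).strip()
    for raw, entity in _ESCAPES:
        line = line.replace(raw, entity)
    return line


def stray_pipes(line, num_factors=1):
    """Return the tokens that contain more '|' than num_factors allows."""
    return [tok for tok in line.split() if tok.count("|") > num_factors - 1]


if __name__ == "__main__":
    sample = "!|!|!^EXCL|EXCL|EXCL"
    print(stray_pipes(sample))   # tokens that would trip a 1-factor setup
    print(escape_line("a | b"))
```

With num_factors=1, stray_pipes flags the token quoted in the error message. Note that escaping also destroys the separators in deliberately factored data, so it only applies to surface-only corpora.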

On 26/08/2013 15:23, Stefan Dumitrescu wrote:
> Hi!
>
> I have the following error when attempting to translate:
>
> Exception: moses/Word.cpp:109 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int>&, const StringPiece&, bool) threw
> StrayFactorException because `fit'.
> You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
> factor delimiter | too many times.
>
> I have the following training script:
>
> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
> --corpus /usr/local/trans/corpus/ted/train \
> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
> --parallel \
> --mgiza \
> --mgiza-cpus 8 \
> --f ro --e en \
> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
> --root-dir /usr/local/trans/work/ted/m1 \
> --max-phrase-length 4 \
> --first-step $FIRSTSTEP \
> --translation-factors 0-0 \
> --alignment grow-diag-final-and \
> --reordering wbe-msd-bidirectional-fe
>
> The train files are factored (5 factors: word, lemma, lemma^postag1,
> postag1, postag2). The training process works without any errors, it
> generates a valid phrase-table that looks like:
>
> zcat ../m1/model/phrase-table.gz | head -1
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 |||
> 2 1 1
>
> I did not get this error a couple of months ago when working on another
> experiment. I'm guessing something changed in Moses and I am missing
> some required flag in my scripts? I am using scripts that have worked ok
> so far.
> I looked through the manual, and I tried using the -input-factors
> option, but I still receive the same error. What am I doing wrong? It is
> something trivial most likely, but I do appreciate your help with it.
>
> Thank you,
> Stefan
>
> (moses.ini below:)
> #########################
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz
> input-factor=0 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
> Distortion
> KENLM lazyken=0 name=LM0 factor=0
> path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
> Distortion0= 0.3
> LM0= 0.5

------------------------------

Message: 3
Date: Tue, 27 Aug 2013 16:48:07 +0100
From: João Graça <gracaninja@gmail.com>
Subject: [Moses-support] Error with processPhaseTableMin
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAGfH6a4A5Mnjzc8EcLsSmuci+-PRwB+nzsCFa9SZuZbp+5qqbA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I am trying to create a compact version of the phrase table based on the
pre-trained models of release 1.

I get the following error when I run processPhraseTableMin on an Ubuntu
Vagrant machine.

Thanks for your help,

João

vagrant@precise64:/vagrant/mt-models/en-es$
~/mosesdecoder/bin/processPhraseTableMin -in phrase-table.1.gz -out
phrase-table -nscores 5 -threads 2
Used options:
Text phrase table will be read from: phrase-table.1.gz
Output phrase table will be written to: phrase-table.minphr
Step size for source landmark phrases: 2^10=1024
Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
Selected target phrase encoding: Huffman + PREnc
Maxiumum allowed rank for PREnc: 100
Number of score components in phrase table: 5
Single Huffman code set for score components: no
Using score quantization: no
Explicitly included alignment information: yes
Running with 2 threads

Pass 1/3: Creating hash function for rank assignment
..................................................[5000000]
..................................................[10000000]
..................................................[15000000]
..................................................[20000000]
..................................................[25000000]
..................................................[30000000]
..................................................[35000000]
..................................................[40000000]
..................................................[45000000]
..................................................[50000000]
..................................................[55000000]
......terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

------------------------------

Message: 4
Date: Tue, 27 Aug 2013 18:23:43 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: [Moses-support] Fwd: Re: Error with processPhaseTableMin
To: moses-support <moses-support@mit.edu>
Message-ID: <521CD28F.5000303@amu.edu.pl>
Content-Type: text/plain; charset="iso-8859-1"

Hi,
There might be two reasons:

1) Are you running this on a 32-bit machine or Cygwin? Then the maximum
size of phrase table that can be built is about 3GB, a limit that can be
hit quite quickly.
2) Do you have only a little free space in your /tmp directory? You can
change the directory used for temporary files with the "-T path" option.
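
Both conditions can be checked quickly before rerunning. This is a minimal sketch, assuming a Python interpreter is available on the same machine. It reports the pointer width of the Python build, which is only a proxy for how the Moses binaries were compiled. The helper names are hypothetical.

```python
import shutil
import struct
import tempfile


def pointer_bits():
    """Pointer width of this Python build (32 or 64 on common platforms)."""
    return struct.calcsize("P") * 8


def free_gib(path):
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print("pointer width: %d bits" % pointer_bits())
    print("free space in %s: %.1f GiB" % (tmp, free_gib(tmp)))
```

If the temporary directory turns out to be nearly full, point the tool at a roomier location with the "-T path" option mentioned above.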

Best,
Marcin

On 27.08.2013 17:48, João Graça wrote:
> Hello,
>
> I am trying to create a compact version of the phrase table based on
> the pre-trained models of release 1.
>
> I get the following error when I run processPhraseTableMin on an
> Ubuntu Vagrant machine.
>
> Thanks for your help,
>
> João
>
> vagrant@precise64:/vagrant/mt-models/en-es$
> ~/mosesdecoder/bin/processPhraseTableMin -in phrase-table.1.gz -out
> phrase-table -nscores 5 -threads 2
> Used options:
> Text phrase table will be read from: phrase-table.1.gz
> Output phrase table will be written to: phrase-table.minphr
> Step size for source landmark phrases: 2^10=1024
> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> Selected target phrase encoding: Huffman + PREnc
> Maxiumum allowed rank for PREnc: 100
> Number of score components in phrase table: 5
> Single Huffman code set for score components: no
> Using score quantization: no
> Explicitly included alignment information: yes
> Running with 2 threads
>
> Pass 1/3: Creating hash function for rank assignment
> ..................................................[5000000]
> ..................................................[10000000]
> ..................................................[15000000]
> ..................................................[20000000]
> ..................................................[25000000]
> ..................................................[30000000]
> ..................................................[35000000]
> ..................................................[40000000]
> ..................................................[45000000]
> ..................................................[50000000]
> ..................................................[55000000]
> ......terminate called after throwing an instance of 'std::bad_alloc'
> what(): std::bad_alloc
> Aborted

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 82, Issue 42
*********************************************
