Moses-support Digest, Vol 82, Issue 46

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Error when attempting to translate: fails with "
StrayFactorException " (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Wed, 28 Aug 2013 12:12:01 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Error when attempting to translate: fails
with " StrayFactorException "
To: Stefan Dumitrescu <dumitrescu.stefan@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhRw0rnHXVDPvrmLkPC9rq5pdvdvGexhFE0qzSU+AS-wg@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-2"

On 28 August 2013 12:00, Stefan Dumitrescu <dumitrescu.stefan@gmail.com> wrote:

> Hi Hieu,
>
> I have a single annotated training corpus from which I will build several
> models, both single-factor and multi-factor. I expect that if I specify
> --translation-factors 0-0 I'll get a phrase-table.0-0.gz, and if in a later
> model I specify --translation-factors 0,1,4-0 I'll get factors 0, 1 and 4 in
> my phrase table, using the same training data.
>

Yes, having a single annotated training corpus is a good idea and it should
work. However, there might be a bug in the script or a human error in how it
was run.

If the phrase-table entry in moses.ini is
PhraseDictionaryMemory input-factor=0 output-factor=0 ...
then the phrase table can only have 1 factor on both the input and output
sides. If it has more than 1, then something went wrong and you have to work
backwards to debug it.
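
The check described above can be sketched in Python. This is a minimal sketch, not a Moses tool: the function names are hypothetical, and it assumes the plain-text phrase-table format shown in this thread (fields separated by " ||| ", factors within a token separated by "|"). It reports every token whose factor count differs from what the input-factor and output-factor settings imply.

```python
def count_factors(token, delimiter="|"):
    """Number of factors in one token, e.g. 'pe|pe^S' has 2."""
    return token.count(delimiter) + 1


def check_phrase_table_line(line, n_input, n_output):
    """Return (side, token, found, expected) for each mismatching token.

    Assumption: the first two ' ||| '-separated fields are the source
    and target phrases, as in the phrase-table excerpts in this thread.
    """
    fields = line.rstrip("\n").split(" ||| ")
    source, target = fields[0], fields[1]
    problems = []
    for side, phrase, expected in (("input", source, n_input),
                                   ("output", target, n_output)):
        for token in phrase.split():
            found = count_factors(token)
            if found != expected:
                problems.append((side, token, found, expected))
    return problems


# Example line taken from the 0,2-0 phrase table quoted in this thread.
line = ("!|!^EXCL !|!^EXCL !|!^EXCL pe|pe^S ||| ! ! ! ||| "
        "0.5 0.0202071 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2 ||| 2 1 1")

# input-factor=0,2 means 2 factors per source token; output-factor=0 means 1.
print(check_phrase_table_line(line, n_input=2, n_output=1))       # prints []
# Declaring a single input factor flags all four source tokens.
print(len(check_phrase_table_line(line, n_input=1, n_output=1)))  # prints 4
```

To run this over a real table, the lines could be read with gzip.open(path, "rt") and passed to check_phrase_table_line one at a time.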

>
> I have just trained a new model (to test the above) with the following
> script:
>
> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
> --corpus /usr/local/trans/corpus/ted/train \
> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
> --parallel \
> --mgiza \
> --mgiza-cpus 8 \
> --f ro --e en \
> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
> --root-dir /usr/local/trans/work/ted/m6 \
> --max-phrase-length 4 \
> --first-step $FIRSTSTEP \
> --translation-factors 0,2-0 \
> --alignment-factors 2-2 \
> --alignment grow-diag-final-and \
> --reordering-factors 2-2 \
> --reordering wbe-msd-bidirectional-fe
>
> It creates a PT like:
>
> sdumitrescu /usr/local/trans/work/ted/scripts > zcat
> ../m6/model/phrase-table.0,2-0.gz | head -2
> !|!^EXCL !|!^EXCL !|!^EXCL "|"^DBLQ ||| . " ||| 0.000169635
> 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1 ||| 5895 1 1
> !|!^EXCL !|!^EXCL !|!^EXCL pe|pe^S ||| ! ! ! ||| 0.5 0.0202071 1 0.190109
> ||| 0-0 2-0 0-1 1-1 1-2 ||| 2 1 1
>
> So far, so good. moses.ini looks like this (it automatically filled in the
> input factors, though I am not using factor #1 anywhere):
> [input-factors]
> 0
> 1
> 2
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4
> path=/usr/local/trans/work/ted/m6/model/phrase-table.0,2-0.gz
> input-factor=0,2 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=2 output-factor=2
> path=/usr/local/trans/work/ted/m6/model/reordering-table.2-2.wbe-msd-bidirectional-fe.gz
>
> Distortion
> KENLM lazyken=0 name=LM0 factor=0
> path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
> Distortion0= 0.3
> LM0= 0.5
>
>
>
> Input is (5 factors, same as the training data):
> sdumitrescu /usr/local/trans/work/ted/scripts > cat
> ../../../corpus/ted/test.in.ro | head -1
> Robert|robert|robert^NP|NP|Np Gupta|gupta|gupta^NP|NP|Np
> ,|,|,^COMMA|COMMA|COMMA violonist|violonist|violonist^NSN|NSN|Nc-s-ny
> la|la|la^S|S|Spca Orchestra|orchestra|orchestra^NP|NP|Np
> Filarmonică|filarmonică|filarmonică^NP|NP|Np din|din|din^S|S|Spca
> Los|los|los^NP|NP|Np Angeles|angelege|angelege^V3|V3|Vmii3p
>
> The error looks like:
>
> Exception: moses/Word.cpp:109 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int>&, const StringPiece&, bool) threw
> StrayFactorException because `fit'.
> You have configured 3 factors but the word Robert|robert|robert^NP|NP|Np
> contains factor delimiter | too many times.
>
> From all the experiments so far I can only deduce one thing: do I have to
> create as many input files as there are model types I build, just to match
> the factors in the phrase table? Wasn't the default behavior of Moses to
> allow any number of factors in the input file and pick the ones it needs at
> each translation/generation/reordering step?
>
> Over the past year I have tried a significant number of factored models
> (different combinations) and I just used a single input file that contained
> all the factors, as I am doing now, without any Moses exceptions. For the
> example above to work, I'm guessing I have to recreate the test.in.ro file
> with only factors 0 and 2?
>
> Thanks,
> Stefan
>
>
> On 8/28/2013 11:32 AM, Hieu Hoang wrote:
>
> Do you want to have multiple factors in your phrase-table?
>
> The training command doesn't specify any factors. The ini file says your
> phrase-table has only 1 factor for both input and output. However, your
> translation rules contain 10 factors!
>
>
> On 28/08/2013 08:59, Stefan Dumitrescu wrote:
>
> Hi Hieu,
>
> The training and test data are correctly processed: first tokenized (with
> Moses' script), then truecased, then annotated.
>
> I have trained a surface model on the unannotated (unfactored) data and
> everything runs smoothly. However, when I use an annotated corpus
> (correctly annotated, each token becomes 5 factors) as well as an annotated
> input, I get this exception.
>
> I tried recompiling Moses with -max-factors 10, no change.
>
> I played with the -input-factors switch for the decoder, and now I am
> getting this:
>
> .... (I cut the first part) ...
> line=KENLM lazyken=0 name=LM0 factor=0
> path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
> FeatureFunction: LM0 start: 14 end: 14
> Loading table into memory...done.
> Start loading text SCFG phrase table. Moses format : [63.000] seconds
> Reading /usr/local/trans/work/ted/m4/model/phrase-table.0-0.gz
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85-
> --90---95--100
> **************************************************************************************
> **************
> Exception: bitset::set
>
> It is a bit frustrating because I have used factored models several times
> in the past year without any issues.
>
> For my model m1, I did not specify any -translation-factors in the
> training phase and I got a phrase-table.gz which contained the five factors
> together, as in:
>
> sdumitrescu /usr/local/trans/work/ted/scripts > zcat
> ../m1/model/phrase-table.gz | head -2
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.018702 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2
> 1 1
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL |||
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> "|"|"^DBLQ|DBLQ|DBLQ ||| 1 0.291644 0.5 0.000225623 ||| 0-0 1-1 2-1 1-2 2-3
> ||| 1 2 1
>
> For model m3 for example, trained with:
>
> ...(cut)...
> --root-dir /usr/local/trans/work/ted/m3 \
> --max-phrase-length 4 \
> --first-step $FIRSTSTEP \
> --alignment-factors 2-2 \
> --alignment grow-diag-final-and \
> --reordering-factors 2-2 \
> --reordering wbe-msd-bidirectional-fe
>
> I'm getting a phrase-table.0-0.gz:
>
> sdumitrescu /usr/local/trans/work/ted/scripts > zcat
> ../m3/model/phrase-table.0-0.gz | head -2
> ! ! ! " ||| . " ||| 0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1
> ||| 5895 1 1
> ! ! ! pe ||| ! ! ! ||| 0.5 0.0202114 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2
> ||| 2 1 1
>
> Neither one works with an annotated input file like:
>
> I|i|i^NN|NN|Nc actually|actually|actually^ADVE|ADVE|Rmp am|be|be^VERB1|VERB1|Vmip1s
> .|.|.^PERIOD|PERIOD|PERIOD
>
> I get the StrayFactorException when not specifying any -input-factors
> (default 0), or "Exception: bitset::set" when setting anything else.
>
> Thanks for your help,
> Stefan
>
> On 8/27/2013 6:07 PM, Hieu Hoang wrote:
>
> Did you escape your training and input data? There must not be |
> characters in your data unless you are using factored models.
>
> The Moses tokenizer script does it, as does the dedicated escape script:
> scripts/tokenizer/tokenizer.perl
> scripts/tokenizer/escape-special-chars.perl
>
> On 26/08/2013 15:23, Stefan Dumitrescu wrote:
>
> Hi!
>
> I have the following error when attempting to translate:
>
> Exception: moses/Word.cpp:109 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int>&, const StringPiece&, bool) threw
> StrayFactorException because `fit'.
> You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
> factor delimiter | too many times.
>
> I have the following training script:
>
> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
> --corpus /usr/local/trans/corpus/ted/train \
> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
> --parallel \
> --mgiza \
> --mgiza-cpus 8 \
> --f ro --e en \
> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
> --root-dir /usr/local/trans/work/ted/m1 \
> --max-phrase-length 4 \
> --first-step $FIRSTSTEP \
> --translation-factors 0-0 \
> --alignment grow-diag-final-and \
> --reordering wbe-msd-bidirectional-fe
>
> The train files are factored (5 factors: word, lemma, lemma^postag1,
> postag1, postag2). The training process works without any errors, it
> generates a valid phrase-table that looks like:
>
> zcat ../m1/model/phrase-table.gz | head -1
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
> !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 |||
> 2 1 1
>
> I did not get this error a couple of months ago when working on another
> experiment. I'm guessing something changed in Moses and I am missing
> some required flag in my scripts? I am using scripts that have worked ok
> so far.
> I looked through the manual, and I tried using the -input-factors
> option, but I still receive the same error. What am I doing wrong? It is
> most likely something trivial, but I do appreciate your help with it.
>
> Thank you,
> Stefan
>
> (moses.ini below:)
> #########################
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz
> input-factor=0 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
> Distortion
> KENLM lazyken=0 name=LM0 factor=0
> path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
> Distortion0= 0.3
> LM0= 0.5
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
>
>
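
For the StrayFactorException quoted above, where the decoder is configured for 3 input factors but each input token carries 5, one workaround is to cut the test input down to the configured factors before decoding. A minimal Python sketch, assuming "|" as the factor delimiter and whitespace-separated tokens; project_factors is a hypothetical helper, not part of Moses, and which indices to keep depends on the [input-factors] section of the moses.ini in use:

```python
def project_factors(line, keep, delimiter="|"):
    """Keep only the factors at the given 0-based positions in every token.

    Raises ValueError if a token has fewer factors than the largest
    requested position, so malformed input is not silently passed on.
    """
    projected = []
    for token in line.split():
        factors = token.split(delimiter)
        if max(keep) >= len(factors):
            raise ValueError(
                "token %r has only %d factors" % (token, len(factors)))
        projected.append(delimiter.join(factors[i] for i in keep))
    return " ".join(projected)


# Two tokens from the test input quoted in this thread (5 factors each).
sentence = "Robert|robert|robert^NP|NP|Np Gupta|gupta|gupta^NP|NP|Np"

# Keep positions 0, 1, 2 to match an [input-factors] section listing 0 1 2.
print(project_factors(sentence, [0, 1, 2]))
# prints: Robert|robert|robert^NP Gupta|gupta|gupta^NP

# Keep only the surface form for an unfactored (0-0) model.
print(project_factors(sentence, [0]))
# prints: Robert Gupta
```

Applying the function to each line of the test file yields one input file per factor configuration, all derived from the single fully annotated file.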

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 82, Issue 46
*********************************************
