Moses-support Digest, Vol 82, Issue 45

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Stuck when running moses (Jelita Asian)
2. Re: Error when attempting to translate: fails with "StrayFactorException" (Stefan Dumitrescu)

----------------------------------------------------------------------

Message: 1
Date: Wed, 28 Aug 2013 17:52:48 +0700
From: Jelita Asian <jelitayang@gmail.com>
Subject: [Moses-support] Stuck when running moses
To: moses-support@mit.edu
Message-ID:
<CAOmUaaq7S+XNeRSWHvZ+4CoP_LPCF7LY6R=EGuDBSXSwgyRDxw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

When I run the moses command, it suddenly stops with this error:

Defined parameters (per moses.ini or switch):
config: bin_id/moses.ini
distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 bin_id/reordering-table.Corpus27August2013.for_train.id-en.wbe-msd-bidirectional-fe
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 4 bin_id/en.LM-Corpus27August2013.for_train-IRSTLM-4-1-improved-kneser-ney-0-1.blm.mm
mapping: 0 T 0
ttable-file: 1 0 0 5 bin_id/phrase-table.Corpus27August2013.for_train.id-en
ttable-limit: 20
weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
weight-l: 0.8
weight-t: 1 0.2 0.2 0.2 0.2
weight-w: -1
.
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 1 models
ScoreProducer: LexicalReordering_wbe-msd-bidirectional-fe-allff start: 3 end: 9
Creating lexical reordering...
weights: 0.300 0.300 0.300 0.300 0.300 0.300
binary file loaded, default OFF_T: -1
Start loading LanguageModel bin_id/en.LM-Corpus27August2013.for_train-IRSTLM-4-1-improved-kneser-ney-0-1.blm.mm : [0.322] seconds
In LanguageModelIRST::Load: nGramOrder = 4
Language Model Type of bin_id/en.LM-Corpus27August2013.for_train-IRSTLM-4-1-improved-kneser-ney-0-1.blm.mm is 1
Language Model Type is 1
blmt
loadbin()
lmtable::loadbin_dict()
dict->size(): 52515
loadbin_level (level 1)
mapping 52515 1-grams
tableOffs 543479 tableGaps19191-grams
done (level 1)
loadbin_level (level 2)
mapping 758553 2-grams
tableOffs 1331204 tableGaps20484-grams
done (level 2)
loadbin_level (level 3)
mapping 1690187 3-grams
tableOffs 12709499 tableGaps61051-grams
done (level 3)
loadbin_level (level 4)
mapping 2146354 4-grams
tableOffs 38062304 tableGaps51424-grams
done (level 4)
done
OOV code is 52514
IRST: m_unknownId=52514
ScoreProducer: LM start: 9 end: 10
Finished loading LanguageModels : [0.630] seconds
Start loading PhraseTable bin_id/phrase-table.Corpus27August2013.for_train.id-en : [0.634] seconds
filePath: bin_id/phrase-table.Corpus27August2013.for_train.id-en
ScoreProducer: PhraseModel start: 10 end: 15
Finished loading phrase tables : [0.641] seconds
IO from STDOUT/STDIN
Created input-output object : [0.645] seconds

However, sometimes when I run the same code without changing anything, it
seems OK. Why is that?
Thanks.

Best regards

Jelita

------------------------------

Message: 2
Date: Wed, 28 Aug 2013 14:00:21 +0300
From: Stefan Dumitrescu <dumitrescu.stefan@gmail.com>
Subject: Re: [Moses-support] Error when attempting to translate: fails
with " StrayFactorException "
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <521DD845.1040406@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Hieu,

I have a single annotated training corpus from which I will build
several models, single-factor and multi-factor. I'm expecting that if
I specify --translation-factors 0-0 I'll get a phrase-table.0-0.gz, and if
in a later model I specify --translation-factors 0,1,4-0 I'll get factors 0,
1 and 4 in my phrase table, built from the same training data.

I have just trained a new model (to test the above) with the following
script:

/usr/local/trans/tools/moses/scripts/training/train-model.perl \
--corpus /usr/local/trans/corpus/ted/train \
--external-bin-dir=/usr/local/trans/tools/mgiza/bin \
--parallel \
--mgiza \
--mgiza-cpus 8 \
--f ro --e en \
--lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
--root-dir /usr/local/trans/work/ted/m6 \
--max-phrase-length 4 \
--first-step $FIRSTSTEP \
--translation-factors 0,2-0 \
--alignment-factors 2-2 \
--alignment grow-diag-final-and \
--reordering-factors 2-2 \
--reordering wbe-msd-bidirectional-fe

It creates a phrase table like:

sdumitrescu /usr/local/trans/work/ted/scripts > zcat ../m6/model/phrase-table.0,2-0.gz | head -2
!|!^EXCL !|!^EXCL !|!^EXCL "|"^DBLQ ||| . " ||| 0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1 ||| 5895 1 1
!|!^EXCL !|!^EXCL !|!^EXCL pe|pe^S ||| ! ! ! ||| 0.5 0.0202071 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2 ||| 2 1 1
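To check which factors actually ended up in a text phrase table, each entry can be split on the ` ||| ` field separator and the `|` delimiters counted per token. The sketch below is a hypothetical helper (`parse_phrase_table_line` and `factors_per_token` are illustrative names, not Moses tools). It assumes the field layout visible in the `zcat` output above (source, target, scores, word alignment, counts) and only parses the first four fields. For the 0,2-0 table it should report two factors per source token, one per target token, and four scores, matching `num-features=4`.

```python
from __future__ import annotations

from typing import NamedTuple


class PhrasePair(NamedTuple):
    source: list[str]
    target: list[str]
    scores: list[float]
    alignment: list[tuple[int, int]]


def parse_phrase_table_line(line: str) -> PhrasePair:
    """Split one text phrase-table entry into source, target, scores and alignment."""
    # Fields are separated by " ||| "; the surrounding spaces keep this from
    # matching the single "|" that separates factors inside a token.
    fields = [f.strip() for f in line.rstrip("\n").split(" ||| ")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    alignment: list[tuple[int, int]] = []
    if len(fields) > 3:
        for point in fields[3].split():
            src, tgt = point.split("-")
            alignment.append((int(src), int(tgt)))
    return PhrasePair(
        source=fields[0].split(),
        target=fields[1].split(),
        scores=[float(s) for s in fields[2].split()],
        alignment=alignment,
    )


def factors_per_token(tokens: list[str], sep: str = "|") -> set[int]:
    """Distinct factor counts among tokens: k delimiters mean k + 1 factors."""
    return {tok.count(sep) + 1 for tok in tokens}


line = ('!|!^EXCL !|!^EXCL !|!^EXCL "|"^DBLQ ||| . " ||| '
        '0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1 ||| 5895 1 1')
pair = parse_phrase_table_line(line)
print(factors_per_token(pair.source), factors_per_token(pair.target), len(pair.scores))
# {2} {1} 4
```

Run on the 5-factor m1 table quoted further down, the same helper would report `{5}` on both sides.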

So far, so good. moses.ini looks like this (it automatically filled in the
input factors, though I am not using factor #1 anywhere):
[input-factors]
0
1
2

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr/local/trans/work/ted/m6/model/phrase-table.0,2-0.gz input-factor=0,2 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=2 output-factor=2 path=/usr/local/trans/work/ted/m6/model/reordering-table.2-2.wbe-msd-bidirectional-fe.gz
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5

# dense weights for feature functions
[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
Distortion0= 0.3
LM0= 0.5
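The observation that the generated `[input-factors]` section lists factor 1 even though no feature reads it can be checked mechanically. The following is a minimal, hypothetical sketch (`factor_usage` is an illustrative name, and the paths in the sample are shortened). It assumes each `[feature]` entry sits on a single line of whitespace-separated `key=value` options (in the email some entries are wrapped), and it only looks at `input-factor=` options. It ignores `[mapping]`, generation steps and anything else Moses itself may consult; the LM's `factor=` refers to the target side, so it is not counted.

```python
from __future__ import annotations


def factor_usage(ini_text: str) -> tuple[list[int], dict[str, list[int]]]:
    """Return the [input-factors] list and each feature's input-factor= indices."""
    declared: list[int] = []
    used: dict[str, list[int]] = {}
    section = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "input-factors":
            declared.append(int(line))
        elif section == "feature":
            # Assumes one feature per line: a type name followed by key=value pairs.
            name, *pairs = line.split()
            options = dict(p.split("=", 1) for p in pairs if "=" in p)
            if "input-factor" in options:
                label = options.get("name", name)
                used[label] = [int(f) for f in options["input-factor"].split(",")]
    return declared, used


INI = """\
[input-factors]
0
1
2

[feature]
UnknownWordPenalty
PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=phrase-table.0,2-0.gz input-factor=0,2 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=2 output-factor=2 path=reordering-table.2-2.wbe-msd-bidirectional-fe.gz
KENLM lazyken=0 name=LM0 factor=0 path=en.surface.5gram.kni.blm order=5
"""

declared, used = factor_usage(INI)
referenced = sorted({f for factors in used.values() for f in factors})
unreferenced = sorted(set(declared) - set(referenced))
print(declared, referenced, unreferenced)  # [0, 1, 2] [0, 2] [1]
```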

Input is (5 factors, same as the training data):
sdumitrescu /usr/local/trans/work/ted/scripts > cat ../../../corpus/ted/test.in.ro | head -1
Robert|robert|robert^NP|NP|Np Gupta|gupta|gupta^NP|NP|Np ,|,|,^COMMA|COMMA|COMMA violonist|violonist|violonist^NSN|NSN|Nc-s-ny la|la|la^S|S|Spca Orchestra|orchestra|orchestra^NP|NP|Np Filarmonica(|filarmonica(|filarmonica(^NP|NP|Np din|din|din^S|S|Spca Los|los|los^NP|NP|Np Angeles|angelege|angelege^V3|V3|Vmii3p

The error looks like this:
Exception: moses/Word.cpp:109 in void Moses::Word::CreateFromString(Moses::FactorDirection, const std::vector<long unsigned int>&, const StringPiece&, bool) threw StrayFactorException because `fit'.
You have configured 3 factors but the word Robert|robert|robert^NP|NP|Np contains factor delimiter | too many times.
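The message says the decoder was configured for 3 factors (the three entries under `[input-factors]`) while each input token carries 5. Before a long decoding run, the whole test file can be scanned for tokens that would trip this condition. This is a minimal sketch that assumes only what the message states (more `|`-separated factors than configured is an error); whether Moses also rejects tokens with fewer factors is not stated here, so the sketch does not check that. `too_many_factors` is a hypothetical helper, not part of Moses.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def too_many_factors(lines: Iterable[str], configured: int,
                     sep: str = "|") -> Iterator[tuple[int, str, int]]:
    """Yield (line number, token, factor count) for tokens with more factors than configured."""
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            n_factors = token.count(sep) + 1  # k delimiters -> k + 1 factors
            if n_factors > configured:
                yield line_no, token, n_factors


sample = ["Robert|robert|robert^NP|NP|Np Gupta|gupta|gupta^NP|NP|Np",
          "Robert|robert|robert^NP Gupta|gupta|gupta^NP"]
for problem in too_many_factors(sample, configured=3):
    print(problem)
```

For a real file, an open file object (assumed here to be UTF-8 text) can be passed in place of `sample`, since it iterates line by line.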

From all the experiments so far I can only deduce one thing: do I have to
create as many input files as the model types I build, just to match the
factors in the phrase table? Wasn't the default behavior of Moses to accept
any number of factors in the input file and pick the ones it needs at each
translation/generation/reordering step?

Over the past year I have tried a significant number of factored models
(different combinations), and I just used a single input file that
contained all the factors, as I am doing now, without any Moses
exceptions. For the example above to work, I'm guessing I have to
recreate the test.in.ro file with only factors 0 and 2?
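If the decoder does turn out to need an input file restricted to particular factors, one way to derive it from the single 5-factor test file, rather than re-running the annotation, is to project every token onto the wanted factor indices. A hypothetical sketch (`project_factors` is an illustrative name). Which indices to keep is an assumption to settle against the moses.ini: `[0, 2]` is the guess above, while `[0, 1, 2]` would line up with the three entries in the generated `[input-factors]` section. For the training token `!|!|!^EXCL|EXCL|EXCL`, keeping `[0, 2]` gives `!|!^EXCL`, the same shape as the source side of `phrase-table.0,2-0.gz`.

```python
from __future__ import annotations


def project_factors(line: str, keep: list[int], sep: str = "|") -> str:
    """Rebuild each token from only the factors at the given indices, in that order."""
    projected = []
    for token in line.split():
        factors = token.split(sep)
        # Raises IndexError if a token has fewer factors than an index in `keep`.
        projected.append(sep.join(factors[i] for i in keep))
    return " ".join(projected)


full = "Robert|robert|robert^NP|NP|Np Gupta|gupta|gupta^NP|NP|Np"
print(project_factors(full, [0, 2]))     # Robert|robert^NP Gupta|gupta^NP
print(project_factors(full, [0, 1, 2]))  # Robert|robert|robert^NP Gupta|gupta|gupta^NP
```

Applied line by line to test.in.ro, this yields one reduced input file per factor configuration while the annotated master file stays untouched.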

Thanks,
Stefan

On 8/28/2013 11:32 AM, Hieu Hoang wrote:
> Do you want to have multiple factors in your phrase-table?
>
> The training command doesn't specify any factors. The ini file says
> your phrase-table has only 1 factor for both input and output.
> However, your translation rules contain 10 factors!
>
>
> On 28/08/2013 08:59, Stefan Dumitrescu wrote:
>> Hi Hieu,
>>
>> The training and test data are correctly processed: first tokenized
>> (with the Moses script), then truecased, then annotated.
>>
>> I have trained a surface model on the unannotated (unfactored) data
>> and everything runs smoothly. However, when I use an annotated
>> corpus (correctly annotated, each token becomes 5 factors) as well as
>> an annotated input, I get this exception.
>>
>> I tried recompiling Moses with -max-factors 10; no change.
>>
>> I played with the -input-factors switch for the decoder; now I am
>> getting this:
>>
>> ... (I cut the first part) ...
>> line=KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>> FeatureFunction: LM0 start: 14 end: 14
>> Loading table into memory...done.
>> Start loading text SCFG phrase table. Moses format : [63.000] seconds
>> Reading /usr/local/trans/work/ted/m4/model/phrase-table.0-0.gz
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> ****************************************************************************************************
>> Exception: bitset::set
>>
>> It is a bit frustrating because I have used factored models several
>> times in the past year without any issues.
>>
>> For my model m1, I did not specify any -translation-factors in the
>> training phase and I got a phrase-table.gz which contained the five
>> factors together, as in:
>>
>> sdumitrescu /usr/local/trans/work/ted/scripts > zcat ../m1/model/phrase-table.gz | head -2
>> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.018702 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2 1 1
>> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL "|"|"^DBLQ|DBLQ|DBLQ ||| 1 0.291644 0.5 0.000225623 ||| 0-0 1-1 2-1 1-2 2-3 ||| 1 2 1
>>
>> For model m3, for example, trained with:
>>
>> ...(cut)...
>> --root-dir /usr/local/trans/work/ted/m3 \
>> --max-phrase-length 4 \
>> --first-step $FIRSTSTEP \
>> --alignment-factors 2-2 \
>> --alignment grow-diag-final-and \
>> --reordering-factors 2-2 \
>> --reordering wbe-msd-bidirectional-fe
>>
>> I'm getting a phrase-table.0-0.gz:
>>
>> sdumitrescu /usr/local/trans/work/ted/scripts > zcat ../m3/model/phrase-table.0-0.gz | head -2
>> ! ! ! " ||| . " ||| 0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1 ||| 5895 1 1
>> ! ! ! pe ||| ! ! ! ||| 0.5 0.0202114 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2 ||| 2 1 1
>>
>> Neither one works with an annotated input file like:
>>
>> I|i|i^NN|NN|Nc actually|actually|actually^ADVE|ADVE|Rmp
>> am|be|be^VERB1|VERB1|Vmip1s .|.|.^PERIOD|PERIOD|PERIOD
>>
>> ... I'm getting the StrayFactorException when not specifying any
>> -input-factors (default 0), or "Exception: bitset::set" when setting
>> anything else.
>>
>> Thanks for your help,
>> Stefan
>>
>> On 8/27/2013 6:07 PM, Hieu Hoang wrote:
>>> Did you escape your training and input data? There must not be |
>>> characters in your data unless you are using factored models.
>>>
>>> The Moses tokenizer script does it, as does the dedicated escape script:
>>> scripts/tokenizer/tokenizer.perl
>>> scripts/tokenizer/escape-special-chars.perl
>>>
>>> On 26/08/2013 15:23, Stefan Dumitrescu wrote:
>>>> Hi!
>>>>
>>>> I have the following error when attempting to translate:
>>>>
>>>> Exception: moses/Word.cpp:109 in void
>>>> Moses::Word::CreateFromString(Moses::FactorDirection, const
>>>> std::vector<long unsigned int>&, const StringPiece&, bool) threw
>>>> StrayFactorException because `fit'.
>>>> You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
>>>> factor delimiter | too many times.
>>>>
>>>> I have the following training script:
>>>>
>>>> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
>>>> --corpus /usr/local/trans/corpus/ted/train \
>>>> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
>>>> --parallel \
>>>> --mgiza \
>>>> --mgiza-cpus 8 \
>>>> --f ro --e en \
>>>> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
>>>> --root-dir /usr/local/trans/work/ted/m1 \
>>>> --max-phrase-length 4 \
>>>> --first-step $FIRSTSTEP \
>>>> --translation-factors 0-0 \
>>>> --alignment grow-diag-final-and \
>>>> --reordering wbe-msd-bidirectional-fe
>>>>
>>>> The training files are factored (5 factors: word, lemma, lemma^postag1,
>>>> postag1, postag2). The training process works without any errors; it
>>>> generates a valid phrase-table that looks like:
>>>>
>>>> zcat ../m1/model/phrase-table.gz | head -1
>>>> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2 1 1
>>>>
>>>> I did not get this error a couple of months ago when working on another
>>>> experiment. I'm guessing something changed in Moses and I am missing
>>>> some required flag in my scripts? I am using scripts that have worked OK
>>>> so far.
>>>> I looked through the manual, and I tried using the -input-factors
>>>> option, but I still receive the same error. What am I doing wrong? It is
>>>> most likely something trivial, but I do appreciate your help with it.
>>>>
>>>> Thank you,
>>>> Stefan
>>>>
>>>> (moses.ini below:)
>>>> #########################
>>>> ### MOSES CONFIG FILE ###
>>>> #########################
>>>>
>>>> # input factors
>>>> [input-factors]
>>>> 0
>>>>
>>>> # mapping steps
>>>> [mapping]
>>>> 0 T 0
>>>>
>>>> [distortion-limit]
>>>> 6
>>>>
>>>> # feature functions
>>>> [feature]
>>>> UnknownWordPenalty
>>>> WordPenalty
>>>> PhrasePenalty
>>>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz input-factor=0 output-factor=0
>>>> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
>>>> Distortion
>>>> KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>>>>
>>>> # dense weights for feature functions
>>>> [weight]
>>>> UnknownWordPenalty0= 1
>>>> WordPenalty0= -1
>>>> PhrasePenalty0= 0.2
>>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>>> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
>>>> Distortion0= 0.3
>>>> LM0= 0.5
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>
>
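On the point in the quoted 27 Aug reply that `|` must not appear inside the data: with factored data, escaping has to be applied to each factor before the factors are joined with `|`, otherwise the delimiters themselves get escaped. A minimal sketch of that ordering follows. The character-to-entity table is my reading of what `scripts/tokenizer/escape-special-chars.perl` substitutes and should be checked against the script itself, which also does some whitespace clean-up not reproduced here. `escape_factor` and `join_factors` are hypothetical names.

```python
from __future__ import annotations

# Character-to-entity pairs (assumed to mirror escape-special-chars.perl).
# "&" comes first so the "&" inside the other entities is not escaped again.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_factor(text: str) -> str:
    """Replace characters Moses treats specially with entities."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def join_factors(factors: list[str], sep: str = "|") -> str:
    """Escape each factor on its own, then join them with the factor delimiter."""
    return sep.join(escape_factor(f) for f in factors)


print(escape_factor('a|b "c"'))            # a&#124;b &quot;c&quot;
print(join_factors(["|", "|", "|^PIPE"]))  # &#124;|&#124;|&#124;^PIPE
```

Escaping the already-joined token instead would turn every delimiter into `&#124;` and collapse the token into a single factor.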


------------------------------


End of Moses-support Digest, Vol 82, Issue 45
*********************************************
