Moses-support Digest, Vol 82, Issue 43

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. New parameter "const std::string &line" in FeatureFunction
(Lane Schwartz)
2. Experimenter: giving an aligned corpus (Hassan Sajjad)
3. Re: Error when attempting to translate: fails with "
StrayFactorException " (Stefan Dumitrescu)
4. Re: New parameter "const std::string &line" in
FeatureFunction (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Aug 2013 14:22:31 -0400
From: Lane Schwartz <dowobeha@gmail.com>
Subject: [Moses-support] New parameter "const std::string &line" in
FeatureFunction
To: "moses-support@mit.edu" <moses-support@MIT.EDU>
Message-ID:
<CABv3vZ=7FLnB52FLW3u2sYAy5nkuO14ZQh=o=gAfW2mkoO43XQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I'm doing some LM tinkering, and I noticed that there's a new parameter in
FeatureFunction and its descendants.

What does the following parameter represent?

"const std::string &line"

Thanks,
Lane

------------------------------

Message: 2
Date: Tue, 27 Aug 2013 22:41:36 +0300
From: Hassan Sajjad <sajjad@ims.uni-stuttgart.de>
Subject: [Moses-support] Experimenter: giving an aligned corpus
To: Moses-support@mit.edu
Message-ID:
<CAOiX71YMnTDx515NKwgsdoeruS-3oSXSzZ1_YiOQM5JRq+_UUA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I am using Experimenter to run Moses. In the configuration file, I provide
word alignments by specifying "word-alignment =" in TRAINING. I could not
find a way to specify the cleaned corpus. I am using the latest branch of
Moses. In the previous one, I used to specify the cleaned corpus in the
TRAINING section as "corpus = " and everything worked fine.
Now, when I specify it in TRAINING, it complains that it should be
specified in GENERAL. When I specify it in GENERAL, it ignores it, takes
the raw corpus from CORPUS, and pre-processes it.

Is there a different format for specifying it in GENERAL? I use "corpus =".

--
Regards;
Hassan Sajjad

------------------------------

Message: 3
Date: Wed, 28 Aug 2013 10:59:51 +0300
From: Stefan Dumitrescu <dumitrescu.stefan@gmail.com>
Subject: Re: [Moses-support] Error when attempting to translate: fails
with " StrayFactorException "
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <521DADF7.7040107@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Hieu,

The training and test data are correctly processed: first tokenized (with
the Moses script), then truecased, then annotated.

I have trained a surface model on the unannotated (unfactored) data and
everything runs smoothly. However, when I use an annotated corpus
(correctly annotated, each token becomes 5 factors) together with an
annotated input, I get this exception.

I tried recompiling Moses with -max-factors 10, no change.

I played with the -input-factors switch for the decoder, and now I am
getting this:

.... (I cut the first part) ...
line=KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
FeatureFunction: LM0 start: 14 end: 14
Loading table into memory...done.
Start loading text SCFG phrase table. Moses format : [63.000] seconds
Reading /usr/local/trans/work/ted/m4/model/phrase-table.0-0.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Exception: bitset::set

It is a bit frustrating because I have used factored models several
times in the past year without any issues.

For my model m1, I did not specify any -translation-factors in the
training phase and I got a phrase-table.gz which contained the five
factors together, as in:

sdumitrescu /usr/local/trans/work/ted/scripts > zcat
../m1/model/phrase-table.gz | head -2
!|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
!|!|!^EXCL|EXCL|EXCL ||| 0.5 0.018702 1 0.18952 ||| 0-0 1-0 0-1 2-2 |||
2 1 1
!|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL |||
!|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
"|"|"^DBLQ|DBLQ|DBLQ ||| 1 0.291644 0.5 0.000225623 ||| 0-0 1-1 2-1 1-2
2-3 ||| 1 2 1

For model m3 for example, trained with:

...(cut)...
--root-dir /usr/local/trans/work/ted/m3 \
--max-phrase-length 4 \
--first-step $FIRSTSTEP \
--alignment-factors 2-2 \
--alignment grow-diag-final-and \
--reordering-factors 2-2 \
--reordering wbe-msd-bidirectional-fe

I'm getting a phrase-table.0-0.gz:

sdumitrescu /usr/local/trans/work/ted/scripts > zcat
../m3/model/phrase-table.0-0.gz | head -2
! ! ! " ||| . " ||| 0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0
3-1 ||| 5895 1 1
! ! ! pe ||| ! ! ! ||| 0.5 0.0202114 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2
||| 2 1 1

Neither one works with an annotated input file like:

*I*|i|i^NN|NN|Nc *actually*|actually|actually^ADVE|ADVE|Rmp
*am*|be|be^VERB1|VERB1|Vmip1s *.*|.|.^PERIOD|PERIOD|PERIOD

I'm getting the StrayFactorException when not specifying any
-input-factors (default 0), or "Exception: bitset::set" when setting
anything else.
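
To make the mismatch described above concrete, here is a minimal Python sketch of a pre-flight check on an input line. It is not part of Moses; the helper names (count_factors, tokens_with_too_many_factors, keep_factors) are made up for illustration. It only assumes what the error message itself states: factors inside a token are separated by "|", and the decoder complains when a token carries more factors than it has been configured to read.

```python
FACTOR_DELIMITER = "|"  # factor separator named in the StrayFactorException message


def count_factors(token, delimiter=FACTOR_DELIMITER):
    """Return how many factors a single token carries."""
    return len(token.split(delimiter))


def tokens_with_too_many_factors(sentence, configured, delimiter=FACTOR_DELIMITER):
    """List the tokens that carry more factors than the configured number."""
    return [tok for tok in sentence.split()
            if count_factors(tok, delimiter) > configured]


def keep_factors(sentence, indices, delimiter=FACTOR_DELIMITER):
    """Rebuild a sentence keeping only the factors at the given positions."""
    kept = []
    for tok in sentence.split():
        factors = tok.split(delimiter)
        kept.append(delimiter.join(factors[i] for i in indices))
    return " ".join(kept)


if __name__ == "__main__":
    line = "!|!|!^EXCL|EXCL|EXCL pe|pe|pe^S|S|Spca"
    # With one configured factor, both 5-factor tokens are flagged.
    print(tokens_with_too_many_factors(line, configured=1))
    # Reducing the input to factor 0 gives the surface-only sentence.
    print(keep_factors(line, [0]))
```

Running a check like this over the test file shows whether the number of factors in the input agrees with what the decoder has been told to expect; it does not by itself explain the bitset::set exception.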

Thanks for your help,
Stefan

On 8/27/2013 6:07 PM, Hieu Hoang wrote:
> did you escape your training and input data? There must not be |
> characters in your data unless you are using factored models
>
> the moses tokenizer script does it, as well as the specific escape script.
> scripts/tokenizer/tokenizer.perl
> scripts/tokenizer/escape-special-chars.perl
>
> On 26/08/2013 15:23, Stefan Dumitrescu wrote:
>> Hi!
>>
>> I have the following error when attempting to translate:
>>
>> Exception: moses/Word.cpp:109 in void
>> Moses::Word::CreateFromString(Moses::FactorDirection, const
>> std::vector<long unsigned int>&, const StringPiece&, bool) threw
>> StrayFactorException because `fit'.
>> You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
>> factor delimiter | too many times.
>>
>> I have the following training script:
>>
>> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
>> --corpus /usr/local/trans/corpus/ted/train \
>> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
>> --parallel \
>> --mgiza \
>> --mgiza-cpus 8 \
>> --f ro --e en \
>> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
>> --root-dir /usr/local/trans/work/ted/m1 \
>> --max-phrase-length 4 \
>> --first-step $FIRSTSTEP \
>> --translation-factors 0-0 \
>> --alignment grow-diag-final-and \
>> --reordering wbe-msd-bidirectional-fe
>>
>> The train files are factored (5 factors: word, lemma, lemma^postag1,
>> postag1, postag2). The training process works without any errors, it
>> generates a valid phrase-table that looks like:
>>
>> zcat ../m1/model/phrase-table.gz | head -1
>> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
>> pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
>> !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 |||
>> 2 1 1
>>
>> I did not get this error a couple of months ago when working on another
>> experiment. I'm guessing something changed in Moses and I am missing
>> some required flag in my scripts? I am using scripts that have worked ok
>> so far.
>> I looked through the manual, and I tried using the -input-factors
>> option, but i still receive the same error. What am I doing wrong? It is
>> something trivial most likely, but I do appreciate your help with it.
>>
>> Thank you,
>> Stefan
>>
>> (moses.ini below:)
>> #########################
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>>
>> [distortion-limit]
>> 6
>>
>> # feature functions
>> [feature]
>> UnknownWordPenalty
>> WordPenalty
>> PhrasePenalty
>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
>> num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz
>> input-factor=0 output-factor=0
>> LexicalReordering name=LexicalReordering0 num-features=6
>> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
>> path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
>> Distortion
>> KENLM lazyken=0 name=LM0 factor=0
>> path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>>
>> # dense weights for feature functions
>> [weight]
>> UnknownWordPenalty0= 1
>> WordPenalty0= -1
>> PhrasePenalty0= 0.2
>> TranslationModel0= 0.2 0.2 0.2 0.2
>> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
>> Distortion0= 0.3
>> LM0= 0.5
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>


------------------------------

Message: 4
Date: Wed, 28 Aug 2013 09:27:35 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] New parameter "const std::string &line"
in FeatureFunction
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEKMkbgwDBeOKTWcX+XWokEY3k3tJ6UV4KgWuT6b9J1ZxvZpzg@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

line contains the string that is specified in the [feature] section for the
feature, e.g.
KENLM lazyken=0 name=LM0 factor=0 path=../interpolated-binlm.1 order=3
The line variable should be passed to the FeatureFunction base class during
construction.

You can then parse it for your feature-specific variables. The best way to
do this is to:
1. call ReadParameters() in your constructor
2. implement
   SetParameter(const std::string& key, const std::string& value)

See class PhraseBoundaryState for a good example, or the irstlm or srilm in
about 2 yrs, after I've cleaned them up...
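
As a rough illustration of the pattern described above, the following Python sketch splits a [feature] line into its feature type and key=value arguments, then hands each pair to a per-feature setter. It is not the Moses C++ implementation: parse_feature_line, ToyLanguageModelFeature and set_parameter are hypothetical names that only mimic the roles of the line argument, ReadParameters() and SetParameter(), and the handling of each key is an assumption.

```python
def parse_feature_line(line):
    """Split a [feature] line into the feature type and (key, value) pairs."""
    tokens = line.split()
    feature_type, args = tokens[0], tokens[1:]
    params = []
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % arg)
        params.append((key, value))
    return feature_type, params


class ToyLanguageModelFeature:
    """Hypothetical feature that consumes its own configuration line."""

    def __init__(self, line):
        self.name = None
        self.factor = 0
        self.order = None
        self.path = None
        self.lazy = False
        self.feature_type, params = parse_feature_line(line)
        # Plays the role ReadParameters() has in the description above:
        # walk over the parsed pairs and dispatch each one to the setter.
        for key, value in params:
            self.set_parameter(key, value)

    def set_parameter(self, key, value):
        """Handle one key=value pair; unknown keys are rejected."""
        if key == "name":
            self.name = value
        elif key == "factor":
            self.factor = int(value)
        elif key == "order":
            self.order = int(value)
        elif key == "path":
            self.path = value
        elif key == "lazyken":
            self.lazy = value != "0"
        else:
            raise ValueError("unknown parameter: %s" % key)


if __name__ == "__main__":
    ff = ToyLanguageModelFeature(
        "KENLM lazyken=0 name=LM0 factor=0 path=../interpolated-binlm.1 order=3")
    print(ff.feature_type, ff.name, ff.factor, ff.order, ff.path, ff.lazy)
```

Keeping the generic splitting separate from the per-key handling means a new feature only has to supply its own setter for the keys it understands.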

On 27 August 2013 19:22, Lane Schwartz <dowobeha@gmail.com> wrote:

> I'm doing some LM tinkering, and I noticed that there's a new parameter in
> FeatureFunction and its descendants.
>
> What does the following parameter represent?
>
> "const std::string &line"
>
> Thanks,
> Lane

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 82, Issue 43
*********************************************
