Moses-support Digest, Vol 82, Issue 34

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: How do Moses use the LM for unknown words (Per Tunedal)
2. Decoding with word lattice (Wei Qiu)
3. Error when attempting to translate: fails with
"StrayFactorException" (Stefan Dumitrescu)


----------------------------------------------------------------------

Message: 1
Date: Mon, 26 Aug 2013 14:04:44 +0200
From: Per Tunedal <per.tunedal@operamail.com>
Subject: Re: [Moses-support] How do Moses use the LM for unknown words
To: Nicola Bertoldi <bertoldi@fbk.eu>
Cc: "<moses-support@mit.edu>" <moses-support@mit.edu>
Message-ID:
<1377518684.31616.14189005.20864FF4@webmail.messagingengine.com>
Content-Type: text/plain; charset="UTF-8"

Hi Nicola,
thank you for your answer. I conclude that "back-off weight" is the same
as what is called "back-off cost" in Koehn's textbook on SMT, and that
Moses uses the back-off procedure described in the book.
Is there anything about this in the Moses wiki?
Yours,
Per Tunedal

On Sun, Aug 25, 2013, at 9:53, Nicola Bertoldi wrote:
> Hi Per,
>
> for all n-grams (except those of the highest order), the third field is the
> logarithmic back-off weight (logBO);
> if it is not reported, the weight is assumed equal to 0 (in log scale)
>
> suppose you want to compute
> logP(maison | cadre seulement)
>
> and the 3-gram
> "cadre seulement maison"
> is absent from the table
>
> while the 2-gram is present as follows:
> -0.7 seulement maison -0.1
>
> hence, the LM backs off to the lower order:
>
> logP(maison | cadre seulement) = logBO(cadre seulement) + logP(maison | seulement)
> = 0.0 + -0.7
>
> (the 2-gram "cadre seulement" carries no explicit back-off weight, so
> logBO(cadre seulement) = 0)
>
> (sorry for the example, but I do not speak French)
>
> best
> Nicola
>
> On Aug 23, 2013, at 9:20 PM, Per Tunedal wrote:
>
> >
> > Hi,
> > how does Moses calculate the probability of a sentence with an unknown
> > word? How is the LM used?
> >
> > I've estimated a 3-gram LM with IRSTLM for a base line system, according
> > to the instructions in the Wiki. The arpa-file contains entries like:
> >
> > -7.2625 redescendue -0.1681
> > -7.2625 serviabilité -0.1681
> > -2.51072 <unk>
> >
> > -3.26915 cadre très -0.096544
> > -4.52727 cadre lors
> > -4.57217 cadre seulement
> >
> > I suppose the first number is the probability and the second number is
> > the "back-off weight". Is it used somehow? And what happens
> > when it is absent (-4.52727 cadre lors)?
> >
> > Yours,
> > Per Tunedal
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
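
As a concrete illustration of the back-off lookup Nicola describes, here is a
minimal Python sketch; the dictionary layout, the helper function, and the toy
entries are assumptions built from the ARPA lines quoted in this thread, not
actual Moses or IRSTLM code.

# Illustrative back-off query over an ARPA-style language model.
# The table maps an n-gram tuple to (log10 probability, log10 back-off weight);
# a missing back-off weight counts as 0.0, as in the ARPA format.
def logprob(table, context, word):
    """Return log10 P(word | context) with recursive back-off."""
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        # Unknown unigram: fall back to the <unk> entry.
        return table[("<unk>",)][0]
    # Back off: add logBO(context) and drop the oldest context word.
    logbo = table.get(context, (0.0, 0.0))[1]
    return logbo + logprob(table, context[1:], word)

# Toy entries mirroring the example above (placeholder numbers):
table = {
    ("cadre", "seulement"): (-4.57217, 0.0),   # no explicit back-off weight
    ("seulement", "maison"): (-0.7, -0.1),
    ("<unk>",): (-2.51072, 0.0),
}
print(logprob(table, ("cadre", "seulement"), "maison"))   # 0.0 + -0.7 = -0.7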



------------------------------

Message: 2
Date: Mon, 26 Aug 2013 15:42:19 +0200
From: Wei Qiu <wei@qiu.es>
Subject: [Moses-support] Decoding with word lattice
To: moses-support <moses-support@mit.edu>
Message-ID:
<CALWr_T9BmA6MY1tbhYOZMiKS-2k+RinxwFarxeDDcW52WE4d8w@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I'm now trying to use word lattice decoding.
I observed a performance drop (around 5% in BLEU) after simply converting the
source-language token sequence into a linear *word lattice*; I had expected
almost identical results.

Since I'm using this technique for some artificial language pairs, I'm
wondering whether this is normal. Do you also see this problem for natural
language pairs?

If possible, how can I improve the word lattice decoding result?
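
For concreteness, the linear lattice I feed to the decoder simply has one node
per source token with a single arc, in Moses' PLF (Python Lattice Format); the
three tokens below are only an illustrative placeholder, not from my data:

# A degenerate (linear) word lattice in PLF: one node per token, each node
# holding a single arc of (label, probability, distance to the next node).
(
    (('le', 1.0, 1),),
    (('chat', 1.0, 1),),
    (('dort', 1.0, 1),),
)

Decoding such input requires -inputtype 2, so in principle I would expect it
to behave like plain-text input.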

Thanks in advance!

Best,
Wei Qiu

------------------------------

Message: 3
Date: Mon, 26 Aug 2013 17:23:48 +0300
From: Stefan Dumitrescu <dumitrescu.stefan@gmail.com>
Subject: [Moses-support] Error when attempting to translate: fails
with " StrayFactorException "
To: moses-support@MIT.EDU
Message-ID: <521B64F4.20000@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi!

I have the following error when attempting to translate:

Exception: moses/Word.cpp:109 in void
Moses::Word::CreateFromString(Moses::FactorDirection, const
std::vector<long unsigned int>&, const StringPiece&, bool) threw
StrayFactorException because `fit'.
You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
factor delimiter | too many times.

I have the following training script:

/usr/local/trans/tools/moses/scripts/training/train-model.perl \
--corpus /usr/local/trans/corpus/ted/train \
--external-bin-dir=/usr/local/trans/tools/mgiza/bin \
--parallel \
--mgiza \
--mgiza-cpus 8 \
--f ro --e en \
--lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
--root-dir /usr/local/trans/work/ted/m1 \
--max-phrase-length 4 \
--first-step $FIRSTSTEP \
--translation-factors 0-0 \
--alignment grow-diag-final-and \
--reordering wbe-msd-bidirectional-fe

The training files are factored (5 factors: word, lemma, lemma^postag1,
postag1, postag2). The training process runs without any errors and
generates a valid phrase table that looks like:

zcat ../m1/model/phrase-table.gz | head -1
!|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL
!|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 |||
2 1 1
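
As a quick sanity check, this throwaway Python snippet (not part of Moses; the
path is the one from the training run above) counts the '|'-separated factors
of each source token in that first phrase-table line:

import gzip

# Read the first line of the phrase table and count factors per source token.
path = "/usr/local/trans/work/ted/m1/model/phrase-table.gz"
with gzip.open(path, "rt", encoding="utf-8") as f:
    first_line = f.readline()
source_side = first_line.split("|||")[0].split()
for token in source_side:
    print(token, "->", token.count("|") + 1, "factors")
# Every source token reports 5 factors, while [input-factors] in the
# moses.ini below lists only factor 0.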

I did not get this error a couple of months ago when working on another
experiment. I'm guessing something changed in Moses and I am missing
some required flag in my scripts? I am using scripts that have worked fine
so far.
I looked through the manual and tried the -input-factors option, but I
still receive the same error. What am I doing wrong? It is most likely
something trivial, but I would appreciate your help with it.

Thank you,
Stefan

(moses.ini below:)
#########################
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5

# dense weights for feature functions
[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
Distortion0= 0.3
LM0= 0.5



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 82, Issue 34
*********************************************
