Moses-support Digest, Vol 86, Issue 41

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. bug in constrained decoding and syntax decoding (Hieu Hoang)
2. Re: Some of the confusing concepts (Philipp Koehn)


----------------------------------------------------------------------

Message: 1
Date: Fri, 13 Dec 2013 17:48:32 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: [Moses-support] bug in constrained decoding and syntax
decoding
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbigp8tsBka8PtkKmEC3tqhtM5GTtiiEj7adGfJjRmno+A@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

If you're using constrained decoding (the ConstrainedDecoding feature
function) or you're using SCFG decoding with target syntax, there's a few
bugs that affect the results. Please update using
git pull

1. Constrained decoding. A bug caused more sentences to appear to be
unreachable (+10% in my experiment). An integer was accidentally cast to a bool.

https://github.com/moses-smt/mosesdecoder/commit/295c07e884fa3b10348c2293f98386f26b7a8b49

2. Target syntax decoding. If multiple translation rules have the same
target sentence but different LHS, only one is used. This especially affects
decoders linked against older versions of Boost.

Caused by a programming error on our part, and by Boost's hashing of a
shared_ptr not taking the actual pointer into account. Fixed:

https://github.com/moses-smt/mosesdecoder/commit/06b0b6ae87fa0184c2ceb4827dd090a6d5709a45

https://github.com/moses-smt/mosesdecoder/commit/ff3c8c195a1db35ed179025b8e7a4265fd54fbc7


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131213/67fa4d2a/attachment-0001.htm

------------------------------

Message: 2
Date: Fri, 13 Dec 2013 20:12:47 +0000
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Some of the confusing concepts
To: Andrew <ravenyj@hotmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDD8foMcXHd+oZMkEt_WcmaJ2KpmbNRV0GAW609kpETJng@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

> 1) In GIZA++, what are the order and number of iterations for each model?
> It seems like the order is Model 1->Model 2->HMM -> Model 3-> Model 4 by
> default, but I'm not sure how many iterations of each runs by default.

train-model.perl currently sets the number of iterations to:
m1 => 5 ,
m2 => 0 ,
m3 => 3 ,
m4 => 3 ,

> 2) In GIZA++, is it right that source word cannot be aligned to more than
> one word in target language? What about the opposite? And can we have a case
> where multiple source words are aligned to the same target word, and vice
> versa? What would happen in an extreme case where source sentence is only
> one word, and target sentence is, say, 10 words?

Yes, GIZA++ only produces one-to-many alignments, and that is why we run it
in both directions and symmetrize the alignments.

> 3) From what I've read, it seems like all possible alignments are counted at
> first, and alignment probability for each word is calculated based on those
> counts. If so, in case where |source| < |target|, which source word is
> likely to get aligned to empty word? My understanding is that it would be
> the word with lowest alignment probability in regard to target words, and a
> word with high fertility probability for n=0.

In GIZA++ words may be aligned to any number of words on the other side,
including 0 words, hence you will get some unaligned words. This is driven
by probabilities specific to each of the IBM Models.

> 4) If we opt not to use reordering table in moses.ini, will the distortion
> limit be meaningless? Also in that case, will the grammaticality be
> dependent only on the language model?

The distortion limit is generally useful even when a lexicalized reordering
model is used. The n-gram language model helps somewhat with
"grammaticality", but its power is limited by its narrow window.
One of the main motivations for target syntax models is better
support for grammatically well-formed output.

> 5) If GIZA++ aligns words in both directions, why does it matter which one
> is source and which one is target? Is there difference in weights? Or is it
> because of the restriction that source word can only be aligned to one
> target word?

Two completely different models are trained, one for each direction of the
GIZA++ run. Within each direction, source and target play quite different roles.

-phi


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 86, Issue 41
*********************************************