Moses-support Digest, Vol 110, Issue 27

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: MERT's Powell Search (Adam Lopez)
2. Re: dictionary based word alignment? (Philipp Koehn)


----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Dec 2015 17:55:03 +0000
From: Adam Lopez <alopez@inf.ed.ac.uk>
Subject: Re: [Moses-support] MERT's Powell Search
To: liling tan <alvations@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAE-ScgvYjNWvBfD9aB8jh+5xemQcJ22JbK5g6TxuU2GBPbHa7g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

>
> On line 6, does the "score" in "compute line l: parameter value → score"
> refer to (i) the MT evaluation metric score (e.g. BLEU) between the
> translation and the reference sentence, or (ii) the n-best list weighted
> overall score, as seen in the last column of a Moses-generated n-best list
> (e.g. http://www.statmt.org/moses/?n=Advanced.Search)?
>

Neither. It is the model score of that sentence w.r.t. the parameter you're
optimizing. Once you have the model score for each sentence as a function
of λj, you can then construct a function representing BLEU as a function of
λj by finding the convex hull representing the result of the argmax
operation. That is what is happening in slides 31-36. See below.
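
To make the linear form concrete, here is a minimal Python sketch (illustrative only, not Moses code; the feature values and weights are invented) of how, with every weight except λj frozen, each hypothesis's model score collapses to a line slope·λj + intercept:

```python
def score_line(h, weights, j):
    """With every weight except the j-th frozen, the model score
    g = sum_k weights[k] * h[k] is linear in lambda_j:
    slope = h[j], intercept = the sum of all the other terms."""
    slope = h[j]
    intercept = sum(w * f for k, (w, f) in enumerate(zip(weights, h)) if k != j)
    return slope, intercept

# Toy 3-feature hypothesis (e.g. LM, TM, word penalty) and weight vector.
h = [-2.0, -1.5, 3.0]
weights = [0.5, 0.3, -0.1]
slope, intercept = score_line(h, weights, j=0)  # optimizing the first weight

# The full model score at any lambda_0 is recovered exactly from the line:
lam = 0.7
full = lam * h[0] + weights[1] * h[1] + weights[2] * h[2]
assert abs(slope * lam + intercept - full) < 1e-9
```

Each hypothesis in the N-best list yields one such line, which is the starting point of the construction on slide 31.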


> At line 8 of the pseudocode, when it asks to "find line l with steepest
> descent", is it looking, for each sentence, for (i) the line with the
> highest λj or (ii) the line with the highest g(ei|f)?
>

In this context, "steepest descent" means "steepest slope"; i.e. choose
the sentence i with the greatest value of ai.


> Then at line 15 of the pseudocode, it says "compute score for value
> before first threshold point". Is this "score" different from the "score"
> at line 6? At line 6, it's a sentence-level score (which I hope means
> BLEU and not the weighted overall score), while at line 15 it seems to be
> computing the corpus-level score given the initial parameter values.
>
> If line 15 computes the corpus-level score, is it only taking the best
> score of the n translations for each reference? And if this is BLEU, it's
> not simply averaging the sentence-level BLEU scores that might be kept
> from line 6, is that right? If it is BLEU, then this score could also be
> pre-computed before the Powell search, right?
>
>

Remember what we're trying to do: choose λj to maximize BLEU. The algorithm
here does that exactly w.r.t. the N-best list. That is, over a corpus of M
sentences for which we have N-best translations, we want to find:

1) argmax_λj BLEU(λj)

Let's unroll this computation. Let êm(λ) be the translation that the
decoder chooses for the m-th training example when λj=λ, and bm(ê) be a
function returning the vector of sentence-level statistics used in the
computation of BLEU when ê is the translation of the m-th training example
(i.e. n-gram matches and reference counts). BLEU is a function of the
aggregate results of calls to b, so (1) becomes:

2) argmax_λj BLEU(Σ_{m ∈ 1,...,M} b(êm(λj)))

But êm(λj) is just argmax_{n ∈ 1,...,N} g(em,n, fm, λj), where em,n is the
n-th element of the N-best list for the m-th training example and fm is the
source sentence of the m-th training example, and g is the model score we
compute from this pair as a function of λj (holding the remaining elements
of λ constant, remember). So this becomes:

3) argmax_λj BLEU(Σ_{m ∈ 1,...,M} b(argmax_{n ∈ 1,...,N} g(em,n, fm, λj)))

And since we have g(em,n, fm, λj) = Σ_{k ∈ 1,...,|λ|} λk·hk(em,n, fm) =
λj·hj(em,n, fm) + Σ_{k ∈ 1,...,j-1,j+1,...,|λ|} λk·hk(em,n, fm), we get:

4) argmax_λj BLEU(Σ_{m ∈ 1,...,M} b(argmax_{n ∈ 1,...,N} λj·hj(em,n, fm) +
Σ_{k ∈ 1,...,j-1,j+1,...,|λ|} λk·hk(em,n, fm)))

Since both h and the remaining elements of λ are fixed, this becomes
(using a variant of the notation in slide 31, where the slope a and
intercept c are functions of these constants; c is written instead of the
slide's b to avoid a clash with the BLEU statistics function b):

5) argmax_λj BLEU(Σ_{m ∈ 1,...,M} b(argmax_{n ∈ 1,...,N} λj·a(em,n, fm) +
c(em,n, fm)))

The function inside the outer argmax in (5) is exactly the function that's
being constructed piece by piece in slides 31-35, and illustrated in slide
36. Here's how that happens:

- On slide 31, we construct the model score of the n-th element of the
N-best list for the m-th training example, em,n, as a linear function of
λj, as we've discussed. This is the bit inside the inner argmax.

- On slide 32, we repeat the construction of slide 31 for *every* element
of the N-best list for the m-th training example.

- Slide 33 shows the max of the function inside the inner argmax. Each
point on the upper convex hull is a point where the argmax changes, and the
argmax of any interval between the x-values of these points is just the
element of the N-best list giving rise to the line whose value is maximal
in that interval.

- Slide 34 shows how we actually get the argmax. We have to find the
intersection points of the upper convex hull, which is why we sort the
lines by slope and compute their intersections.

- Finally, slide 36 shows the complete function inside the argmax of (5).
We compute the statistics b for the maximizing sentence in each interval,
and then sum the resulting function over all training examples. This gives
us a set of intervals and sufficient statistics for BLEU in each interval,
from which we compute the complete function.
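
The whole sweep in slides 31-36 can be sketched in Python. This is an illustrative reconstruction, not Moses's actual mert code: each hypothesis contributes a (slope, intercept) line, the upper envelope is found by sorting on slope and intersecting, and a toy per-hypothesis gain stands in for BLEU's n-gram sufficient statistics:

```python
def upper_envelope(lines):
    """lines: list of (slope, intercept, hyp_id) tuples.
    Returns [(x_from, hyp_id), ...]: which hypothesis's line is maximal
    on each interval, intervals starting at x_from (the first at -inf)."""
    hull, starts = [], []
    for line in sorted(lines):           # sort by slope (then intercept)
        a, b, _ = line
        while hull:
            a0, b0, _ = hull[-1]
            if a == a0:                  # parallel: higher intercept wins
                if b <= b0:
                    break
                hull.pop(); starts.pop()
                continue
            x = (b0 - b) / (a - a0)      # intersection with current top line
            if x <= starts[-1]:          # top line is never maximal: drop it
                hull.pop(); starts.pop()
            else:
                hull.append(line); starts.append(x)
                break
        else:                            # hull empty: this line starts at -inf
            hull.append(line); starts.append(float("-inf"))
    return [(x, line[2]) for x, line in zip(starts, hull)]

def best_interval(nbest_lines, gains):
    """nbest_lines[m]: lines for sentence m; gains[m][hyp_id]: toy
    sentence-level quality, standing in for BLEU sufficient statistics.
    Returns (lambda_j, corpus_gain) at the best interval's midpoint."""
    envelopes = [upper_envelope(ls) for ls in nbest_lines]
    thresholds = sorted({x for env in envelopes for x, _ in env
                         if x != float("-inf")})
    candidates = ([thresholds[0] - 1.0]
                  + [(u + v) / 2 for u, v in zip(thresholds, thresholds[1:])]
                  + [thresholds[-1] + 1.0])
    def winner(env, lam):                # active hypothesis at lambda = lam
        return [h for x, h in env if x <= lam][-1]
    return max(((lam, sum(gains[m][winner(env, lam)]
                          for m, env in enumerate(envelopes)))
                for lam in candidates),
               key=lambda t: t[1])

# Two sentences, two hypotheses each; the lines cross at 1.0 and 0.75.
nbest = [[(1.0, 0.0, 0), (-1.0, 2.0, 1)],
         [(2.0, -1.0, 0), (0.0, 0.5, 1)]]
gains = [{0: 0.2, 1: 0.9}, {0: 0.8, 1: 0.3}]
lam, corpus_gain = best_interval(nbest, gains)
```

In this toy run the threshold points are 0.75 and 1.0, and the midpoint of the interval between them gives the best corpus gain; real MERT would instead accumulate BLEU's n-gram statistics per interval and evaluate corpus BLEU for each candidate value.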

------------------------------

Message: 2
Date: Tue, 15 Dec 2015 10:16:47 -0500
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] dictionary based word alignment?
To: julian@lfbtranslations.co.uk
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDQrv3qhjFzBSW1r9UXaqA5L2rfxYPxMfNr45Gs_dFg9A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

the simplest way to do this is to add the dictionary (possibly many
times) to the parallel corpus that you want to align; the model will then
be biased towards alignments that match the dictionary.
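
For a corpus in fast_align's `source ||| target` format, this trick amounts to appending the dictionary K times before running the aligner. A minimal sketch (the function name, file name, and repetition count are illustrative, not a Moses or fast_align convention):

```python
def bias_corpus(corpus_lines, dict_pairs, copies=10):
    """Append every dictionary entry `copies` times to the parallel
    corpus, in fast_align's 'source ||| target' one-pair-per-line
    format, so the aligner's co-occurrence counts favour the dictionary."""
    extra = ["%s ||| %s" % (src, tgt) for src, tgt in dict_pairs]
    return list(corpus_lines) + extra * copies

corpus = ["das haus ist klein ||| the house is small"]
dictionary = [("haus", "house"), ("klein", "small")]
biased = bias_corpus(corpus, dictionary, copies=3)
```

Writing `biased` out one line per entry and running fast_align as usual (e.g. `fast_align -i biased.txt -d -o -v > forward.align`) then yields alignments nudged towards the dictionary; the right number of copies depends on corpus size and is worth tuning.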

-phi

On Mon, Dec 14, 2015 at 9:30 AM, Julian <julian@lfbtranslations.co.uk>
wrote:

> Hello all, would anyone know of a word alignment tool that can take a
> bilingual dictionary as an argument to guide probabilities? Preferably
> with an implementation like fast_align or similar.
>
> Thanks in advance
>
> Julian
>
> -------------------------------
>
> Julian Myerscough
> Quality Assurance Manager - Languages for Business Ltd

------------------------------



End of Moses-support Digest, Vol 110, Issue 27
**********************************************
