Moses-support Digest, Vol 90, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Implementation of pre-reordering for German and other
languages (Benjamin Körner)
2. Re: how to integrate a long-span LM into moses (Kenneth Heafield)
3. Re: Implementation of pre-reordering for German and other
languages (Graham Neubig)


----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Apr 2014 21:56:00 +0200
From: Benjamin Körner <b.koerner@stud.uni-heidelberg.de>
Subject: Re: [Moses-support] Implementation of pre-reordering for
German and other languages
To: "'Hieu Hoang'" <hieuhoang@gmail.com>
Cc: 'moses-support' <moses-support@mit.edu>
Message-ID: <000c01cf4f76$ce6b0890$6b4119b0$@stud.uni-heidelberg.de>
Content-Type: text/plain; charset="utf-8"

Hi Hieu, hi all,



Well, it is rather a preprocessing toolkit consisting of some Java scripts, and it needs a Hadoop environment. The code is hosted on Google Code, by the way. I'm not sure whether it is really polished enough to be added to Moses.



Best,



Benjamin



From: Hieu Hoang [mailto:hieuhoang@gmail.com]
Sent: Thursday, 3 April 2014 21:13
To: Benjamin Körner
Cc: <maxkhalilov@gmail.com>; moses-support; mt-list@eamt.org
Subject: Re: [Moses-support] Implementation of pre-reordering for German and other languages



Hi Benjamin



If you want to add your code to Moses, send me your GitHub username and I'll give you commit access.

Sent while bumping into things


On 3 Apr 2014, at 05:54 pm, Benjamin Körner <b.koerner@stud.uni-heidelberg.de> wrote:

Hi Maxim, hi all,



In an undergraduate class we worked on an automatic rule-extraction and source-side pre-ordering system as a preprocessing step for English-Japanese. At the moment I'm writing my BA thesis on this topic. It works with Moses and cdec, and since it is purely preprocessing it should work with any other decoder.

If you exchange the parser and perhaps some of the parameters, it should work for any language pair. We get a BLEU score improvement in intrinsic evaluation (currently on a 1.7 million sentence English-Japanese patent corpus).



The idea came from Genzel http://research.google.com/pubs/archive/36484.pdf



Also check the final presentation slides of our implementation at https://www.cl.uni-heidelberg.de/studies/projects/poster/hitschler_koerner_ohta2013.pdf
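As an illustration, rule-based source-side pre-ordering of the kind described above can be sketched as follows. The rule table, tree encoding, and labels here are hypothetical stand-ins for exposition, not the project's actual format:

```python
# Minimal sketch of source-side pre-ordering as a preprocessing step.
# A learned rule maps a parent node's child sequence to a permutation;
# the sentence is re-emitted in the permuted order before decoding.

RULES = {
    # (parent label, child labels) -> permutation of child indices,
    # e.g. move the object before the verb for English -> Japanese
    ("VP", ("V", "NP")): (1, 0),
}

def preorder(tree):
    """tree = (label, [children]); children are nodes or token strings."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    key = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    order = RULES.get(key, tuple(range(len(children))))
    out = []
    for i in order:
        out.extend(preorder(children[i]))
    return out

tree = ("S", [("NP", ["I"]), ("VP", [("V", ["eat"]), ("NP", ["sushi"])])])
print(" ".join(preorder(tree)))  # I sushi eat  (Japanese-like verb-final order)
```

Because the permutation is applied before training and decoding, the downstream decoder is untouched, which is why such a system works with Moses, cdec, or any other decoder.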





Best,



Benjamin





From: moses-support-bounces@mit.edu [mailto:moses-support-bounces@mit.edu] On behalf of Milos Stanojevic
Sent: Thursday, 3 April 2014 16:12
To: Hieu Hoang
Cc: moses-support; mt-list@eamt.org
Subject: Re: [Moses-support] Implementation of pre-reordering for German and other languages



Hi Maxim,



You can check this paper: http://aclweb.org/anthology/P/P11/P11-2067.pdf

It says: "CKK uses the Dubey and Keller (2003) parser, which is trained on the Negra corpus (Skut et al., 1997)."



Regards,

Milos Stanojevic







On Thu, Apr 3, 2014 at 12:50 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

I don't know of an implementation that's available for download, and I seem to remember that the parser they used was extremely difficult to use or compile.

If you find out more, please let everyone know.



On 25 March 2014 17:07, Maxim Khalilov <maxkhalilov@gmail.com> wrote:

Dear Moses community,



I am looking for a ready-to-use or easily customizable implementation of pre-reordering algorithms for Moses. In particular, I'm interested in language pairs with German as the source language and a variety of target languages, so the best solution is probably syntax-based.



As a starting point, I would consider the algorithm described in (Collins et al., 2005), but I don't know if there is an implementation available and which parser it relies on.



Thanks in advance for your help,

Maxim Khalilov





_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu





------------------------------

Message: 2
Date: Thu, 03 Apr 2014 14:23:16 -0700
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] how to integrate a long-span LM into
moses
To: moses-support@MIT.EDU
Message-ID: <533DD144.2070809@kheafield.com>
Content-Type: text/plain; charset=ISO-8859-1



On 04/03/14 03:55, David Mrva wrote:
> Hi moses developers,
>
> I have integrated an LM with unlimited history into moses code by
> implementing the StatefulFeatureFunction class following the example in
> moses/FF/SkeletonStatefulFF.cpp. My LM code can also handle ngrams
> through using kenlm. As kenlm is integrated into moses, I compared my
> new code with the standard moses with cs-en models. Standard moses gave
> 17.12 BLEU points and my new code with the same ini file and the same LM
> file resulted in BLEU 13.4. This led me to read the Kenlm.cpp
> implementation in much more detail and I found a number of interesting
> aspects that I would like to understand to integrate a long-span model
> correctly.
>
> 1/ In the Evaluate(const Hypothesis &hypo, const FFState *ps,
> ScoreComponentCollection *out) const method moses/LM/Kenlm.cpp
> calculates the score over only the first N - 1 words, where N is the
> ngram order. Why not calculate the score for all words in the target
> phrase from hypo.GetCurrTargetWordsRange().GetEndPos() to
> hypo.GetCurrTargetWordsRange().GetStartPos()? Is this an optimisation or
> is it required to make the translation work?

It's an optimization. The Nth and beyond words were already scored when
the phrase was loaded, so why bother scoring them again?

>
> 2/ At the time of the load of the translation phrase table, moses calls:
> void LanguageModel::Evaluate(const Phrase &source, const TargetPhrase
> &targetPhrase, ScoreComponentCollection &scoreBreakdown,
> ScoreComponentCollection &estimatedFutureScore) const
> where LanguageModel is the ancestor of the LanguageModelKen<Model>
> template implemented in Kenlm.cpp. Moses calls this Evaluate() method
> for each source-target translation phrase and assigns the accumulated
> score of the first N-1 words to estimatedFutureScore and the accumulated
> score over the rest of the phrase into scoreBreakdown. What is the
> purpose of this split of the ngram scores into the first N-1 words and
> the rest of the phrase? Is the scoreBreakdown value added with the score
> calculated with the Evaluate() method from point 1/ during the search to
> get the total phrase score?

The future part, which includes everything, is used for cube pruning
prioritization and future cost estimates. The Nth and beyond score is
added to the score of the first N-1 words under point 1.
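The split Kenneth describes can be sketched in a toy form. For an order-N LM, words at position N-1 and beyond in a phrase already have a full in-phrase history, so their scores can be fixed at phrase-table load time; only the first N-1 words need rescoring during search once the true left context is known. The LM lookup below is a deterministic stand-in, not KenLM's actual API:

```python
# Toy sketch of the load-time score split described in the thread.

N = 3  # trigram LM

def toy_logprob(word, context):
    # stand-in for a real LM lookup; deterministic but meaningless
    return -0.1 * (len(word) + len(context))

def load_time_scores(phrase):
    """Split a phrase's LM score as Moses does at phrase-table load."""
    estimate = 0.0   # first N-1 words: left context unknown -> estimate only
    fixed = 0.0      # remaining words: full in-phrase context -> final score
    for i, w in enumerate(phrase):
        lp = toy_logprob(w, phrase[max(0, i - N + 1):i])
        if i < N - 1:
            estimate += lp
        else:
            fixed += lp
    return estimate, fixed

est, fixed = load_time_scores(["the", "quick", "brown", "fox"])
# During search, only the first N-1 words are rescored against the real
# left context; `fixed` is reused unchanged for every hypothesis.
```

The `estimate` part corresponds to what feeds future-cost estimates and cube-pruning prioritization; the `fixed` part is the "Nth and beyond" score that never needs recomputation.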

>
> 3/ The method LanguageModelKen<Model>::CalcScore(const Phrase &phrase,
> float &fullScore, float &ngramScore, size_t &oovCount) const used also
> in the calculation in point 2/ above, distinguishes between
> terminal/non-terminal words. What are these? Is the distinction relevant
> to long-span LMs? Why is the terminal/non-terminal distinction not
> necessary in the Evaluate() method described in point 1/ above?

Evaluate is only used by phrase-based MT. CalcScore is used by
syntactic and phrase-based MT. Syntactic MT can have non-terminals.

>
> After changing my implementation to incorporate the behaviour described
> above I got an exact match of the output with moses's "native" Kenlm
> implementation. However, the behaviour in point 1/ is not suitable for
> long-span LMs and point 2/ does not make sense for non-ngram models. I
> would expect my long-span LM to operate in such a way that an LM state
> assigned to a hypothesis covers all target words from the start to the
> end of the hypothesis. And when a hypothesis is being extended, its LM
> state is extended by one target word at a time in a loop over the new
> phrase from start to finish. Ngram LM implementation does not work in
> this way and it seems to harm ngram performance. Can anyone shed some
> light on the motivation behind the behaviour described above in points 1-3?

If you want good search accuracy with your long-distance LM then you
need the ability to estimate the scores of phrases without knowing what
will appear before them. These estimates are critical to driving
search.  It is likely that, once you implement such estimates, they will
take a form similar to that of the n-gram implementation.

>
> I used moses with its default, a.k.a. "normal", search algorithm (no
> [search-algorithm] variable specified in my config). For completeness,
> my config when using moses with its Kenlm class is pasted below.
>
> Best regards,
> David
>
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4 path=model/phrase-table.1.gz input-factor=0 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=model/reordering-table.1.wbe-msd-bidirectional-fe.gz
> Distortion
> KENLM lazyken=1 name=LM0 factor=0 path=lm/europarl.binlm.1 order=5
>
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
> Distortion0= 0.3
> LM0= 0.5
>
>
>


------------------------------

Message: 3
Date: Fri, 4 Apr 2014 12:02:16 +0900
From: Graham Neubig <neubig@is.naist.jp>
Subject: Re: [Moses-support] Implementation of pre-reordering for
German and other languages
To: Maxim Khalilov <maxkhalilov@gmail.com>
Cc: "<moses-support@mit.edu>" <moses-support@mit.edu>,
mt-list@eamt.org
Message-ID:
<CADkjOCPp-JH6cgfcCQ4bkX9EsVYgphUH7GtuQ0+nhWip2=zMzg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Maxim,

Just FYI, for IWSLT last year we implemented Collins's pre-ordering rules
over Penn Treebank-style parses, which means you can use any treebank
parser that supports German (such as the Berkeley parser):
http://www.phontron.com/paper/sudoh13iwslt.pdf

We couldn't implement all of Collins's rules because we only trained the
parser on the phrase categories, not the extra annotations such as subject,
etc., but it still lets you apply the major ones, such as the verb
reordering, and it gave us a nice bonus over regular phrase-based models.
We probably can't provide the code, as it was written by someone at a
company, but the rules are so simple that they shouldn't be hard to
implement for someone familiar with MT.
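The verb-reordering rule Graham mentions can be sketched over a flattened clause. In German subordinate clauses the finite verb is clause-final, and a Collins-style rule moves it up next to the complementizer so the order resembles English. The clause encoding and POS labels below are illustrative assumptions, not the IWSLT system's actual code:

```python
# Sketch of one Collins-style clause-restructuring rule: move a
# clause-final finite verb to directly after the first constituent
# (typically the complementizer) of a subordinate clause.

def move_final_verb(children):
    """children: list of (label, tokens) constituents of one clause."""
    if len(children) > 1 and children[-1][0].startswith("V"):
        verb = children.pop()        # detach the clause-final verb
        children.insert(1, verb)     # re-attach after the complementizer
    return children

clause = [("KOUS", ["weil"]), ("NP", ["er"]),
          ("NP", ["das", "Buch"]), ("VVFIN", ["liest"])]
reordered = move_final_verb(clause)
print(" ".join(w for _, toks in reordered for w in toks))
# weil liest er das Buch  (verb moved out of clause-final position)
```

As the mail notes, rules of this shape only need phrase categories from the parse, which is why they work even without the functional annotations (subject, object, etc.) used in the original Collins et al. (2005) setup.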

Graham


On Wed, Mar 26, 2014 at 2:07 AM, Maxim Khalilov <maxkhalilov@gmail.com> wrote:

> Dear Moses community,
>
> I am looking for a ready-to-use or easily customizable implementation of
> pre-reordering algorithms for Moses. In particular, I'm interested in
> language pairs with German as the source language and a variety of target
> languages, so the best solution is probably syntax-based.
>
> As a starting point, I would consider the algorithm described in (Collins
> et al., 2005), but I don't know if there is an implementation available and
> which parser it relies on.
>
> Thanks in advance for your help,
> Maxim Khalilov
>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 90, Issue 11
*********************************************
