Moses-support Digest, Vol 104, Issue 40

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Fwd: Re: Major bug found in Moses (liling tan)
2. Re: c++11 support (Ulrich Germann)
3. Re: Major bug found in Moses (Matt Post)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 Jun 2015 20:36:52 +0200
From: liling tan <alvations@gmail.com>
Subject: Re: [Moses-support] Fwd: Re: Major bug found in Moses
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJLb9L0spfpNCGSQWm4nRJon+g8mawt2ws2vJDqAhUJ4pA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear James and Moses devs,

I guess everyone's first question would be whether you did tuning correctly
before getting the odd results you've reported.

----

Read on for a sob story about me and moses and a guide to soothe the tone,
try as I may.

Please skip/delete, if you would not like to read the sob story =)

I am not an expert in moses nor someone who has used it extensively. But I
can clearly say that the Moses learning curve is steeper than most NLP/MT
libraries.

Still it is worth going through the rite of passage, from step 1-9:
http://www.statmt.org/moses/?n=FactoredTraining.PrepareData . Beyond that
there is a zoo of other considerations such as tuning, decoding tricks,
other translation model training and, more recently, how the language model is
trained.

However to say that there is a major bug in moses just because you aren't
getting the desired result is an ungrounded statement. There are many
others who have achieved results before you; just because you can't do it
doesn't mean it's broken.

It took me close to 2 years to understand some bits of how SMT works and
an even smaller bit of how moses works. My best learning experience came from
participating in the WMT shared task and competing against other researchers.
At best, I learnt how to use moses after doing that. The only time I started
to understand a little SMT was when I was forced to grind through Philipp
Koehn's book and try reimplementing a super simplistic, unrealistic decoder
while talking to the moses devs at MT Marathon. Still there's much for me
to learn.

Possibly, linking these resources to the moses page might help soothe the
pains of getting to know moses:

- *Tutorials:*
  - *TAUS tutorial*:
    https://translate.taus.net/translate/mosescore/machine-translation-and-moses-tutorial#getting-started
  - http://www.idiap.ch/~apbelis/hlt-course/TP-MT-instructions.pdf
  - http://www.cs.upc.edu/~cristinae/CV/docs/tutorialSMTprint.pdf
  - http://nlp.cs.upc.edu/lrec-mttutorial/#
  - *MT Marathon slides*, e.g.
    http://statmt.org/mtm14/index.php?n=Main.TalksLecturesLabs ,
    http://www.statmt.org/mtma15/index.php?n=Main.Program
- *Baseline systems for other sources:*
  - *WAT*:
    http://orchid.kuee.kyoto-u.ac.jp/WAT/baseline/baselineSystems.html
- *Friends of Moses*:
  - http://www.cdec-decoder.org/
  - https://kheafield.com/code/kenlm/
  - http://joshua-decoder.org/
  - http://www-nlp.stanford.edu/wiki/Software/Phrasal2
  - http://stp.lingfil.uu.se/~ch/docent-lab.html , etc.

Even if the above isn't on the official moses site, would I have the
"blessings of the moses dev" to put them up on Wikipedia
https://en.wikipedia.org/wiki/Moses_(machine_translation)?

Sadly, it's the last year of my PhD journey and I'm finalizing my
experiments albeit knowing little of SMT and moses. But I would still like
to either port some of the moses code to python or write wrappers for NLTK
to call moses.

BTW, for now, these are valid lines in NLTK:

>>> import nltk
>>> nltk.download('moses_sample')


The Moses developers have done a great job in open-sourcing one of the
first SMT systems, which has spun off others, and they're still at it. @James,
if you ask politely, usually some nice devs will guide you to the right
answer(s).

Regards,
Liling

------------------------------

Message: 2
Date: Wed, 17 Jun 2015 20:50:27 +0100
From: Ulrich Germann <ulrich.germann@gmail.com>
Subject: Re: [Moses-support] c++11 support
To: undisclosed-recipients:;
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAHQSRUp5yJB2XfhwORSFOcZwM04y=ck7-6qFr8kOWckJrRHS7Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I'd strongly advise against being too avant garde. Moses has a large user
base, and many users are still using (or have to use) stable, run-of-the-mill
Linux installations that are a few years old yet still officially
supported. In my opinion, our reference architecture for core moses
functionality should be the oldest Ubuntu LTS version still under official
support, currently Ubuntu 12.04. I have to admit that I don't keep close
track of what's happening with C++, but for me gcc-4.6 plus the boost
libraries still does the trick. Why the rush to the latest and greatest?
What exactly is so broken that we need C++11 to fix it?

- Uli

On Tue, Jun 16, 2015 at 5:02 PM, Rico Sennrich <rico.sennrich@gmx.ch> wrote:

> Hi list,
>
> some code in mosesdecoder (oxlm, c++tokenizer) already requires c++11. To
> let people benefit from the usability and functionality improvements of
> c++11, it would be beneficial to allow the use of c++11 features in all of
> the code.
>
> before people start making big changes to the codebase, we should make sure
> that there are no good reasons against allowing c++11 features, such as
> lack
> of compiler support.
>
> I pushed a minimal commit (6c0f875) to test the waters. If this introduces
> bugs, or if users still rely on old compilers without c++11 support, please
> complain here.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh

------------------------------

Message: 3
Date: Wed, 17 Jun 2015 10:25:27 -0400
From: Matt Post <post@cs.jhu.edu>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, "Arnold, Doug"
<doug@essex.ac.uk>
Message-ID: <9454F29F-34D1-4FEA-A6FA-1C003819A2DD@cs.jhu.edu>
Content-Type: text/plain; charset="utf-8"

I think you are misunderstanding how decoding works. The highest-weighted translation of each source phrase is not necessarily the one with the best BLEU score. This is why the decoder retains many options, so that it can search among them (together with their reorderings). The LM is an important component in making these selections.

Also, how did you weight the many probabilities attached to each phrase (to determine which was the most probable)? The tuning phase of decoding selects weights designed to optimize BLEU score. If you weighted them evenly, that will only exacerbate the problem in this experiment.
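
To make the point about weights concrete, here is a minimal Python sketch of log-linear scoring for the candidate translations of a single source phrase. Everything in it is an assumption made for illustration: the candidates, the probabilities, the feature names and the three weight settings are invented and are not taken from Moses or from James's experiment. The word penalty is modelled, as a simplification, as minus the number of target words.

```python
import math

# Invented candidates for one hypothetical source phrase ("maison").
# "p_direct" stands for p(e|f) and "p_inverse" for p(f|e); the values are made up.
CANDIDATES = {
    "house": {"p_direct": 0.6, "p_inverse": 0.2},
    "the house": {"p_direct": 0.3, "p_inverse": 0.7},
}

# Three hypothetical weight settings.
DIRECT_ONLY = {"p_direct": 1.0, "p_inverse": 0.0, "word_penalty": 0.0}
UNIFORM = {"p_direct": 1.0, "p_inverse": 1.0, "word_penalty": 0.0}
WITH_WORD_PENALTY = {"p_direct": 1.0, "p_inverse": 0.0, "word_penalty": -1.0}


def model_score(target, features, weights):
    """Weighted sum of log-probabilities plus a weighted word-penalty feature."""
    score = sum(weights[name] * math.log(prob) for name, prob in features.items())
    # Simplified word penalty: the feature value is minus the target length.
    score += weights["word_penalty"] * -len(target.split())
    return score


def best_translation(candidates, weights):
    """Return the candidate with the highest model score under the given weights."""
    return max(candidates, key=lambda t: model_score(t, candidates[t], weights))


if __name__ == "__main__":
    for label, weights in [
        ("direct only", DIRECT_ONLY),
        ("uniform", UNIFORM),
        ("with word penalty", WITH_WORD_PENALTY),
    ]:
        print(label, "->", best_translation(CANDIDATES, weights))
```

With these made-up numbers, scoring by the direct probability alone picks "house", while uniform weights or a non-zero word-penalty weight pick "the house". Which option counts as "most probable" therefore depends on the weights, including the weights on features that are not probabilities at all.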

matt



> On Jun 17, 2015, at 10:22 AM, Read, James C <jcread@essex.ac.uk> wrote:
>
> All I did was break the link to the language model and then perform filtering. How is that a methodological mistake? How else would one test the efficacy of the TM in isolation?
>
> I remain convinced that this is undesirable behaviour and therefore a bug.
>
> James
>
>
> From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
> Sent: Wednesday, June 17, 2015 5:12 PM
> To: Read, James C
> Cc: Arnold, Doug; moses-support@mit.edu
> Subject: Re: [Moses-support] Major bug found in Moses
>
> Hi James
> No, not at all. I would say that is expected behaviour. It's how search spaces and optimization work. If anything these are methodological mistakes on your side, sorry. You are doing weird things to the decoder and then you are surprised to get weird results from it.
> On 2015-06-17 16:07, Read, James C wrote:
>>
>> So, do we agree that this is undesirable behaviour and therefore a bug?
>>
>> James
>>
>> From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
>> Sent: Wednesday, June 17, 2015 5:01 PM
>> To: Read, James C
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> As I said. With an unpruned phrase table and a decoder that just optimizes some unreasonable set of weights all bets are off, so if you get a very low BLEU score there, it's not surprising. It's probably jumping around in a very weird search space. With a pruned phrase table you restrict the search space VERY strongly. Nearly everything that will be produced is a half-decent translation. So yes, I can imagine that would happen.
>> Marcin
>> On 2015-06-17 15:56, Read, James C wrote:
>> You would expect an improvement of 37 BLEU points?
>>
>> James
>>
>>
>> From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
>> Sent: Wednesday, June 17, 2015 4:32 PM
>> To: Read, James C
>> Cc: Moses-support@mit.edu; Arnold, Doug
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> Hi James,
>> there are many more factors involved than just probability, for instance word penalties, phrase penalties etc. To be able to validate your own claim you would need to set weights for all those non-probabilities to zero. Otherwise there is no hope that moses will produce anything similar to the most probable translation. And based on that there is no surprise that there may be different translations. A pruned phrase table will naturally produce less noise, so I would say the behaviour you describe is quite exactly what I would expect to happen.
>> Best,
>> Marcin
>> On 2015-06-17 15:26, Read, James C wrote:
>> Hi all,
>>
>> I tried unsuccessfully to publish experiments showing this bug in Moses behaviour. As a result I have lost interest in attempting to have my work published. Nonetheless I think you all should be aware of an anomaly in Moses' behaviour which I have thoroughly exposed and should be easy enough for you to reproduce.
>>
>> As I understand it the TM logic of Moses should select the most likely translations according to the TM. I would therefore expect a run of Moses with no LM to find sentences which are the most likely or at least close to the most likely according to the TM.
>>
>> To test this behaviour I performed two runs of Moses: one with an unfiltered phrase table, the other with a filtered phrase table which left only the most likely phrase pair for each source language phrase. The results were truly startling. I observed huge differences in BLEU score. The filtered phrase tables produced much higher BLEU scores. The beam size used was the default width of 100. I would not have been surprised if the differences in BLEU scores were minimal but they were quite high.
>>
>> I have been unable to find a logical explanation for this behaviour other than to conclude that there must be some kind of bug in Moses which causes a TM only run of Moses to perform poorly in finding the most likely translations according to the TM when there are less likely phrase pairs included in the race.
>>
>> I hope this information will be useful to the Moses community and that the cause of the behaviour can be found and rectified.
>>
>> James
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

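For anyone wanting to reproduce the filtering step described in the quoted thread, a rough Python sketch of keeping only the top-scoring phrase pair per source phrase might look like the following. It assumes the plain-text phrase-table layout `source ||| target ||| scores`, and it assumes the third score is the direct probability p(e|f). James did not say which score he ranked by, so `score_index` is a guess to adjust, and the sample entries are invented.

```python
from typing import Dict, Iterable, List, Tuple

# Invented phrase-table lines in the assumed "source ||| target ||| scores" layout.
SAMPLE = [
    "maison ||| house ||| 0.2 0.3 0.6 0.5",
    "maison ||| the house ||| 0.7 0.6 0.3 0.4",
    "chat ||| cat ||| 0.8 0.7 0.9 0.8",
]


def parse_line(line: str) -> Tuple[str, str, List[float]]:
    """Split one phrase-table line into source, target and its list of scores."""
    fields = [field.strip() for field in line.split("|||")]
    scores = [float(value) for value in fields[2].split()]
    return fields[0], fields[1], scores


def filter_top1(lines: Iterable[str], score_index: int = 2) -> List[str]:
    """Keep, for each source phrase, only the line with the highest chosen score.

    Ties keep the first line seen; output follows first appearance of each source.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        source, _target, scores = parse_line(line)
        key_score = scores[score_index]
        if source not in best or key_score > best[source][0]:
            best[source] = (key_score, line.rstrip("\n"))
    return [kept_line for _score, kept_line in best.values()]


if __name__ == "__main__":
    for kept in filter_top1(SAMPLE):
        print(kept)
```

Note that the surviving entry changes with the score used for ranking: with `score_index=2` the sample keeps "house" for "maison", with `score_index=0` it keeps "the house". That choice is part of the experimental setup and worth stating when reporting results.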

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 40
**********************************************
