Moses-support Digest, Vol 104, Issue 39

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Major bug found in Moses (Matt Post)
2. Re: Major bug found in Moses (Read, James C)

----------------------------------------------------------------------

Message: 1
Date: Wed, 17 Jun 2015 14:11:26 -0400
From: Matt Post <post@cs.jhu.edu>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, "Arnold, Doug"
<doug@essex.ac.uk>
Message-ID: <3CD64831-7DE3-4724-900F-EAE9BF081BA1@cs.jhu.edu>
Content-Type: text/plain; charset=us-ascii

When you filter the TM, you reported that you used the fourth weight. When you translate with the full TM, what weights did you assign to the TM? If you used the default, I believe it would equally weight all the phrasal features (i.e., 1 1 1 1). This would explain why decoding with the full TM does not give the same result as filtering first. The moses.ini in your unfiltered translation experiment should assign weights of "0 0 0 1" to the TM features.
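
To make the weighting point concrete, here is a minimal sketch in Python (not Moses code) of log-linear scoring over four phrase-table features. The two candidates and their probabilities are invented for illustration, and nothing is assumed about which feature sits in which position; only the weight vectors "1 1 1 1" and "0 0 0 1" come from the discussion above.

```python
import math

# Hypothetical candidates for a single source phrase. Each tuple holds four
# made-up feature probabilities; the values are illustrative only.
CANDIDATES = {
    "candidate_a": (0.05, 0.10, 0.02, 0.60),
    "candidate_b": (0.40, 0.30, 0.50, 0.20),
}


def score(features, weights):
    """Log-linear score: weighted sum of log feature values."""
    return sum(w * math.log(p) for w, p in zip(weights, features))


def best(weights):
    """Return the name of the highest-scoring candidate under these weights."""
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name], weights))


# Equal weights on all four features versus weight on the fourth feature only.
uniform_choice = best((1, 1, 1, 1))
fourth_only_choice = best((0, 0, 0, 1))
print(uniform_choice, fourth_only_choice)
```

With these toy numbers the two weight settings prefer different candidates, which is the kind of mismatch one would expect between a run filtered on one feature and an unfiltered run scored on all four.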

> On Jun 17, 2015, at 1:52 PM, Read, James C <jcread@essex.ac.uk> wrote:
>
> The analogy doesn't seem to be helping me understand just how exactly it is a desirable quality of a TM to
>
> a) completely break down if no LM is used (thank you for showing that such is not always the case)
> b) be dependent on a tuning step to help it find the higher scoring translations
>
> What you seem to be essentially saying is that the TM cannot find the higher scoring translations because I didn't pretune the system to do so. And I am supposed to accept that such is a desirable quality of a system whose very job is to find the higher scoring translations.
>
> Further, I am still unclear which features you require a system to be tuned on. At the very least it seems that I have discovered the selection process that tuning seems to be making up for in some unspecified and altogether opaque way.
>
> James
>
>
> ________________________________________
> From: Hieu Hoang <hieuhoang@gmail.com>
> Sent: Wednesday, June 17, 2015 8:34 PM
> To: Read, James C; Kenneth Heafield; moses-support@mit.edu
> Cc: Arnold, Doug
> Subject: Re: [Moses-support] Major bug found in Moses
>
> 4 BLEU is nothing to sniff at :) I was answering Ken's tangent aspersion
> that LM are needed for tuning.
>
> I have some sympathy for you. You're looking at ways to improve
> translation by reducing the search space. I've bashed my head against
> this wall for a while as well without much success.
>
> However, as everyone is telling you, you haven't understood the role of
> tuning. Without tuning, you're pointing your lab rat to some random part
> of the search space, instead of away from the furry animal with whiskers
> and towards the yellow cheesy thing
>
> On 17/06/2015 20:45, Read, James C wrote:
>> Doesn't look like the LM is contributing all that much then does it?
>>
>> James
>>
>> ________________________________________
>> From: moses-support-bounces@mit.edu <moses-support-bounces@mit.edu> on behalf of Hieu Hoang <hieuhoang@gmail.com>
>> Sent: Wednesday, June 17, 2015 7:35 PM
>> To: Kenneth Heafield; moses-support@mit.edu
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> On 17/06/2015 20:13, Kenneth Heafield wrote:
>>> I'll bite.
>>>
>>> The moses.ini files ship with bogus feature weights. One is required to
>>> tune the system to discover good weights for their system. You did not
>>> tune. The results of an untuned system are meaningless.
>>>
>>> So for example if the feature weights are all zeros, then the scores are
>>> all zero. The system will arbitrarily pick some awful translation from
>>> a large space of translations.
>>>
>>> The filter looks at one feature p(target | source). So now you've
>>> constrained the awful untuned model to a slightly better region of the
>>> search space.
>>>
>>> In other words, all you've done is a poor approximation to manually
>>> setting the weight to 1.0 on p(target | source) and the rest to 0.
>>>
>>> The problem isn't that you are running without a language model (though
>>> we generally do not care what happens without one). The problem is that
>>> you did not tune the feature weights.
>>>
>>> Moreover, as Marcin is pointing out, I wouldn't necessarily expect
>>> tuning to work without an LM.
>> Tuning does work without an LM. The results aren't half bad. fr-en
>> europarl (pb):
>> with LM: 22.84
>> retuned without LM: 18.33
>>> On 06/17/15 11:56, Read, James C wrote:
>>>> Actually the approximation I expect to be:
>>>>
>>>> p(e|f)=p(f|e)
>>>>
>>>> Why would you expect this to give poor results if the TM is well trained? Surely the results of my filtering experiments prove otherwise.
>>>>
>>>> James
>>>>
>>>> ________________________________________
>>>> From: moses-support-bounces@mit.edu <moses-support-bounces@mit.edu> on behalf of Rico Sennrich <rico.sennrich@gmx.ch>
>>>> Sent: Wednesday, June 17, 2015 5:32 PM
>>>> To: moses-support@mit.edu
>>>> Subject: Re: [Moses-support] Major bug found in Moses
>>>>
>>>> Read, James C <jcread@...> writes:
>>>>
>>>>> I have been unable to find a logical explanation for this behaviour other
>>>> than to conclude that there must be some kind of bug in Moses which causes a
>>>> TM only run of Moses to perform poorly in finding the most likely
>>>> translations according to the TM when
>>>>> there are less likely phrase pairs included in the race.
>>>> I may have overlooked something, but you seem to have removed the language
>>>> model from your config, and used default weights. your default model will
>>>> thus (roughly) implement the following model:
>>>>
>>>> p(e|f) = p(e|f)*p(f|e)
>>>>
>>>> which is obviously wrong, and will give you poor results. This is not a bug
>>>> in the code, but a poor choice of models and weights. Standard steps in SMT
>>>> (like tuning the model weights on a development set, and including a
>>>> language model) will give you the desired results.
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>
>> --
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>>
>
> --
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
>

------------------------------

Message: 2
Date: Wed, 17 Jun 2015 15:48:14 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, "Arnold, Doug"
<doug@essex.ac.uk>
Message-ID:
<DB3PR06MB0713ADF9AF14EE5D93EC5BC485A60@DB3PR06MB0713.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-2"

1) So if I've understood you correctly you are saying we have a system that is purposefully designed to perform poorly with a disabled LM and this is the proof that the LM is the most fundamental part. Any attempt to prove otherwise by, e.g. filtering the phrase table to help the dysfunctional search algorithm, does not constitute proof that the TM is the most fundamental component of the system and if designed correctly can perform just fine on its own but rather only evidence that the researcher is not using the system as intended (the intention being to break the TM to support the idea that the LM is the most fundamental part).

2) If you still feel that the LM is the most fundamental component I challenge you to disable the TM and perform LM only translations and see what kind of BLEU scores you get.

In conclusion, I do hope that you don't feel that potential investors in MT systems lack the intelligence to see through these logical fallacies. Can we now just admit that the system is broke and get around to fixing it?

James

________________________________
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:29 PM
To: Read, James C
Cc: Arnold, Doug; moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses

To paint you a picture:

Imagine you have a rat in a labyrinth (the labyrinth is the TM and the search space). That rat is quite good at finding the center of that labyrinth. Now you somehow disable that rat's sense of smell, sense of direction, and long-term short-term memory (that's the LM). Can you expect the rat to find the center? Or will it just tumble around, bumping into walls and not find anything? That's what you did to the decoder when disabling the LM.

Now you prune the TM. In the labyrinth that's like closing all the doors that would lead the rat away from the center. There are still a few corridors left, but they all point into the general direction of the point where the rat is supposed to go. Although it may never quite reach it. Now you put that same handicapped rat into the labyrinth where all ways lead more or less to the center. Are you really surprised that the clueless rat finds the center nearly every time now?

That's what happened. It's not a bug. The LM is probably the strongest feature in an MT system. If you take that away you see what happens.

On 2015-06-17 16:22, Read, James C wrote:

All I did was break the link to the language model and then perform filtering. How is that a methodological mistake? How else would one test the efficacy of the TM in isolation?

I remain convinced that this is undesirable behaviour and therefore a bug.

James

________________________________
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:12 PM
To: Read, James C
Cc: Arnold, Doug; moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses

Hi James

No, not at all. I would say that is expected behaviour. It's how search spaces and optimization work. If anything these are methodological mistakes on your side, sorry. You are doing weird things to the decoder and then you are surprised to get weird results from it.

On 2015-06-17 16:07, Read, James C wrote:

So, do we agree that this is undesirable behaviour and therefore a bug?

James

________________________________
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Sent: Wednesday, June 17, 2015 5:01 PM
To: Read, James C
Subject: Re: [Moses-support] Major bug found in Moses

As I said. With an unpruned phrase table and a decoder that just optimizes some unreasonable set of weights, all bets are off, so if you get a very low BLEU score there, it's not surprising. It's probably jumping around in a very weird search space. With a pruned phrase table you restrict the search space VERY strongly. Nearly everything that will be produced is a half-decent translation. So yes, I can imagine that would happen.
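
As a rough illustration of how strongly pruning restricts the search space, the Python sketch below counts the translations available under a deliberately simplified model: one fixed segmentation, no reordering, and a hypothetical number of translation options per source phrase. The figures are made up, and a real decoder explores a far richer space of segmentations and reorderings.

```python
from math import prod


def num_monotone_translations(options_per_phrase):
    """Count translations when each source phrase picks one option, in order."""
    return prod(options_per_phrase)


# Hypothetical five-phrase sentence with 20 options per phrase (unpruned)
# versus one option per phrase (pruned to the single most likely pair).
unpruned = num_monotone_translations([20, 20, 20, 20, 20])
pruned = num_monotone_translations([1, 1, 1, 1, 1])
print(unpruned, pruned)
```

Under these assumptions the pruned table leaves a single candidate per sentence, so poorly chosen weights have nothing left to get wrong.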

Marcin

On 2015-06-17 15:56, Read, James C wrote:

You would expect an improvement of 37 BLEU points?

James

________________________________
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Sent: Wednesday, June 17, 2015 4:32 PM
To: Read, James C
Cc: Moses-support@mit.edu; Arnold, Doug
Subject: Re: [Moses-support] Major bug found in Moses

Hi James,

there are many more factors involved than just probability, for instance word penalties, phrase penalties etc. To be able to validate your own claim you would need to set weights for all those non-probabilities to zero. Otherwise there is no hope that moses will produce anything similar to the most probable translation. And based on that there is no surprise that there may be different translations. A pruned phrase table will naturally produce less noise, so I would say the behaviour you describe is quite exactly what I would expect to happen.
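
The effect of a non-probability feature can be sketched in a few lines of Python. This is an illustration only, not Moses code: the two hypotheses, their TM probabilities, and their lengths are invented, and the word penalty is modeled here as minus the number of target words, which is an assumption of this sketch.

```python
import math

# Hypothetical hypotheses: (TM probability, number of target words).
HYPOTHESES = {
    "short_unlikely": (0.01, 3),
    "long_likely": (0.20, 8),
}


def model_score(tm_prob, num_words, tm_weight, word_penalty_weight):
    """Weighted sum of log TM probability and an assumed word penalty feature."""
    word_penalty_feature = -num_words  # assumption: penalty grows with length
    return tm_weight * math.log(tm_prob) + word_penalty_weight * word_penalty_feature


def pick(tm_weight, word_penalty_weight):
    """Return the hypothesis with the highest model score."""
    return max(
        HYPOTHESES,
        key=lambda h: model_score(*HYPOTHESES[h], tm_weight, word_penalty_weight),
    )


print(pick(1.0, 0.0))  # word penalty switched off
print(pick(1.0, 1.0))  # word penalty left on with a non-zero weight
```

With the penalty weight at zero the more probable hypothesis wins; with a non-zero weight the shorter, less probable one can win, so the output no longer reflects TM probability alone.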

Best,

Marcin

On 2015-06-17 15:26, Read, James C wrote:

Hi all,

I tried unsuccessfully to publish experiments showing this bug in Moses behaviour. As a result I have lost interest in attempting to have my work published. Nonetheless I think you all should be aware of an anomaly in Moses' behaviour which I have thoroughly exposed and should be easy enough for you to reproduce.

As I understand it the TM logic of Moses should select the most likely translations according to the TM. I would therefore expect a run of Moses with no LM to find sentences which are the most likely or at least close to the most likely according to the TM.

To test this behaviour I performed two runs of Moses: one with an unfiltered phrase table, the other with a filtered phrase table which left only the most likely phrase pair for each source language phrase. The results were truly startling. I observed huge differences in BLEU score. The filtered phrase tables produced much higher BLEU scores. The beam size used was the default width of 100. I would not have been surprised if the differences in BLEU scores were minimal but they were quite high.
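
For readers who want to reproduce the filtering step, the following Python sketch keeps one phrase pair per source phrase, chosen by a single score column. It is not the script used in the experiment: the function name, the toy table, and the choice of column are hypothetical, and the only format assumption is that fields are separated by "|||" with space-separated scores in the third field.

```python
def filter_phrase_table(lines, score_index):
    """Keep, for each source phrase, the line with the highest chosen score."""
    best = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        source = fields[0]
        scores = [float(s) for s in fields[2].split()]
        key_score = scores[score_index]
        # Replace the stored entry only when this pair scores strictly higher.
        if source not in best or key_score > best[source][0]:
            best[source] = (key_score, line)
    return [line for _, line in best.values()]


# Invented toy entries; the scores are not from any real model.
TOY_TABLE = [
    "la maison ||| the house ||| 0.6 0.5 0.7 0.4",
    "la maison ||| the home ||| 0.2 0.3 0.2 0.3",
    "la maison ||| house ||| 0.1 0.2 0.1 0.5",
    "chat ||| cat ||| 0.9 0.8 0.9 0.8",
]

filtered_by_third = filter_phrase_table(TOY_TABLE, 2)
filtered_by_fourth = filter_phrase_table(TOY_TABLE, 3)
print(filtered_by_third)
print(filtered_by_fourth)
```

Note that the surviving pair depends on which column is used for filtering, so the filtered run implicitly encodes a choice of feature that the unfiltered run does not make unless its weights say so.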

I have been unable to find a logical explanation for this behaviour other than to conclude that there must be some kind of bug in Moses which causes a TM only run of Moses to perform poorly in finding the most likely translations according to the TM when there are less likely phrase pairs included in the race.

I hope this information will be useful to the Moses community and that the cause of the behaviour can be found and rectified.

James

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 104, Issue 39
**********************************************
