Moses-support Digest, Vol 104, Issue 84

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Major bug found in Moses (Read, James C)
2. Re: Major bug found in Moses (Read, James C)
3. Re: Major bug found in Moses (Hasegawa-Johnson, Mark Allan)


----------------------------------------------------------------------

Message: 1
Date: Wed, 24 Jun 2015 15:21:04 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<DB3PR06MB07131E040C6CD9B32648E27285AF0@DB3PR06MB0713.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

May I humbly suggest that we do some market research and see how many institutions/organisations out there dream about an MT system that, out of the box, scores 37 BLEU points lower than merely substituting each phrase with its most likely translation? I dare say that most users would expect a system to perform *better* than such a blatantly obvious baseline out of the box.


So, please, can we stop trying to play the academic high ground here and just accept that the default behaviour of Moses is much less than desirable?


James


________________________________
From: Lane Schwartz <dowobeha@gmail.com>
Sent: Wednesday, June 24, 2015 5:56 PM
To: Read, James C
Cc: Rico Sennrich; moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses


On Wed, Jun 24, 2015 at 9:05 AM, Read, James C <jcread@essex.ac.uk> wrote:

> As the title of this thread makes clear, the purpose of reporting the bug was not to invite a discussion about conclusions made in my draft paper. Clearly a community that builds its career around research in SMT is unlikely to agree with those kinds of conclusions. The purpose was to report the flaw in the default behaviour of Moses in the hope that we could all agree that something ought to be done about it.
>
> So far you seem to be the only one who has come even close to acknowledging that there is a problem with Moses' default behaviour.


James,

I wasn't talking about the conclusion in your paper. I was talking about the conclusion in your email:

> If the default behaviour produces BLEU scores considerably lower than merely selecting the most likely translation of each phrase then evidently there is something very wrong with the default behaviour.

Your conclusion, quoted above, is seriously flawed.

There is not "something very wrong with the default behavior" of Moses. You have not exposed a bug in Moses.

What you have exposed is your own lack of understanding of modern statistical machine translation, and your unwillingness to listen when others take the time to explain how and why you are mistaken.

I am happy to help explain things to people who are willing to listen. However, you have shown yourself to be not only rude but obstinate and willfully ignorant. I hope that others who find this thread may find it informative. You appear to have learned nothing from it.

Until you become willing to listen to others, and until you take a statistical machine translation class and are willing to pay attention to what you learn there, I don't see any point in taking the time to explain things further. As far as I am concerned, this discussion is over.

Sincerely,
Lane Schwartz



------------------------------

Message: 2
Date: Wed, 24 Jun 2015 15:29:18 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: "John D. Burger" <john@mitre.org>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<DB3PR06MB07139B9D0EBA4354B87B902485AF0@DB3PR06MB0713.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Please allow me to give a synthesis of my understanding of your response:

a) we understand that out of the box Moses performs notably less well than merely selecting the most likely translation for each phrase
b) we don't see this as a problem because for years we've been applying a different type of fix
c) we have no intention of rectifying the problem or even acknowledging that there is a problem
d) we would rather continue performing this gratuitous step and insisting that our users perform it also

Please explain to me: why even bother running the training process if you have already decided that the default setup should not be designed to make the most of the probabilities learned during that step?

James

________________________________________
From: John D. Burger <john@mitre.org>
Sent: Wednesday, June 24, 2015 6:03 PM
To: Read, James C
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses

> On Jun 24, 2015, at 10:47 , Read, James C <jcread@essex.ac.uk> wrote:
>
> So you still think it's fine that the default would perform at 37 BLEU points less than just selecting the most likely translation of each phrase?

Yes, I'm pretty sure we all think that's fine, because one of the steps of building a system is tuning.

Is this really the essence of your complaint? That the behavior without tuning is not very good?

(Please try to reply without your usual snarkiness.)

- John Burger
MITRE

> You know I think I would have to try really hard to design a system that performed so poorly.
>
> James
>
> ________________________________________
> From: amittai axelrod <amittai@umiacs.umd.edu>
> Sent: Wednesday, June 24, 2015 5:36 PM
> To: Read, James C; Lane Schwartz
> Cc: moses-support@mit.edu; Philipp Koehn
> Subject: Re: [Moses-support] Major bug found in Moses
>
> what *i* would do is tune my systems.
>
> ~amittai
>
> On 6/24/15 09:15, Read, James C wrote:
>> Thank you for such an invitation. Let's see. Given the choice of
>>
>> a) reading through thousands of lines of code trying to figure out why the default behaviour performs considerably worse than merely selecting the most likely translation of each phrase or
>> b) spending much less time implementing a simple system that does just that
>>
>> which one would you do?
>>
>> For all you know, maybe I've already implemented a system that does just that and, beyond that, improves considerably on such a basic benchmark. But given that on this list we don't seem to be able to accept that there is a problem with the default behaviour of Moses, I can only conclude that nobody would be interested in access to the code of such a system.
>>
>> James
>>
>> ________________________________________
>> From: amittai axelrod <amittai@umiacs.umd.edu>
>> Sent: Friday, June 19, 2015 7:52 PM
>> To: Read, James C; Lane Schwartz
>> Cc: moses-support@mit.edu; Philipp Koehn
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> if we don't understand the problem, how can we possibly fix it?
>> all the relevant code is open source. go for it!
>>
>> ~amittai
>>
>> On 6/19/15 12:49, Read, James C wrote:
>>> So, all I did was filter out the less likely phrase pairs and the BLEU
>>> score shot up. Was that such a stroke of genius? Was that not blindingly
>>> obvious?
>>>
>>>
>>> You're telling me that redesigning the search algorithm to prefer higher
>>> scoring phrase pairs is all we need to do to get a best paper at ACL?
>>>
>>>
>>> James
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Lane Schwartz <dowobeha@gmail.com>
>>> *Sent:* Friday, June 19, 2015 7:40 PM
>>> *To:* Read, James C
>>> *Cc:* Philipp Koehn; Burger, John D.; moses-support@mit.edu
>>> *Subject:* Re: [Moses-support] Major bug found in Moses
>>> On Fri, Jun 19, 2015 at 11:28 AM, Read, James C <jcread@essex.ac.uk> wrote:
>>>
>>>> What I take issue with is the en-masse denial that there is a
>>>> problem with the system if it behaves in such a way with no LM + no
>>>> pruning and/or tuning.
>>>
>>>
>>> There is no mass denial taking place.
>>>
>>> Regardless of whether or not you tune, the decoder will do its best to
>>> find translations with the highest model score. That is the expected
>>> behavior.
>>>
>>> What I have tried to tell you, and what other people have tried to tell
>>> you, is that translations with high model scores are not necessarily
>>> good translations.
>>>
>>> We all want our models to be such that high model scores correspond to
>>> good translations, and that low model scores correspond with bad
>>> translations. But unfortunately, our models do not innately have this
>>> characteristic. We all know this. We also know a good way to deal with
>>> this shortcoming, namely tuning. Tuning is the process by which we
>>> attempt to ensure that high model scores correspond to high quality
>>> translations, and that low model scores correspond to low quality
>>> translations.
>>>
>>> If you can design models that naturally correspond with translation
>>> quality without tuning, that's great. If you can do that, you've got a
>>> great shot at winning a Best Paper award at ACL.
>>>
>>> In the meantime, you may want to consider an apology for your rude
>>> behavior and unprofessional attitude.
>>>
>>> Goodbye.
>>> Lane
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>




------------------------------

Message: 3
Date: Wed, 24 Jun 2015 15:36:01 +0000
From: "Hasegawa-Johnson, Mark Allan" <jhasegaw@illinois.edu>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>, "John D. Burger"
<john@mitre.org>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<260DF00C2B63AD4EA8046A81805D803703382C13@CITESMBX1.ad.uillinois.edu>
Content-Type: text/plain; charset="us-ascii"

It would be really wonderful if Moses had an out-of-the-box example that ran without further tuning. Would you be willing to create that for us? We would greatly appreciate it.

The open-source community operates on a somewhat different model than the commercial software community. In the open-source community, if a feature doesn't exist, and if you believe it should exist, then the correct response is "May I contribute this feature to the codebase, please?"

The fact that no such feature currently exists in Moses means that none of its current users has ever had a need for it. That probably means that all of its current users are machine translation experts, who have no need for an out-of-the-box example that runs without tuning. You are quite correct that it would be nice to expand the user base so that it includes people who are not machine translation experts but just want a tool that runs reasonably well out of the box. Since nobody is paid to maintain Moses, however, nobody has yet had sufficient incentive to create such an example. If you believe that you have sufficient incentive to create such an example, then please do; we would appreciate it.

Thanks.


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



------------------------------


End of Moses-support Digest, Vol 104, Issue 84
**********************************************
