Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Use high-quality corpus for training or tuning? (Dingyuan Wang)
2. Re: Major bug found in Moses (John D. Burger)
3. Re: Use high-quality corpus for training or tuning?
(Philipp Koehn)
4. Re: Major bug found in Moses (Matthias Huck)
----------------------------------------------------------------------
Message: 1
Date: Thu, 25 Jun 2015 00:52:51 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: [Moses-support] Use high-quality corpus for training or
 tuning?
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAFt8H75NK_4+4jpkmDKg-McoNw0mxSkQUcBgFBJahnOXv1PEWw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear all,
I have collected a lot of parallel texts. Many of them come from web pages
and were aligned automatically by rules and heuristics; some documents are
missing many sentences on one side (ratios as skewed as 5:1), so the
automatic alignment contains many errors. Some texts are well aligned at
the paragraph level. A few are single articles that were aligned by hand
or came already aligned.
Since the amount of data is not large (less than a hundred MB), I need to
use it efficiently.
In any case, I will manually check the test set line by line.
Should I reserve the high-quality data for tuning, and why?
(I am actually looking for an explanation to convince myself to do so.)
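[Editorial note: one common way to act on the advice in the replies below is to reserve the cleanest, hand-aligned material for the tuning (dev) set and to use the noisier web-crawled pairs, lightly filtered, for training. A minimal Python sketch of such a split follows; the quality labels, dev-set size, and length-ratio threshold are illustrative assumptions, not anything prescribed by Moses.]

# Sketch only: split collected parallel data into a training set (noisier,
# auto-aligned pairs, lightly filtered) and a tuning/dev set (hand-aligned
# pairs). Quality labels, dev_size and max_ratio are illustrative assumptions.

def length_ratio_ok(src, tgt, max_ratio=3.0):
    """Crude noise filter: drop pairs whose token counts differ wildly."""
    ls, lt = len(src.split()), len(tgt.split())
    if ls == 0 or lt == 0:
        return False
    return max(ls, lt) / min(ls, lt) <= max_ratio

def split_corpus(pairs, dev_size=2000):
    """pairs: iterable of (src, tgt, quality), quality in {'hand', 'auto'}."""
    train, dev = [], []
    for src, tgt, quality in pairs:
        if quality == "hand" and len(dev) < dev_size:
            dev.append((src, tgt))      # cleanest data -> tuning set
        elif length_ratio_ok(src, tgt):
            train.append((src, tgt))    # everything else -> training data
    return train, dev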
------------------------------
Message: 2
Date: Wed, 24 Jun 2015 14:03:34 -0400
From: "John D. Burger" <john@mitre.org>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <7AA5C599-3B53-40A9-B3D6-38DACAFB0094@mitre.org>
Content-Type: text/plain; charset=iso-8859-1
> On Jun 24, 2015, at 11:29 , Read, James C <jcread@essex.ac.uk> wrote:
>
> Please allow me to give a synthesis of my understanding of your response:
>
> a) we understand that out of the box Moses performs notably less well than merely selecting the most likely translation for each phrase
"Out of the box" Moses produces no translations at all - that is, before I run text cleaning, word alignment, phrase extraction, etc.
> b) we don't see this as a problem because for years we've been applying a different type of fix
Tuning is not a "fix" - it is an integral part of the entire training process.
> c) we have no intention of rectifying the problem or even acknowledging that there is a problem
I guess from your point of view this is true.
> d) we would rather continue performing this gratuitous step and insisting that our users perform it also
In what sense is it gratuitous? Gratuitous means unnecessary and unwarranted. It vastly improves the results - seems warranted to me. If you think you have an alternative to tuning THAT PERFORMS BETTER THAN TUNING, the community would welcome it.
> Please explain to me. Why even bother running the training process if you have already decided that the default setup should not be designed to maximise on the probabilities learned during that step?
I guess I don't know what you mean by "default setup". As I said, the default setup produces no translations at all, a BLEU of 0. Should we fix this somehow as well? What do you suggest the system should do if the user skips various other steps in the training process?
Perhaps by "default setup" you mean "the normal training procedure". For me this includes tuning, as I think it does for everyone else on the list. It also includes language modeling, by the way.
- John Burger
MITRE
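[Editorial note: the "normal training procedure" referred to above is the standard Moses baseline pipeline: corpus cleaning, language modelling, word alignment and phrase extraction, then tuning. Below is a hedged sketch of how one might drive it from Python; the install paths, corpus and dev file names, language pair, and LM order are placeholders, not values taken from this thread.]

# Hedged sketch of the standard Moses baseline pipeline, driven from Python.
# MOSES, GIZA, corpus/dev file names and the de-en language pair are
# placeholders, not values taken from this thread.
import subprocess

MOSES = "/path/to/mosesdecoder"   # assumption: Moses installation directory
GIZA = "/path/to/giza-pp/bin"     # assumption: GIZA++ binaries for word alignment

steps = [
    # 1. Clean the parallel corpus (drop empty, overlong, badly mismatched pairs).
    f"{MOSES}/scripts/training/clean-corpus-n.perl corpus de en corpus.clean 1 80",
    # 2. Train a target-side language model with KenLM and binarise it.
    f"{MOSES}/bin/lmplz -o 3 < corpus.clean.en > lm.arpa",
    f"{MOSES}/bin/build_binary lm.arpa lm.blm",
    # 3. Word alignment, phrase extraction and feature estimation.
    f"{MOSES}/scripts/training/train-model.perl -root-dir train"
    f" -corpus corpus.clean -f de -e en -alignment grow-diag-final-and"
    f" -reordering msd-bidirectional-fe -lm 0:3:$PWD/lm.blm:8"
    f" -external-bin-dir {GIZA}",
    # 4. Tuning on a held-out dev set (the step under discussion).
    f"{MOSES}/scripts/training/mert-moses.pl dev.de dev.en"
    f" {MOSES}/bin/moses train/model/moses.ini --mertdir {MOSES}/bin",
]

for cmd in steps:
    subprocess.run(cmd, shell=True, check=True)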
> James
>
> ________________________________________
> From: John D. Burger <john@mitre.org>
> Sent: Wednesday, June 24, 2015 6:03 PM
> To: Read, James C
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] Major bug found in Moses
>
>> On Jun 24, 2015, at 10:47 , Read, James C <jcread@essex.ac.uk> wrote:
>>
>> So you still think it's fine that the default would perform at 37 BLEU points less than just selecting the most likely translation of each phrase?
>
> Yes, I'm pretty sure we all think that's fine, because one of the steps of building a system is tuning.
>
> Is this really the essence of your complaint? That the behavior without tuning is not very good?
>
> (Please try to reply without your usual snarkiness.)
>
> - John Burger
> MITRE
>
>> You know I think I would have to try really hard to design a system that performed so poorly.
>>
>> James
>>
>> ________________________________________
>> From: amittai axelrod <amittai@umiacs.umd.edu>
>> Sent: Wednesday, June 24, 2015 5:36 PM
>> To: Read, James C; Lane Schwartz
>> Cc: moses-support@mit.edu; Philipp Koehn
>> Subject: Re: [Moses-support] Major bug found in Moses
>>
>> what *i* would do is tune my systems.
>>
>> ~amittai
>>
>> On 6/24/15 09:15, Read, James C wrote:
>>> Thank you for such an invitation. Let's see. Given the choice of
>>>
>>> a) reading through thousands of lines of code trying to figure out why the default behaviour performs considerably worse than merely selecting the most likely translation of each phrase or
>>> b) spending much less time implementing a simple system that does just that
>>>
>>> which one would you do?
>>>
>>> For all I know, maybe I've already implemented such a system that does just that, and one that moreover improves considerably on such a basic benchmark. But given that on this list we don't seem to be able to accept that there is a problem with the default behaviour of Moses, I can only conclude that nobody would be interested in access to the code of such a system.
>>>
>>> James
>>>
>>> ________________________________________
>>> From: amittai axelrod <amittai@umiacs.umd.edu>
>>> Sent: Friday, June 19, 2015 7:52 PM
>>> To: Read, James C; Lane Schwartz
>>> Cc: moses-support@mit.edu; Philipp Koehn
>>> Subject: Re: [Moses-support] Major bug found in Moses
>>>
>>> if we don't understand the problem, how can we possibly fix it?
>>> all the relevant code is open source. go for it!
>>>
>>> ~amittai
>>>
>>> On 6/19/15 12:49, Read, James C wrote:
>>>> So, all I did was filter out the less likely phrase pairs and the BLEU
>>>> score shot up. Was that such a stroke of genius? Was that not blindingly
>>>> obvious?
>>>>
>>>>
>>>> You're telling me that redesigning the search algorithm to prefer higher
>>>> scoring phrase pairs is all we need to do to get a best paper at ACL?
>>>>
>>>>
>>>> James
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Lane Schwartz <dowobeha@gmail.com>
>>>> *Sent:* Friday, June 19, 2015 7:40 PM
>>>> *To:* Read, James C
>>>> *Cc:* Philipp Koehn; Burger, John D.; moses-support@mit.edu
>>>> *Subject:* Re: [Moses-support] Major bug found in Moses
>>>> On Fri, Jun 19, 2015 at 11:28 AM, Read, James C <jcread@essex.ac.uk
>>>> <mailto:jcread@essex.ac.uk>> wrote:
>>>>
>>>> What I take issue with is the en masse denial that there is a
>>>> problem with the system if it behaves in such a way with no LM + no
>>>> pruning and/or tuning.
>>>>
>>>>
>>>> There is no mass denial taking place.
>>>>
>>>> Regardless of whether or not you tune, the decoder will do its best to
>>>> find translations with the highest model score. That is the expected
>>>> behavior.
>>>>
>>>> What I have tried to tell you, and what other people have tried to tell
>>>> you, is that translations with high model scores are not necessarily
>>>> good translations.
>>>>
>>>> We all want our models to be such that high model scores correspond to
>>>> good translations, and that low model scores correspond with bad
>>>> translations. But unfortunately, our models do not innately have this
>>>> characteristic. We all know this. We also know a good way to deal with
>>>> this shortcoming, namely tuning. Tuning is the process by which we
>>>> attempt to ensure that high model scores correspond to high quality
>>>> translations, and that low model scores correspond to low quality
>>>> translations.
>>>>
>>>> If you can design models that naturally correspond with translation
>>>> quality without tuning, that's great. If you can do that, you've got a
>>>> great shot at winning a Best Paper award at ACL.
>>>>
>>>> In the meantime, you may want to consider an apology for your rude
>>>> behavior and unprofessional attitude.
>>>>
>>>> Goodbye.
>>>> Lane
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
------------------------------
Message: 3
Date: Wed, 24 Jun 2015 16:11:02 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] Use high-quality corpus for training or
 tuning?
To: Dingyuan Wang <abcdoyle888@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDBCDoofBnSj4NJDdrc68KKk70vL4_u1uhKEv67dvib6tQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi,
it is beneficial if the tuning set
- is representative of what you want to translate
- is a relatively literal translation, so the MT system has a chance
to match the reference
-phi
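[Editorial note: a crude way to operationalise these two criteria is to draw candidate pairs from the domain you intend to translate and to prefer pairs whose source and target token counts are close, as a rough proxy for literalness. The sketch below is only an illustration; the skew threshold and dev-set size are assumptions.]

# Illustration only: select a tuning set that is (a) drawn from the domain you
# intend to translate and (b) relatively literal, using token-length-ratio
# skew as a crude proxy for literalness. max_skew and size are assumptions.

def literalness_skew(src, tgt):
    """0.0 means equal token counts; larger means a less 'literal-looking' pair."""
    ls, lt = len(src.split()), len(tgt.split())
    return abs(ls - lt) / max(ls, lt, 1)

def pick_tuning_set(candidate_pairs, size=2000, max_skew=0.3):
    """candidate_pairs: (src, tgt) pairs sampled from the target domain."""
    scored = sorted(candidate_pairs, key=lambda p: literalness_skew(*p))
    return [p for p in scored if literalness_skew(*p) <= max_skew][:size]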
On Wed, Jun 24, 2015 at 12:52 PM, Dingyuan Wang <abcdoyle888@gmail.com> wrote:
> Dear all,
>
> I have collected a lot of parallel texts. Many of them come from web pages
> and were aligned automatically by rules and heuristics; some documents are
> missing many sentences on one side (ratios as skewed as 5:1), so the
> automatic alignment contains many errors. Some texts are well aligned at the
> paragraph level. A few are single articles that were aligned by hand or came
> already aligned.
> Since the amount of data is not large (less than a hundred MB), I need to
> use it efficiently.
> In any case, I will manually check the test set line by line.
> Should I reserve the high-quality data for tuning, and why?
> (I am actually looking for an explanation to convince myself to do so.)
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
------------------------------
Message: 4
Date: Wed, 24 Jun 2015 22:40:47 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, "Arnold, Doug"
<doug@essex.ac.uk>
Message-ID: <1435182047.2342.29.camel@inf.ed.ac.uk>
Content-Type: text/plain; charset="UTF-8"
Hi James,
Irrespective of the fact that you need to tune the weights of the
log-linear model:
Let me provide more references in order to shed light on how well
established simple pruning techniques are in our field as well as in
related fields (namely, automatic speech recognition).
This list of references might not be what you are looking for, but maybe
other readers can benefit.
V. Steinbiss, B. Tran, H. Ney. Improvements in beam search. In Proc.
of the Int. Conf. on Spoken Language Processing (ICSLP'94), pages
2143-2146, Yokohama, Japan, Sept. 1994.
http://www.steinbiss.de/vst94d.pdf
R. Zens, F. J. Och, and H. Ney. Phrase-Based Statistical Machine
Translation. In German Conf. on Artificial Intelligence (KI), pages
18-32, Aachen, Germany, Sept. 2002.
https://www-i6.informatik.rwth-aachen.de/publications/download/434/Zens-KI-2002.pdf
Philipp Koehn. Pharaoh: a beam search decoder for phrase-based
statistical machine translation models. In Proc. of the AMTA, pages
115-124, Washington, DC, USA, Sept./Oct. 2004.
http://homepages.inf.ed.ac.uk/pkoehn/publications/pharaoh-amta2004.pdf
Robert C. Moore and Chris Quirk. Faster Beam-Search Decoding for Phrasal
Statistical Machine Translation. In Proc. of MT Summit XI, European
Association for Machine Translation, Sept. 2007.
http://research.microsoft.com/pubs/68097/mtsummit2007_beamsearch.pdf
Richard Zens and Hermann Ney. Improvements in Dynamic Programming Beam
Search for Phrase-based Statistical Machine Translation. In Proc. of the
International Workshop on Spoken Language Translation (IWSLT), Honolulu,
HI, USA, Oct. 2008.
http://www.mt-archive.info/05/IWSLT-2008-Zens.pdf
Cheers,
Matthias
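[Editorial note: for readers unfamiliar with the term, the histogram pruning that Rico Sennrich describes in the quoted message below amounts to keeping only the top-N phrase pairs per source phrase. A minimal sketch follows, assuming the common Moses phrase-table layout "src ||| tgt ||| scores ..." and ranking by the third score field (often p(e|f)); both the column index and N=30 are assumptions for illustration.]

# Minimal sketch of histogram pruning of a phrase table: keep only the top-N
# phrase pairs per source phrase. Assumes the common Moses line layout
# "src ||| tgt ||| scores ..." and ranks by the third score field (often
# p(e|f)); both the column index and N=30 are assumptions for illustration.
from collections import defaultdict

def histogram_prune(phrase_table_lines, n=30, score_col=2):
    table = defaultdict(list)
    for line in phrase_table_lines:
        fields = [f.strip() for f in line.split("|||")]
        src, scores = fields[0], fields[2]
        score = float(scores.split()[score_col])
        table[src].append((score, line))
    for entries in table.values():
        entries.sort(key=lambda x: x[0], reverse=True)
        for _, line in entries[:n]:
            yield line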
On Wed, 2015-06-24 at 13:11 +0000, Read, James C wrote:
> Thank you for reading very carefully the draft paper I provided a link
> to and noticing that the Johnson paper is duly cited there. Given that
> you had already noticed this, I shall not proceed to explain the
> blindingly obvious differences between my very simple filter and their
> filter based on Fisher's exact test.
>
> Other than that it seems painfully clear that the point I meant to
> make has not been understood entirely. If the default behaviour
> produces BLEU scores considerably lower than merely selecting the most
> likely translation of each phrase then evidently there is something
> very wrong with the default behaviour. If we cannot agree on something
> as obvious as that then I really can't see this discussion making any
> productive progress.
>
> James
>
> ________________________________________
> From: moses-support-bounces@mit.edu <moses-support-bounces@mit.edu> on behalf of Rico Sennrich <rico.sennrich@gmx.ch>
> Sent: Friday, June 19, 2015 8:25 PM
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Major bug found in Moses
>
> [sorry for the garbled message before]
>
> you are right. The idea is pretty obvious. It roughly corresponds to
> 'Histogram pruning' in this paper:
>
> Zens, R., Stanton, D., Xu, P. (2012). A Systematic Comparison of Phrase
> Table Pruning Techniques. In Proceedings of the 2012 Joint Conference on
> Empirical Methods in Natural Language Processing and Computational
> Natural Language Learning (EMNLP-CoNLL), pp. 972-983.
>
> The idea has been described in the literature before that (for instance,
> Johnson et al. (2007) only use the top 30 phrase pairs per source
> phrase), and may have been used in practice for even longer. If you read
> the paper above, you will find that histogram pruning does not improve
> translation quality on a state-of-the-art SMT system, and performs
> poorly compared to more advanced pruning techniques.
>
> On 19.06.2015 17:49, Read, James C. wrote:
> > So, all I did was filter out the less likely phrase pairs and the BLEU score shot up. Was that such a stroke of genius? Was that not blindingly obvious?
> >
> >
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 104, Issue 86
**********************************************