Moses-support Digest, Vol 104, Issue 32

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Major bug found in Moses (Read, James C)
2. Re: Low coverage in Moses forced decoding (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 Jun 2015 15:56:24 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: Rico Sennrich <rico.sennrich@gmx.ch>, "moses-support@mit.edu"
<moses-support@mit.edu>
Cc: "Arnold, Doug" <doug@essex.ac.uk>
Message-ID:
<DB3PR06MB071308583BC9BB9BA462460D85A60@DB3PR06MB0713.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Actually the approximation I expect to be:

p(e|f)=p(f|e)

Why would you expect this to give poor results if the TM is well trained? Surely the results of my filtering experiments prove otherwise.

James

________________________________________
From: moses-support-bounces@mit.edu <moses-support-bounces@mit.edu> on behalf of Rico Sennrich <rico.sennrich@gmx.ch>
Sent: Wednesday, June 17, 2015 5:32 PM
To: moses-support@mit.edu
Subject: Re: [Moses-support] Major bug found in Moses

Read, James C <jcread@...> writes:

> I have been unable to find a logical explanation for this behaviour other
> than to conclude that there must be some kind of bug in Moses which causes a
> TM-only run of Moses to perform poorly in finding the most likely
> translations according to the TM when there are less likely phrase pairs
> included in the race.

I may have overlooked something, but you seem to have removed the language
model from your config, and used default weights. Your default model will
thus (roughly) implement the following model:

p(e|f) = p(e|f)*p(f|e)

which is obviously wrong, and will give you poor results. This is not a bug
in the code, but a poor choice of models and weights. Standard steps in SMT
(like tuning the model weights on a development set, and including a
language model) will give you the desired results.
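
As a rough illustration of the point above, here is a minimal sketch of log-linear scoring with no language model and equal weights on the two translation-model features. The feature names, weights, and probabilities are made up for illustration; they are not Moses defaults and do not come from the experiments discussed in this thread.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log feature values, as in a log-linear SMT model."""
    return sum(weights[name] * math.log(value) for name, value in features.items())

def best(candidates, weights):
    """Return the label of the highest-scoring candidate."""
    return max(candidates, key=lambda c: loglinear_score(candidates[c], weights))

# Hypothetical candidate translations of one source phrase.
candidates = {
    "A": {"p_e_given_f": 0.50, "p_f_given_e": 0.10},
    "B": {"p_e_given_f": 0.30, "p_f_given_e": 0.60},
}

# Equal weights rank candidates by the product p(e|f) * p(f|e).
best_equal = best(candidates, {"p_e_given_f": 1.0, "p_f_given_e": 1.0})

# Zeroing the inverse feature ranks candidates by p(e|f) alone.
best_direct = best(candidates, {"p_e_given_f": 1.0, "p_f_given_e": 0.0})

print(best_equal, best_direct)
```

With these toy numbers the two weightings pick different candidates, which is why the ranking produced by untuned weights need not match the ranking by p(e|f) alone.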

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



------------------------------

Message: 2
Date: Wed, 17 Jun 2015 20:07:32 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Low coverage in Moses forced decoding
To: praveen dakwale <dakwale.praveen@gmail.com>, moses-support
<moses-support@mit.edu>
Message-ID:
<CAEKMkbjAEVXTVQ7q-ysx+E6k7Ynu80oMWSy=wvHE7GXFpzARjQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I didn't check the details of the code, but it does look like it's doing
the right thing.

The unreachable sentences are long and probably non-literal translations.
It's not surprising they would be unreachable.

You can get Moses to print near-reachable sentences by adding the soft and
tuneable arguments to the ConstrainedDecoding FF and then giving it a large
weight. This is what it looks like in my moses.ini file:
[feature]
....
ConstrainedDecoding path=15.ref soft=true tuneable=true

[weight]
....
ConstrainedDecoding0= 1000
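
A quick way to sanity-check such a fragment before a long decoding run is to parse it and confirm that the feature arguments and the weight are what was intended. The sketch below is a hypothetical helper, not part of Moses; it assumes only the layout shown above (bracketed section headers, one feature or weight per line) and skips the "...." elision lines.

```python
def parse_moses_ini(text):
    """Group non-empty lines of a moses.ini-style text by [section] header."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and elision lines made only of dots.
        if not line or line.startswith("#") or set(line) == {"."}:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

fragment = """\
[feature]
....
ConstrainedDecoding path=15.ref soft=true tuneable=true

[weight]
....
ConstrainedDecoding0= 1000
"""

sections = parse_moses_ini(fragment)

# key=value arguments of the first feature line, after the feature name.
feature_args = dict(tok.split("=", 1) for tok in sections["feature"][0].split()[1:])

# Weight lines have the form "Name0= v1 v2 ...".
weights = {}
for line in sections["weight"]:
    name, _, values = line.partition("=")
    weights[name.strip()] = [float(v) for v in values.split()]

print(feature_args, weights)
```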

For the 1 unreachable sentence, the reference is
they would even be watched through screens at the board chairman office
. in addition , a sophisticated wireless telecommunication network would be
created .
The 'near' output is
they would even be watched through screens at the board chairman office
. in addition , a sophisticated wireless telecommunication network .
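
To see exactly where the near-reachable output departs from the reference, a token-level diff is enough. This is a minimal sketch using Python's standard difflib module, not anything built into Moses; the two strings are copied from the example above.

```python
import difflib

reference = (
    "they would even be watched through screens at the board chairman office "
    ". in addition , a sophisticated wireless telecommunication network would be "
    "created ."
).split()

near = (
    "they would even be watched through screens at the board chairman office "
    ". in addition , a sophisticated wireless telecommunication network ."
).split()

def missing_tokens(ref_tokens, hyp_tokens):
    """Return the reference tokens that the hypothesis drops or replaces."""
    matcher = difflib.SequenceMatcher(a=ref_tokens, b=hyp_tokens, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(ref_tokens[i1:i2])
    return missing

dropped = missing_tokens(reference, near)
print(" ".join(dropped))
```

Here the only difference is the final span of the reference, which the decoder could not produce from the phrase table.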



Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 17 June 2015 at 15:35, praveen dakwale <dakwale.praveen@gmail.com> wrote:

> Hi Hieu,
>
> Apologies for the late reply. I am sharing the Dropbox link with you at this
> email address. It contains the phrase table and the source and target bitext.
>
> https://www.dropbox.com/sh/chqvondaholtcb2/AAAlJNnneqzxzV8-4hYBSDUpa?dl=0
>
> To give an update on my experiments: I tried decoding a few more batches of
> 20000 sentences from my bitext. Some of them achieved 75% coverage, while
> others remain around 45-50%.
>
> My bitext consists of data from AFP, Xinhua and LDC. Though I can't
> generalize without decoding the full bitext, I have observed that batches
> with lower coverage are mostly from AFP.
> It appears to me that low reference translation quality in AFP could be a
> reason for the low coverage. Can you give your view on this?
>
> Also, can you inform me whether the forced decoding functionality in moses
> implements the 'Leaving one out' approach as discussed in (Wuebker et al
> 2010) http://anthology.aclweb.org/P/P10/P10-1049.pdf
>
>
> Thanks for your patience and support
>
> Regards
> Praveen
>
>
> On Tue, Jun 16, 2015 at 8:51 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> ps. don't send me the model files via email, they're too big. Use
>> dropbox or google drive etc
>>
>>
>>
>> On 16/06/2015 19:35, Hieu Hoang wrote:
>>
>> Thanks
>>
>> It looks ok to me, I see nothing wrong with it. The low coverage may be
>> correct, or there's a bug in the forced decoding code.
>>
>> If you want me to debug it, please provide the model files
>> On 16 Jun 2015 19:32, "praveen dakwale" <dakwale.praveen@gmail.com>
>> wrote:
>>
>>> The english.txt file that I sent in previous mail contains the starting
>>> 100 lines of the same (reference) file. I am attaching the complete zipped
>>> file here
>>>
>>> On Tue, Jun 16, 2015 at 5:24 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>
>>>> can you please send me the file
>>>>
>>>> /datastore/praveen/forced_decoding_test/decodes/no_contstraint_batch_1/aligned.english.1
>>>>
>>>>
>>>>
>>>> On 16/06/2015 19:20, praveen dakwale wrote:
>>>>
>>>> Hi Hieu,
>>>>
>>>> Thanks for the reply. Please find the 3 files attached. The following is
>>>> the call for one of the runs. For different runs, I change the parameters:
>>>>
>>>> nohup nice $MOSESHOME/bin/moses -config
>>>> /datastore/praveen/forced_decoding_test/experiments/exp4/moses-tuned.ini
>>>> -input-file
>>>> /datastore/praveen/forced_decoding_test/experiments/exp1/aligned.arabic.1
>>>> -s 2000 -b 0.00000000000000000000000000001 -distortion-limit -1 -threads 40
>>>> -n-best-list
>>>> /datastore/praveen/forced_decoding_test/experiments/exp4/forced.english.batch1.nbest
>>>> 100 -print-alignment-info-in-n-best -report-segmentation 1>
>>>> /datastore/praveen/forced_decoding_test/experiments/exp4/1.tuned-filtered.output
>>>> 2> /datastore/praveen/forced_decoding_test/experiments/exp4/1.decode.out &
>>>>
>>>> For this run I got a coverage of ~41 %
>>>>
>>>> Apologies for the size of the files.
>>>>
>>>>
>>>> Thanks and Regards
>>>> Praveen
>>>>
>>>> On Tue, Jun 16, 2015 at 4:55 PM, Hieu Hoang <hieuhoang@gmail.com>
>>>> wrote:
>>>>
>>>>> Can you please show me your moses.ini files and the exact command you
>>>>> are using to do forced decoding?
>>>>>
>>>>> Also, can you send me a few lines of your input file and your reference
>>>>> file?
>>>>>
>>>>>
>>>>> On 16/06/2015 18:48, praveen dakwale wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I am trying to implement the forced decoding procedure of (Wuebker et al
>>>>> 2010) using the Moses constrained decoding functionality for an
>>>>> Arabic-English bitext of ~1 million sentences. To start, I am using Moses
>>>>> to decode a sample of 20000 sentences from this bitext. Though I have
>>>>> tried to relax all the parameters as far as possible, I haven't been able
>>>>> to obtain a coverage of more than 50%. I describe below the models and
>>>>> parameters I have used and their maximum/minimum values:
>>>>>
>>>>> 1. Phrase table built from the same bitext using moses training
>>>>> (grow-diag-final-and algorithm) (~60 million phrase pairs)
>>>>> 2. Distortion penalty, phrase penalty
>>>>> 3. No LM or reordering models. Infinite distortion limit
>>>>> 4. ttable limit : 1000000
>>>>> 5. Stack size (maximum : 3000)
>>>>> 6. Beam size (10^(-10) to 10^(-30) (min))
>>>>> 7. Stack diversity : 0-1
>>>>>
>>>>> Some of them I have used as discussed in this report (
>>>>> http://www-labs.iro.umontreal.ca/~foster/papers/forcedecode-techrep12.pdf
>>>>> )
>>>>>
>>>>>
>>>>> Can anyone tell me whether I am missing any other parameters, left at
>>>>> their defaults in Moses, that I should relax to increase my coverage? Or
>>>>> what are the default values of all possible parameters in Moses?
>>>>>
>>>>> Thanks and Regards
>>>>> Praveen
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>> --
>>>>> Hieu Hoang
>>>>> Researcher
>>>>> New York University, Abu Dhabi
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>>
>>>>
>>>> --
>>>> Hieu Hoang
>>>> Researcher
>>>> New York University, Abu Dhabi
>>>> http://www.hoang.co.uk/hieu
>>>>
>>>>
>>>
>> --
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>>
>

------------------------------



End of Moses-support Digest, Vol 104, Issue 32
**********************************************