Moses-support Digest, Vol 100, Issue 89

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Number of Unique Hypotheses in the N-best List (Erinç Dikici)
2. Re: Number of Unique Hypotheses in the N-best List (Erinç Dikici)
3. Documentation describing Moses n-best list extraction
(Lane Schwartz)

----------------------------------------------------------------------

Message: 1
Date: Wed, 25 Feb 2015 13:19:32 +0200
From: Erinç Dikici <erinc.dikici@boun.edu.tr>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAJ=2YW2jW+bVBvbhgNFyDH5Ue0ZOGzPx07FQCMUi3MV+FExr3Q@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello again,

On Tue, Feb 24, 2015 at 10:18 PM, Rico Sennrich wrote:

> did you actually cut away the scores? It's possible that you have duplicates
> with different scores, so they will show up as different lines with 'sort |
> uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as
> Matthias suggested.
>

Yes, the numbers I reported were on pure text; there were no scores. I used
"awk" to cut the scores instead of "cut", which basically produces the same
result.
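
For anyone who wants to reproduce the count without shell tools, here is a minimal Python sketch of the same idea as the 'cut | sort | uniq' pipeline quoted above: it keeps only the hypothesis field of each n-best line and counts the distinct strings per sentence. It assumes the usual Moses n-best layout with fields separated by '|||' (sentence id, hypothesis, feature scores, total score). The function name and the sample lines are made up for illustration.

```python
from collections import defaultdict


def unique_hypotheses(nbest_lines):
    """Map each sentence id to the set of distinct hypothesis strings."""
    per_sentence = defaultdict(set)
    for line in nbest_lines:
        fields = line.rstrip("\n").split("|||")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sent_id = fields[0].strip()
        # Normalise whitespace so spacing differences do not count as new hypotheses.
        hypothesis = " ".join(fields[1].split())
        per_sentence[sent_id].add(hypothesis)
    return per_sentence


# Illustrative lines only: two entries share the same text but differ in score.
sample = [
    "0 ||| the house is small ||| LM0= -10.1 ||| -4.2\n",
    "0 ||| the house is small ||| LM0= -10.3 ||| -4.5\n",
    "0 ||| the house is little ||| LM0= -11.0 ||| -4.9\n",
]
counts = {sid: len(hyps) for sid, hyps in unique_hypotheses(sample).items()}
print(counts)
```

Because the scores are dropped before comparing, two entries that differ only in their scores count as one hypothesis, which is the case Rico describes.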

On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:

> Also note that n-best-factor takes effect only if distinct is active.
>

I tried that before reading your reply, and I can confirm this on Moses v3.0.

> Please try to investigate what's going on (if you have the time).
>

On Tue, Feb 24, 2015 at 10:38 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:

>
> On 24/02/15 19:08, Erinç Dikici wrote:
>
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
>
> Thanks again for your quick answers.
>
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
> of hypotheses returned for both cases was 50.
>
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
>
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
> 50-best list) with the same configuration back in version 0.x. I hope the
> new -n-best-factor will do the trick.
>
> The decoding may have changed but the decoding algorithms should be
> exactly the same. The scores should be exactly the same (apart from
> rounding differences and OOV words, which shouldn't affect the search at
> all). If you have any evidence that you're getting different output, please
> let me know. It would be good if you could provide the model files so I can
> replicate the result.
>
>
> Best,
>
> ED
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> Hieu Hoang
> Research Associate (until March 2015)
> ** searching for interesting commercial MT position **
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>

------------------------------

Message: 2
Date: Wed, 25 Feb 2015 15:03:34 +0200
From: Erinç Dikici <erinc.dikici@boun.edu.tr>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: undisclosed-recipients:;
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAJ=2YW1SuRabJ1VF5nTxK+xzmBBhH-B2jeKxXqgXHBCgv+jJ8Q@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck wrote:

> Please try to investigate what's going on (if you have the time).

So far, I have been able to obtain a list of 50 unique hypotheses using
either of these two methods in v3.0:

1. Manually adding the "distinct" option to the -n-best-list parameter
when calling moses.

Note that my version of mert-moses.pl does not contain the "distinct"
keywords. I still do not understand why keeping them did not produce a
unique n-best list in the first place, though.

2. Manually changing the PhrasePenalty parameter to exp(1) (approximately
2.718).

Comparing the test.filtered.ini.1 file and the phrase table to those of the
same experiment I had done back in version 0.x, I noticed that the phrase
penalty value has been removed from the phrase table and included in the
ini file as a standard feature function. For my example, this value was
computed to be -0.999959. I changed this value to 2.718 and reran the moses
command (without even using the "distinct" option), which produced 50
unique hypotheses.

I must also add that the n-best lists generated by these two methods are
not exactly the same. For my application, I find the hypotheses output by
method 2 more useful.
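
To see how far two such n-best lists actually differ, one option is to compare the sets of hypothesis strings they contain. The following is a rough Python sketch under the assumption that both lists use the '|||'-separated n-best layout with the hypothesis in the second field. The helper names and sample lines are hypothetical and do not come from either run.

```python
def hypothesis_set(nbest_lines):
    """Collect the distinct hypothesis strings from n-best lines."""
    return {
        " ".join(line.split("|||")[1].split())  # second field, whitespace-normalised
        for line in nbest_lines
        if "|||" in line
    }


def compare_nbest(a_lines, b_lines):
    """Split hypotheses into those unique to each list and those shared by both."""
    a, b = hypothesis_set(a_lines), hypothesis_set(b_lines)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


# Illustrative data only.
method1 = ["0 ||| a small house ||| f ||| -1.0", "0 ||| a little house ||| f ||| -1.2"]
method2 = ["0 ||| a small house ||| f ||| -0.9", "0 ||| a tiny house ||| f ||| -1.5"]
diff = compare_nbest(method1, method2)
print({name: sorted(hyps) for name, hyps in diff.items()})
```

This compares surface strings only and ignores rank and score, so it shows which hypotheses differ but not how the ordering changed.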

Machine translation is not my area of specialization, so I do not know
whether setting the phrase penalty to a fixed value is a bad practice. But
at least, it works for me. Is there a way to set this value in the
configuration file so that I do not have to change the ini file each time I
run the experiment?
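
Until someone points to a proper configuration setting, the manual edit could at least be scripted. Below is a minimal Python sketch that rewrites one weight line in the text of an ini file. It assumes the weight appears on a line of the form "PhrasePenalty0= <value>"; that feature name, the function name and the WordPenalty0 line in the sample are assumptions for illustration, so check them against your own test.filtered.ini.1 before relying on this.

```python
import math
import re


def set_feature_weight(ini_text, feature, value):
    """Return ini_text with the 'feature= ...' weight line set to value."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(feature) + r"[ \t]*=).*$", re.MULTILINE
    )
    new_text, replaced = pattern.subn(lambda m: f"{m.group(1)} {value}", ini_text)
    if replaced == 0:
        raise ValueError(f"no weight line found for {feature}")
    return new_text


# Sample ini fragment; only the -0.999959 value is taken from the message above.
sample_ini = """[weight]
WordPenalty0= -1
PhrasePenalty0= -0.999959
"""
value = round(math.exp(1), 3)  # exp(1) rounded to three decimals
patched = set_feature_weight(sample_ini, "PhrasePenalty0", value)
print(patched)
```

The function raises an error when the line is missing, so a run cannot silently proceed with the old value.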

Thanks,

ED

------------------------------

Message: 3
Date: Wed, 25 Feb 2015 07:08:34 -0600
From: Lane Schwartz <dowobeha@gmail.com>
Subject: [Moses-support] Documentation describing Moses n-best list
extraction
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZkyeDJB0fp3=9vJBE3ThUqLm1i=qYq5uwVbEicK79stiA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Is there a particular paper that describes the current technique(s)
used for n-best list extraction within Moses?

Thanks,
Lane

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 100, Issue 89
**********************************************
