Moses-support Digest, Vol 100, Issue 85

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Number of Unique Hypotheses in the N-best List (Erinç Dikici)
2. Re: Number of Unique Hypotheses in the N-best List (Erinç Dikici)
3. Re: Number of Unique Hypotheses in the N-best List (Matthias Huck)
4. Re: Number of Unique Hypotheses in the N-best List (Rico Sennrich)
5. Re: Number of Unique Hypotheses in the N-best List (Hieu Hoang)
6. Single score in phrase table (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Tue, 24 Feb 2015 18:15:48 +0000 (UTC)
From: Erinç Dikici <erinc.dikici@boun.edu.tr>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: moses-support@mit.edu
Message-ID: <loom.20150224T190056-860@post.gmane.org>
Content-Type: text/plain; charset=utf-8

Matthias Huck <mhuck@...> writes:

>
> Hi Erinç,
>
> On Tue, 2015-02-24 at 16:24 +0000, Matthias Huck wrote:
> > I'd assume that your 32 entries of the n-best list weren't actually
> > unique, though, but a number of duplicates of the (two) very same
> > outputs, as "distinct" should simply avoid duplicate entries.
>
> Actually, could you please check for us whether I'm right with this
> assumption? If I'm not, then some other modification since version 2.1
> might affect your experiment. I hope that's not the case.
>
> Run something like
> cut -d'|' -f4 | sort | uniq | wc -l
> on the n-best list with 32 entries. It should print 2.
>
> Or did you do this already? (You're mentioning "unique hypotheses" in
> your mail.)
>
[Reply body garbled by the Gmane web interface; the sender reposted it as Message 2.]



------------------------------

Message: 2
Date: Tue, 24 Feb 2015 21:08:46 +0200
From: Erinç Dikici <erinc.dikici@boun.edu.tr>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAJ=2YW3ZCV1UPXZLb2h2=stm5XLpBa3sspGhAcuNudXoP=7=KQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

(Apparently the Gmane web interface turned my reply into garbled text,
sorry for the double posting)

Thanks again for your quick answers.

Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
of hypotheses returned for both cases was 50.

I removed the "distinct"s from (my local copy of)
scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
50-best list) with the same configuration back in version 0.x. I hope the
new -n-best-factor will do the trick.

Best,

ED

------------------------------

Message: 3
Date: Tue, 24 Feb 2015 20:03:54 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: Erinç Dikici <erinc.dikici@boun.edu.tr>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <1424808234.2192.535.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi,

That's really not at all what is supposed to happen. You should get only
unique entries in the n-best list with the "distinct" parameter. (Maybe
less than 50 if n-best-factor is set to a low value, but there shouldn't
be any duplicates.)

I cannot find any reason why the "distinct" parameter wouldn't do what
it's supposed to do. But maybe I'm missing something. The relevant
method should be Manager::CalcNBest() (in moses/Manager.cpp). As far as
I can tell, there have been no recent modifications to it in Moses
master.

Please try to investigate what's going on (if you have the time).

Also note that n-best-factor takes effect only if distinct is active.
There's no point in setting it if distinct is inactive or
malfunctioning. It would potentially help you to fill up your n-best
list if you got less than n (=50) entries with the distinct parameter.

Cheers,
Matthias
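
The interaction between "distinct" and n-best-factor described above can be illustrated with a minimal Python sketch. It is not the code of Manager::CalcNBest(); the function name distinct_nbest and the rule that at most n * factor candidates are examined are assumptions made for illustration only.

```python
def distinct_nbest(candidates, n, factor):
    """Keep up to n distinct hypotheses from a best-first candidate stream.

    candidates: iterable of (score, hypothesis) pairs, best first.
    factor:     assumed cap; at most n * factor candidates are examined.
    """
    limit = n * factor
    seen = set()
    result = []
    for examined, (score, hyp) in enumerate(candidates):
        if len(result) >= n or examined >= limit:
            break
        if hyp in seen:
            continue  # duplicate surface string: skipped, but it used up budget
        seen.add(hyp)
        result.append((score, hyp))
    return result


# Toy candidate stream with repeated surface strings (made-up data).
toy = [(-1.0, "a"), (-1.1, "a"), (-1.2, "b"), (-1.3, "a"), (-1.4, "c")]
short_list = distinct_nbest(toy, n=3, factor=1)  # budget of 3 candidates
full_list = distinct_nbest(toy, n=3, factor=2)   # budget of 6 candidates
```

With the small budget the list comes back shorter than n but free of duplicates; a larger factor lets the search look further down the stream and fill the list.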


On Tue, 2015-02-24 at 21:08 +0200, Erinç Dikici wrote:
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
>
> Thanks again for your quick answers.
>
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
> of hypotheses returned for both cases was 50.
>
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
>
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
> 50-best list) with the same configuration back in version 0.x. I hope the
> new -n-best-factor will do the trick.
>
> Best,
>
> ED
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 4
Date: Tue, 24 Feb 2015 20:17:56 +0000 (UTC)
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: moses-support@mit.edu
Message-ID: <loom.20150224T211619-324@post.gmane.org>
Content-Type: text/plain; charset=utf-8

Erinç Dikici <erinc.dikici@...> writes:

>
> Thanks again for your quick answers. Yes, 32 and 2 are the counts after
> "sort | uniq | wc -l".
>

Did you actually cut away the scores? It's possible that you have duplicates
with different scores, so they will show up as different lines with 'sort |
uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as
Matthias suggested.

best wishes,
Rico
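
The difference between counting unique lines and counting unique hypothesis strings can be sketched in Python. This is a rough equivalent of the cut/sort/uniq pipeline, assuming n-best lines of the form "id ||| hypothesis ||| feature scores ||| total score"; the function name and the sample lines are made up for illustration.

```python
def count_distinct_hypotheses(nbest_lines):
    """Return {sentence_id: number of distinct hypothesis strings}.

    Assumes each line looks like:
        id ||| hypothesis ||| feature scores ||| total score
    Only the id and hypothesis fields are compared; scores are ignored.
    """
    seen = {}
    for line in nbest_lines:
        fields = line.rstrip("\n").split("|||")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sent_id = fields[0].strip()
        hyp = " ".join(fields[1].split())  # normalise whitespace
        seen.setdefault(sent_id, set()).add(hyp)
    return {sid: len(hyps) for sid, hyps in seen.items()}


# Made-up sample: two entries share a surface string but differ in score.
sample = [
    "0 ||| the house is small ||| LM0= -10.1 ||| -5.2",
    "0 ||| the house is small ||| LM0= -10.4 ||| -5.3",
    "0 ||| the house is little ||| LM0= -11.0 ||| -5.9",
]
unique_lines = len(set(sample))
unique_hyps = count_distinct_hypotheses(sample)
```

Here all three lines differ as whole lines, but only two distinct hypothesis strings remain once the scores are ignored. To run it on a real file, pass the open file object, e.g. count_distinct_hypotheses(open(path, encoding="utf-8")).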



------------------------------

Message: 5
Date: Tue, 24 Feb 2015 20:38:12 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Number of Unique Hypotheses in the N-best
List
To: Erinç Dikici <erinc.dikici@boun.edu.tr>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <54ECE134.9020602@gmail.com>
Content-Type: text/plain; charset="windows-1252"


On 24/02/15 19:08, Erinç Dikici wrote:
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
>
> Thanks again for your quick answers.
>
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
> of hypotheses returned for both cases was 50.
>
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
>
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
> 50-best list) with the same configuration back in version 0.x. I hope the
> new -n-best-factor will do the trick.
The decoding may have changed, but the decoding algorithms should be
exactly the same. The scores should be exactly the same (apart from
rounding differences and OOV words, which shouldn't affect the search at
all). If you have any evidence that you're getting different output,
please let me know. It would be good if you could provide the model files
so I can replicate the result.
>
> Best,
>
> ED
>

--
Hieu Hoang
Research Associate (until March 2015)
** searching for interesting commercial MT position **
University of Edinburgh
http://www.hoang.co.uk/hieu


------------------------------

Message: 6
Date: Tue, 24 Feb 2015 23:49:13 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: [Moses-support] Single score in phrase table
To: moses-support <moses-support@mit.edu>
Message-ID: <54ECFFE9.7080306@amu.edu.pl>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,
I have a problem with a single-score phrase table. All scores have been
combined into one score as a linear combination of scores and weights.
However, for both my compact phrase table and the in-memory phrase
table, every input results in UNK for all input tokens. The phrases are
correctly found and returned by both phrase tables (including future
score calculation), so this happens somewhere later. Any ideas?

Best,
Marcin


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 85
**********************************************
