Moses-support Digest, Vol 115, Issue 29

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. MT Test set size (Michaeel Kazi)
2. Re: MT Test set size (Ondrej Bojar)
3. Fwd: Moses-support post from douibameur@gmail.com requires
approval (Hieu Hoang)
4. Re: BTEC Corpus (Cyrine NASRI)


----------------------------------------------------------------------

Message: 1
Date: Wed, 25 May 2016 18:12:45 +0000 (UTC)
From: Michaeel Kazi <michaeel.kazi@ll.mit.edu>
Subject: [Moses-support] MT Test set size
To: moses-support@mit.edu
Message-ID: <loom.20160525T200633-671@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Hi all,

Why are MT test sets the sizes they are? Most are between 1200 and 3000
sentences, usually with one reference, but occasionally some have 4
references. How are these sizes justified? I am sure they are not arbitrary,
but I did not find an answer in most conference proceedings. What is the
goal? (For example, maybe the goal is that a difference of 0.1 BLEU is
statistically significant at 95% CI...)

What about multiple references? Is it better to have a test set with 1200
sentences and 4 references, or a test set with 4800 sentences and 1
reference? Any intuition?
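
One way to probe the significance question above empirically is paired bootstrap resampling over the test set. The following is only a minimal sketch under stated assumptions: `paired_bootstrap`, `scores_a`, and `scores_b` are hypothetical names, and per-sentence scores averaged over the resample stand in for corpus-level BLEU, which would instead need n-gram statistics aggregated per resample.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resampled test sets on which system A beats B.

    scores_a and scores_b are per-sentence quality scores for two systems
    on the same test sentences (an assumption: a stand-in for corpus BLEU).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    if not scores_a:
        raise ValueError("test set must not be empty")

    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement; use the same draw for
        # both systems so that the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

If A wins in at least 95% of the resamples, the difference is commonly reported as significant at that level. Running this on test sets of different sizes gives a rough feel for how many sentences are needed before a given score difference becomes reliably detectable.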

Thanks, everyone. I have been curious about this for a while, and am sure
there is much insight to be gained from the people on this forum!

Kazi




------------------------------

Message: 2
Date: Wed, 25 May 2016 20:56:30 +0200 (CEST)
From: Ondrej Bojar <bojar@ufal.mff.cuni.cz>
Subject: Re: [Moses-support] MT Test set size
To: Michaeel Kazi <michaeel.kazi@ll.mit.edu>
Cc: moses-support@mit.edu
Message-ID:
<1845717294.19380.1464202590892.JavaMail.zimbra@ufal.mff.cuni.cz>
Content-Type: text/plain; charset=utf-8

Hi, Kazi,

The main reason is funding, I guess. We picked some sizes at the beginning and were quite happy with the results, so we stuck to them when planning budgets for follow-up projects.

A graph that somewhat answers your second question (for English->Czech) is in Findings of WMT13, Section 5.

Cheers, Ondrej.

--
Ondrej Bojar (mailto:obo@cuni.cz / bojar@ufal.mff.cuni.cz)
http://www.cuni.cz/~obo


------------------------------

Message: 3
Date: Wed, 25 May 2016 23:54:55 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: [Moses-support] Fwd: Moses-support post from
douibameur@gmail.com requires approval
To: douibameur@gmail.com, moses-support <moses-support@mit.edu>
Message-ID: <86029a90-6057-9472-3c33-61f3b6d39bad@gmail.com>
Content-Type: text/plain; charset="windows-1252"

Hi Ameur

Please subscribe to the Moses mailing list before posting to it. You can
subscribe here:

http://mailman.mit.edu/mailman/listinfo/moses-support

To answer your question: the decoder has a flag that does what
you want:

-print-alignment-info-in-n-best
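
Once the alignments are in the n-best list, they can be read back with a few lines of Python. This is only a sketch: it assumes each n-best line is a `|||`-separated record whose last field holds the alignment as space-separated `source-target` index pairs. That layout, the function name `parse_nbest_line`, and the sample line are assumptions for illustration, so check them against the actual output of your Moses build.

```python
def parse_nbest_line(line):
    """Split one n-best line into (sentence id, hypothesis, alignment pairs).

    Assumes the alignment is the last '|||'-separated field, written as
    space-separated 'src-tgt' word index pairs (an assumption, see above).
    """
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least three '|||'-separated fields")

    sentence_id = int(fields[0])
    hypothesis = fields[1]

    alignment = []
    for pair in fields[-1].split():
        src, tgt = pair.split("-")
        alignment.append((int(src), int(tgt)))
    return sentence_id, hypothesis, alignment


# Hypothetical sample line, not real decoder output.
SAMPLE = "0 ||| this is a house ||| lm: -12.3 tm: -4.5 ||| -16.8 ||| 0-0 1-1 2-2 3-3"
sentence_id, hypothesis, alignment = parse_nbest_line(SAMPLE)
```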

-------- Forwarded Message --------
Subject: Moses-support post from douibameur@gmail.com requires approval
Date: Wed, 25 May 2016 05:35:16 -0400
From: moses-support-owner@mit.edu
To: moses-support-owner@mit.edu



As list administrator, your authorization is requested for the
following mailing list posting:

List: Moses-support@mit.edu
From: douibameur@gmail.com
Subject: how to get the alignment in MOSES ?
Reason: Post by non-member to a members-only list

At your convenience, visit:

http://mailman.mit.edu/mailman/admindb/moses-support

to approve or deny the request.

-------------- next part --------------
An embedded message was scrubbed...
From: ameur douib <douibameur@gmail.com>
Subject: how to get the alignment in MOSES ?
Date: Wed, 25 May 2016 11:35:13 +0200
Size: 5129
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160525/a260d216/attachment-0002.eml

------------------------------

Message: 4
Date: Thu, 26 May 2016 16:28:02 +0200
From: Cyrine NASRI <cyrine.nasri@univ-lorraine.fr>
Subject: Re: [Moses-support] BTEC Corpus
To: "Burger, John D." <john@mitre.org>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAPg_V0hXiybUDJKNTjCrYzRq4GEzoG7ocj4ww+kyqce5cw09RA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thank you John for your reply, but I see that the link doesn't work.

Bests

2016-05-22 18:41 GMT+02:00 Burger, John D. <john@mitre.org>:

> Not sure if this is what you want:
>
> http://iwslt2010.fbk.eu/node/58
>
> - John Burger
> MITRE
>
> > On May 22, 2016, at 05:04, Cyrine NASRI <cyrine.nasri@univ-lorraine.fr> wrote:
> >
> > Hello,
> >
> > I am looking for the BTEC corpus, but cannot find it on the IWSLT website.
> >
> > Can you help me find it?
> >
> > Thank you
> >
> > Bests
>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 115, Issue 29
**********************************************
