Moses-support Digest, Vol 96, Issue 4

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Tune set size (Roee Aharoni)
2. Re: Tune set size (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Sun, 05 Oct 2014 02:22:42 -0700 (PDT)
From: "Roee Aharoni" <roee.aharoni@gmail.com>
Subject: [Moses-support] Tune set size
To: moses-support@mit.edu
Message-ID: <1412500962365.c951589e@Nodemailer>
Content-Type: text/plain; charset="utf-8"

Hi,
In a recent post it was mentioned that "600k line tuning set is way too big. It will take forever. It's better to reduce it to 2-3k lines."
Is there a reference to an empirical experiment searching for an "optimal" MERT tune set size?


Thanks,

Sent from Mailbox

------------------------------

Message: 2
Date: Sun, 05 Oct 2014 17:07:07 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Tune set size
To: moses-support@mit.edu
Message-ID: <5431184B.5040009@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"

We use a random sample size calculation to determine the optimal sample
size based on the size of each bitext corpus:
http://en.wikipedia.org/wiki/Sample_size_determination. In an
interesting choice of words, Wikipedia's introduction states, "The
sample size is an important feature of any empirical study in which the
goal is to make inferences about a population from a sample."

As it turns out, for most corpora we encounter, the tuning set sizes fall
somewhere in the middle of the range Philipp suggested, i.e. 2-3K lines.
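
For readers who want to try this kind of calculation, here is a minimal
sketch. It assumes Cochran's formula with a finite population correction,
a 95% confidence level (z = 1.96), a 2% margin of error, and p = 0.5.
The message does not say which formula or parameters are actually used,
so the function name and all defaults below are illustrative assumptions.

```python
import math


def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Estimate a random-sample size for a corpus of `population` lines.

    Assumption: Cochran's formula with finite population correction.
    z, margin and p are illustrative defaults, not values from the thread.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink the estimate for a finite corpus of `population` lines.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Print the estimate for a few hypothetical corpus sizes.
for lines in (10_000, 100_000, 600_000):
    print(lines, sample_size(lines))
```

Under these assumed parameters, a 600k-line corpus gives 2392 lines, which
sits inside the 2-3K range. A tighter margin of error or a higher
confidence level would raise the number.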

Tom


On 10/05/2014 04:22 PM, Roee Aharoni wrote:
> Hi,
> In a recent post it was mentioned that "600k line tuning set is way
> too big. It will take forever. It's better to reduce it to 2-3k lines."
> Is there a reference to an empirical experiment searching for an
> "optimal" MERT tune set size?
>
> Thanks,
>
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 96, Issue 4
********************************************
