Moses-support Digest, Vol 114, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: mkcls fault (Hieu Hoang)
2. Re: Maximum Phrase Table length (Hieu Hoang)
3. Re: Maximum Phrase Table length (Philipp Koehn)
4. Re: Maximum Phrase Table length (Alexandru Ceausu)


----------------------------------------------------------------------

Message: 1
Date: Thu, 31 Mar 2016 20:34:42 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] mkcls fault
To: Tom Hoar <tahoar@precisiontranslationtools.com>,
moses-support@mit.edu
Message-ID: <56FD7BD2.2040103@gmail.com>
Content-Type: text/plain; charset="windows-1252"

Hiya

On 29/03/2016 14:13, Tom Hoar wrote:
> I think (M)GIZA++ expertise has faded over the years, but I'm hoping
> someone has some ideas about this.
>
> A user started training with an extremely small TMX file containing a few
> dozen parallel segments. Our preparation tools reduced the parallel
> corpus to only 2 pairs (attached). The user was running the Windows
> version (mkcls.exe) but we verified the same error on Linux. The
> train-model.perl script failed in step 1 (log also attached).
> Specifically, the mkcls binary failed with this error message:
>
> Assertion failed!
> File: src/mkcls/StatVar.cpp, Line 110
> Expression: index>=0&&index<n
>
> I've never seen this error with a respectable corpus size. So, I did
> some tests.
>
> * test 1, copy same sentences 14,500 times, got assertion (failed).
> * test 2, copy each of the pairs only 7,250 times, got assertion
> (failed).
> * test 3, added 4,000 unique pairs, no assertion (success).
> * test 4, reduced to 2 original + 3 new pairs, no assertion (success).
> * test 5, reduced to 2 original + 2 new pairs, assertion returned
> (failed).
>
> It seems this assertion is linked to lack of variety in the training
> corpus. Can anyone confirm this observation? Has anyone ever
> experienced this error?
>
> If no one's seen this with a larger corpus, a terminal failure due to
> lack of variety is probably good. Would it be accurate if we add an
> error message to the effect, "terminal error due to insufficient
> variety in the training corpus"?
Be my guest. You're probably not the 1st to encounter these edge cases,
but the first to have to deal with them.
>
> Thanks for any ideas/suggestions.
> Tom
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
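
The error message Tom proposes could be raised by a pre-flight check before
mkcls is ever called. The following is only a minimal sketch: the function
name, the example sentences, and the threshold of 5 unique pairs are
assumptions. The threshold is read off tests 4 and 5 above (5 pairs passed,
4 failed) and is not a documented mkcls limit; the real trigger inside
StatVar.cpp may depend on vocabulary size or something else entirely.

```python
def check_corpus_variety(src_lines, tgt_lines, min_unique_pairs=5):
    """Raise ValueError if the parallel corpus has too few distinct pairs.

    min_unique_pairs=5 is an assumption taken from tests 4 and 5 in the
    thread, not a limit documented for mkcls.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target sides differ in length")
    # Duplicated pairs add no variety, so count distinct pairs only.
    unique_pairs = {(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)}
    if len(unique_pairs) < min_unique_pairs:
        raise ValueError(
            "terminal error due to insufficient variety in the training corpus "
            f"({len(unique_pairs)} unique pairs, need at least {min_unique_pairs})"
        )
    return len(unique_pairs)


if __name__ == "__main__":
    # Hypothetical data in the spirit of tests 1 and 2: two pairs copied
    # thousands of times still count as two distinct pairs.
    src = ["hello world", "good morning"] * 7250
    tgt = ["bonjour le monde", "bonjour"] * 7250
    try:
        check_corpus_variety(src, tgt)
    except ValueError as err:
        print(err)
```

Counting distinct pairs rather than lines is the point of the sketch: it
explains why copying the same sentences 14,500 times did not help.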


------------------------------

Message: 2
Date: Thu, 31 Mar 2016 20:43:37 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Maximum Phrase Table length
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <56FD7DE9.5050405@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed



On 31/03/2016 13:58, Vincent Nguyen wrote:
> Hello,
>
> Does anyone have evidence supporting this (found in the doc):
>
> Maximum Phrase Length
>
> The maximum length of phrases is limited to 7 words. The maximum phrase
> length impacts the size of the phrase translation table, so shorter
> limits may be desirable, if phrase table size is an issue. Previous
> experiments have shown that performance increases only slightly when
> including phrases of more than 3 words.
>
> Summary
>
> --max-phrase-length -- maximum length of phrases entered into
> phrase table (default 7)
>
>
> If there is no major improvement above 3, why is the default 7, and is
> there a benchmark somewhere?
The improvements may not be major, but even minor improvements are
important in some cases. Unless there's a good reason for changing the
default, it's too much hassle to change it.

You'd have to make sure the regression tests still work, and answer
all the queries on the mailing list about why it was changed.
>
>
> Thanks
> Vincent.
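
To make the link between the phrase length limit and phrase table size
concrete, here is a small sketch that counts the contiguous source spans a
single sentence offers for a given limit. It is only a rough proxy: real
phrase extraction keeps only spans consistent with the word alignment and
stores phrase pairs, not source spans. The function name and the sentence
length of 20 are assumptions for illustration.

```python
def count_source_spans(sentence_length, max_phrase_length):
    """Count contiguous word spans of length 1..max_phrase_length.

    A sentence of n words contains n - k + 1 spans of length k.
    """
    longest = min(sentence_length, max_phrase_length)
    return sum(sentence_length - k + 1 for k in range(1, longest + 1))


if __name__ == "__main__":
    n = 20  # hypothetical sentence length
    for limit in (3, 5, 7):
        print(f"max phrase length {limit}: {count_source_spans(n, limit)} spans")
```

For a 20-word sentence, raising the limit from 3 to 7 roughly doubles the
number of candidate spans, which is why a shorter limit can be attractive
when table size matters.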



------------------------------

Message: 3
Date: Thu, 31 Mar 2016 17:35:33 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] Maximum Phrase Table length
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDCVi_-9McMqmkmQgnG1jsU8AQQNV0sdaYBAGw+bv8HCpg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

the last time I tested this number is here (Table 7):
http://www.statmt.org/wmt13/pdf/WMT12.pdf

However, there may be benefits to bigger phrases in more narrow domains
where translations follow stricter guidelines, rather than the news sets
tested here.

-phi


------------------------------

Message: 4
Date: Fri, 1 Apr 2016 10:32:36 +0200
From: Alexandru Ceausu <alceausu@gmail.com>
Subject: Re: [Moses-support] Maximum Phrase Table length
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAOScuS_PAE-GMBopfLO_uWQaX1BZgkZnW7-8K5bR3SgzuCOZXA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Vincent,

It is a runtime parameter. It does not necessarily match the max phrase
length used in training.
It also affects how xml-input is handled. If you are using translation
constraints, this parameter has to be set to the maximum number of covered
source words.
I think that the default is set to 20.

Best regards,
Al. Ceausu
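
If the constraint is as described above, the value needed at runtime can be
estimated by scanning the XML-annotated input for the longest marked-up
span. The sketch below is an assumption-laden illustration: the helper
names, the regular expression, and the sample line are hypothetical, and
nested tags are not handled.

```python
import re

# Matches <tag ...>covered source words</tag>; nested markup is not handled.
TAG_SPAN = re.compile(r"<(\w+)\b[^>]*>(.*?)</\1>", re.DOTALL)


def longest_markup_span(line):
    """Return the largest number of source tokens wrapped by one XML tag."""
    return max((len(m.group(2).split()) for m in TAG_SPAN.finditer(line)), default=0)


def required_max_phrase_length(lines):
    """Return the longest marked-up span over all input lines (0 if none)."""
    return max((longest_markup_span(line) for line in lines), default=0)


if __name__ == "__main__":
    # Illustrative input line, not taken from the thread.
    sample = ['das ist <np translation="a small house">ein kleines haus</np>']
    print(required_max_phrase_length(sample))
```

The result is a lower bound for the decoder setting; whether the decoder's
default already covers it depends on the default value, which the message
above gives only tentatively.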


------------------------------



End of Moses-support Digest, Vol 114, Issue 1
*********************************************
