Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: 5-gram discount out of range for adjusted count 2
(James Baker)
----------------------------------------------------------------------
Message: 1
Date: Mon, 3 Dec 2018 12:52:53 +0000
From: James Baker <james.d.baker@gmail.com>
Subject: Re: [Moses-support] 5-gram discount out of range for adjusted
count 2
To: <moses@kheafield.com>
Cc: moses-support@mit.edu
Message-ID:
<CAOa=L2x0GVCRrr3-_KVyb8S9adTCWPpHVvca98G3Ecob0pwCpw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Strangely, if I take a random sample of 75% of that same data, it works
just fine. I can use that for the time being, but it is a curious "feature"!
James
On Mon, 3 Dec 2018 at 12:34, James Baker <james.d.baker@gmail.com> wrote:
> What would constitute duplicated in this context? The number of duplicated
> lines in the document is relatively small, but it's possible some of the
> lines have similar text.
>
> $ wc lm_data.en
> 1876364 21359196 96962517 lm_data.en
> $ sort lm_data.en | uniq > lm_data_uniq.en
> $ wc lm_data_uniq.en
> 1487703 15801025 71344598 lm_data_uniq.en
>
> I'd have thought there should be enough unique data in there though, as
> the file is a combined version of the following datasets from OPUS:
>
> * GNOME
> * OpenSubtitles 2018
> * Tanzil
> * Tatoeba
> * Ubuntu
>
> Thanks,
> James
>
> On Mon, 3 Dec 2018 at 11:58, Kenneth Heafield <moses@kheafield.com> wrote:
>
>> Hi,
>>
>> If I had to guess, you have a lot of duplicated text?
>>
>> Kenneth
>> On 12/3/18 11:23 AM, James Baker wrote:
>>
>> Morning,
>>
>> I've been trying to train a language model using the following command:
>>
>> /opt/model-builder/mosesdecoder/bin/lmplz -o 5 -S 80% -T /tmp < lm_data.en > model.lm
>>
>> But I'm getting the following error:
>>
>> === 1/5 Counting and sorting n-grams ===
>> Reading /opt/model-builder/training/lm_data.en
>>
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>
>> ****************************************************************************************************
>> Unigram tokens 21187448 types 117756
>> === 2/5 Calculating and sorting adjusted counts ===
>> Chain sizes: 1:1413072 2:5151762432 3:9659554816 4:15455287296 5:22538960896
>> terminate called after throwing an instance of
>> 'lm::builder::BadDiscountException'
>> what():
>> /opt/model-builder/mosesdecoder/lm/builder/adjust_counts.cc:61 in void
>> lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const
>> lm::builder::DiscountConfig&) threw BadDiscountException because
>> `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
>> ERROR: 5-gram discount out of range for adjusted count 2: -6.80247
>>
>> The data I'm training on has come from the OPUS project. I found some
>> references online to issues when there isn't enough training data, but I
>> think I have sufficient data and have previously trained on a lot less (and
>> even on a subset of my current data):
>>
>> $ wc lm_data.en
>> 1874495 21187448 96148754 lm_data.en
>>
>> Any ideas what might be causing the problem?
>>
>> James
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
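For readers following the thread, here is a minimal sketch of why skewed counts can trigger the error quoted above. It assumes the check guards the standard closed-form modified Kneser-Ney discounts from Chen and Goodman, D_j = j - (j+1) * Y * n_{j+1} / n_j with Y = n1 / (n1 + 2*n2), where n_j is the number of n-grams whose adjusted count is j. The function names and the count-of-counts values below are hypothetical and are not taken from the poster's data.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Closed-form discounts [D1, D2, D3+] from count-of-counts n1..n4."""
    y = n1 / (n1 + 2.0 * n2)
    n = [n1, n2, n3, n4]
    # D_j = j - (j + 1) * Y * n_{j+1} / n_j for j = 1, 2, 3
    return [j - (j + 1) * y * n[j] / n[j - 1] for j in (1, 2, 3)]


def out_of_range(discounts):
    """Return the adjusted counts j whose discount violates 0 <= D_j <= j."""
    return [j for j, d in enumerate(discounts, start=1) if d < 0.0 or d > j]


# Hypothetical "healthy" corpus: count-of-counts fall off smoothly.
healthy = modified_kn_discounts(1000, 400, 200, 120)

# Hypothetical skewed corpus: few n-grams seen exactly twice but many seen
# three times, as can happen when blocks of text are repeated.
skewed = modified_kn_discounts(1000, 100, 600, 300)

print(healthy, out_of_range(healthy))
print(skewed, out_of_range(skewed))
```

In the skewed case n3 is much larger than n2, so the discount for adjusted count 2 goes negative, which is the same shape of failure as "discount out of range for adjusted count 2". Repeated text is one way to end up with such counts, and a random sample changes the count-of-counts, which may be why the 75% sample trains. lmplz also has a --discount_fallback option that substitutes default discounts when the estimate fails; whether that is appropriate here is a judgement call.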
------------------------------
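On the question of what counts as duplicated: the wc figures in the thread (1876364 lines, 1487703 after sort | uniq) mean roughly 20.7% of lines are exact repeats of another line. The sketch below, with hypothetical names, reports that fraction and the most frequently repeated lines for any iterable of lines. It only detects exact line duplicates, so near-duplicates and repeated phrases inside otherwise different lines are not counted.

```python
from collections import Counter


def duplicate_summary(lines, top=3):
    """Summarise exact-duplicate lines in an iterable of strings."""
    counts = Counter(line.rstrip("\n") for line in lines)
    total = sum(counts.values())
    unique = len(counts)
    return {
        "total": total,
        "unique": unique,
        # Share of lines that repeat some other line already counted.
        "duplicate_fraction": (total - unique) / total if total else 0.0,
        "most_common": counts.most_common(top),
    }


# Small in-memory sample; for a real corpus pass an open file object instead.
sample = ["thank you\n", "open file\n", "thank you\n", "thank you\n"]
summary = duplicate_summary(sample)
print(summary)

# The same arithmetic applied to the line counts quoted in the thread.
thread_fraction = (1876364 - 1487703) / 1876364
print(round(thread_fraction, 3))
```

Looking at the most common lines is often more informative than the overall fraction, since a handful of very frequent short lines affects the n-gram counts differently from many lines that each appear twice.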
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 146, Issue 3
*********************************************