Moses-support Digest, Vol 82, Issue 38

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. lmplz: BadDiscountException: Is this small or artificial
data? (Marcin Junczys-Dowmunt)
2. Re: lmplz: BadDiscountException: Is this small or artificial
data? (Kenneth Heafield)
3. Re: Did the configuration files change (Hieu Hoang)
4. Re: lmplz: BadDiscountException: Is this small or artificial
data? (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Aug 2013 14:04:46 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: [Moses-support] lmplz: BadDiscountException: Is this small or
artificial data?
To: moses-support@mit.edu
Message-ID: <fc039dbb97e71e5fd82612e0ff488946@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi Kenneth,

I am getting the following error from lmplz on training data built from
POS-tags:

/home/m.junczys/kenlm/lm/builder/adjust_counts.cc:42 in void
lm::builder::{anonymous}::StatCollector::CalculateDiscounts() threw
BadDiscountException because `s.n[j] == 0'.
Could not calculate Kneser-Ney discounts for 1-grams with adjusted
count 3 because we didn't observe any 1-grams with adjusted count 2; Is
this small or artificial data?

The data is indeed not particularly big: 27M tokens and only around 50
different types. One unigram appears once, the next most common appears
15 times, the next 400 times, and so on, so there are many gaps in the
frequency counts. Is there a clever way around that, for instance
adding some artificial unigrams to the training data?

Best,

Marcin

------------------------------

Message: 2
Date: Tue, 27 Aug 2013 13:38:04 +0100
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] lmplz: BadDiscountException: Is this
small or artificial data?
To: moses-support@mit.edu
Message-ID: <521C9DAC.6070205@kheafield.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I look forward to working with you at MT Marathon on adding features to
lmplz.

Kneser-Ney smoothing is not well defined when there is no singleton.
You'll get a similar error message from SRILM on this data with KN. I
suggest you consider a different smoothing method. Currently, lmplz
only implements unpruned interpolated modified Kneser-Ney.
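
To illustrate why the estimate breaks down, here is a minimal Python
sketch (not lmplz's actual code) of the closed-form modified Kneser-Ney
discounts from Chen and Goodman: D_k = k - (k+1) * Y * n_{k+1} / n_k with
Y = n_1 / (n_1 + 2 * n_2), where n_k is the number of n-grams with
adjusted count k. The function name and the toy counts are assumptions
made for illustration. Any n_k = 0 for k = 1..3 puts a zero in a
denominator, which matches the "didn't observe any 1-grams with adjusted
count 2" complaint in the error message.

```python
from collections import Counter


def modified_kn_discounts(adjusted_counts):
    """Estimate (D1, D2, D3+) from an iterable of adjusted n-gram counts."""
    count_of_counts = Counter(adjusted_counts)
    # n[k] = number of distinct n-grams whose adjusted count is exactly k
    n = [count_of_counts.get(k, 0) for k in range(5)]
    for k in (1, 2, 3):
        if n[k] == 0:
            # n[k] is a denominator below, so the discount is undefined
            raise ValueError(
                f"no n-grams with adjusted count {k}; discount D{k} is undefined"
            )
    y = n[1] / (n[1] + 2 * n[2])
    return tuple(k - (k + 1) * y * n[k + 1] / n[k] for k in (1, 2, 3))


if __name__ == "__main__":
    # Hypothetical toy counts with a gap at count 2 (not the real data).
    gappy = [1, 15, 400, 400000]
    try:
        modified_kn_discounts(gappy)
    except ValueError as err:
        print(err)
```

With so few types, the count-of-counts table is almost empty, so gaps
like this are hard to avoid.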

The following lmplz features are desired, sorted in order by the number
of people poking me about them:

1. Interpolation.
2. Pruning.
3. Configurable <unk> handling (UMD, question at ACL, one e-mail).
4. Additional smoothing methods (Chris Dyer and you).
5. Better memory accounting/backoff instead of exception.
6. Scaling/sharding.

Kenneth



------------------------------

Message: 3
Date: Tue, 27 Aug 2013 13:47:47 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Did the configuration files change
To: João Graça <gracaninja@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhyw5udx_3u+1j8rQqsDs=ZOH3rcJREnkx1HGW46e66kA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

There's a stray character
q
at the beginning of the file. It shows up as the first entry (": q") in
the parameter dump quoted below.

Delete it.
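
A quick way to spot this kind of problem is to look for any content
before the first section header. The sketch below is a hypothetical
helper, not part of Moses; it assumes that moses.ini section headers
have the form [name] and that lines starting with # are comments.

```python
import re

# Assumed header shape: a line consisting only of "[name]".
SECTION = re.compile(r"^\[[^\]]+\]\s*$")


def stray_lines_before_first_section(lines):
    """Return (line_number, text) pairs for non-blank, non-comment
    lines that appear before the first [section] header."""
    stray = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments are harmless
        if SECTION.match(text):
            break  # reached the first real section; stop scanning
        stray.append((number, text))
    return stray


if __name__ == "__main__":
    # Toy input mimicking a config with a stray first line.
    sample = ["q", "# MOSES CONFIG FILE", "[input-factors]", "0"]
    print(stray_lines_before_first_section(sample))  # [(1, 'q')]
```

To check a real file, pass the open file object, for example
stray_lines_before_first_section(open("moses.ini")).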


On 26 August 2013 22:34, João Graça <gracaninja@gmail.com> wrote:

> Dear Hieu,
>
> I am trying to use the new version of Moses with the old-format moses.ini
> from the pre-trained models.
>
> Attached is the configuration file I am currently trying to use, which is
> a modified copy of the tuning file taken from
> http://www.statmt.org/moses/RELEASE-1.0/models/en-es/tuning/moses.tuned.ini.1
>
> However when I run moses I get this error:
>
> vagrant@precise64:/vagrant/mt-models/en-es$ ~/mosesdecoder/bin/moses -f
> moses.ini
> Defined parameters (per moses.ini or switch):
> : q
> config: moses.ini
> distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6
> /vagrant/mt-models/en-es/reordering-table.1.wbe-msd-bidirectional-fe.gz
> distortion-limit: 6
> input-factors: 0
> lmodel-file: 9 0 5 /vagrant/mt-models/en-es/europarl.binlm.1
> mapping: 0 T 0
> ttable-file: 0 0 0 5 /vagrant/mt-models/en-es/phrase-table.1
> ttable-limit: 20
> weight-d: 0.048861 0.0882949 0.0561228 0.0794274 0.152473 -0.00619951
> 0.0546715
> weight-l: 0.0772417
> weight-t: 0.0617933 0.0247665 0.0492006 0.0490465 0.0571993
> weight-w: -0.194702
> ERROR:Unknown parameter
>
> Do you know what I am setting wrong?
>
> Thanks for your help,
>
> João
>
>
> On Tue, Aug 6, 2013 at 12:10 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>
>> The old format.
>>
>>
>> On 6 August 2013 12:07, João Graça <gracaninja@gmail.com> wrote:
>>
>>> Hi Hieu,
>>>
>>> Thanks a lot for your help.
>>>
>>> Which format do the pre-made models use?
>>>
>>> Thanks,
>>>
>>> Joao
>>>
>>>
>>> On Tue, Aug 6, 2013 at 12:03 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>
>>>> Ah, the moses.ini file format has recently changed. The sample files
>>>> are in the new format.
>>>>
>>>> Older versions of Moses don't understand the new format. (New Moses
>>>> understands both the old and the new format.)
>>>>
>>>> To get the source code for the new Moses version:
>>>> git@github.com:moses-smt/mosesdecoder.git
>>>>
>>>> If you really want to stick with the old moses, the old sample files
>>>> are here:
>>>> http://www.statmt.org/moses/download/sample-models.old.tgz
>>>>
>>>> However, the sample models are unrealistically small. These premade
>>>> models are more realistic:
>>>> http://www.statmt.org/moses/RELEASE-1.0/models/
>>>>
>>>>
>>>> On 6 August 2013 11:42, João Graça <gracaninja@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to install moses on my system and run it with the sample
>>>>> models.
>>>>> I am using the moses linux package from
>>>>> http://www.statmt.org/~jie/linux/moses_1.0-1_amd64.deb
>>>>>
>>>>> and then I try to run the sample models available from the web site.
>>>>>
>>>>> http://www.statmt.org/moses/download/sample-models.tgz
>>>>>
>>>>> but when I try to run the decoder I get the following error:
>>>>>
>>>>> vagrant@precise64:/vagrant/sample-models$ /opt/moses/bin/moses -f
>>>>> phrase-model/moses.ini < phrase-model/in
>>>>> Defined parameters (per moses.ini or switch):
>>>>> config: phrase-model/moses.ini
>>>>> feature: KENLM name=LM factor=0 order=3 num-features=1
>>>>> path=lm/europarl.srilm.gz Distortion WordPenalty UnknownWordPenalty
>>>>> PhraseDictionaryMemory input-factor=0 output-factor=0
>>>>> path=phrase-model/phrase-table num-features=1 table-limit=10
>>>>> input-factors: 0
>>>>> mapping: T 0
>>>>> n-best-list: nbest.txt 100
>>>>> weight: WordPenalty0= 0 LM= 1 Distortion0= 1
>>>>> PhraseDictionaryMemory0= 1
>>>>> ERROR:Unknown parameter feature
>>>>> ERROR:Unknown parameter weight
>>>>> ERROR:No phrase translation table (ttable-file)
>>>>>
>>>>>
>>>>> Were there any major changes to the moses.ini configuration file that
>>>>> render the previous releases of Moses unusable with the sample models?
>>>>>
>>>>> Thank you for all your help,
>>>>>
>>>>> João
>>>>>


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 4
Date: Tue, 27 Aug 2013 14:50:20 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] lmplz: BadDiscountException: Is this
small or artificial data?
To: moses-support@mit.edu
Message-ID: <8bcdf591521a009bda0e880d84b0ca2c@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



OK, I was hoping for a dirty method to bypass that. No pressure :)



------------------------------



End of Moses-support Digest, Vol 82, Issue 38
*********************************************
