Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. moses.ini (Ouafa Benterki)
2. Re: different versions of moses yielding different
translations (Rico Sennrich)
----------------------------------------------------------------------
Message: 1
Date: Thu, 26 Nov 2015 19:38:36 +0100
From: Ouafa Benterki <obenterki@gmail.com>
Subject: [Moses-support] moses.ini
To: moses-support@mit.edu
Message-ID: <CAO=hEkNaTrCvg9bHOY7c711KqA-+8OiH_YF7hdf6owEW-C8Hxw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello,
My question is about moses.ini: if we use IRSTLM, should we
replace KENLM with IRSTLM in moses.ini?
Thanks
On Thu, Nov 26, 2015 at 6:00 PM, <moses-support-request@mit.edu> wrote:
> Today's Topics:
>
> 1. Re: Language model question (Dingyuan Wang)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 27 Nov 2015 00:05:51 +0800
> From: Dingyuan Wang <abcdoyle888@gmail.com>
> Subject: Re: [Moses-support] Language model question
> To: Vincent Nguyen <vnguyen@neuf.fr>
> Cc: moses-support <moses-support@mit.edu>
> Message-ID: <CAFt8H74H6ta+ijKC_CHao2dvUchqnONvk64Q+JDN99JK5b-zEg@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I tend to fix it in the tokenization script, or I would solve this in some
> pre-processing scripts if there are any obvious patterns in the noise.
>
> --
> Dingyuan
> On 26 Nov 2015 at 21:09, "Vincent Nguyen" <vnguyen@neuf.fr> wrote:
>
> > Hi all,
> >
> > I have a question regarding LMs.
> >
> > Let's take the example of news.2014.shuffle.en
> >
> > When we process it through punctuation normalization for the English
> > language, it will, for instance, put a " " before an apostrophe:
> > "it is'nt" => "it is 'nt"
> >
> > BUT it contains some noise. For instance, there are some French sentences
> > in the corpus, for which the apostrophe processing is not suited:
> > "j'aime" => "j 'aime" => it will create the token 'aime
> >
> > My point is the following:
> >
> > At the LM building stage, how can we prune to eliminate tokens like
> > "'aime" so that they do not create wrong unigrams, bigrams, ...
> >
> > The ngram -minprune option only takes 2 as a minimum, so wrong unigrams
> > will still be included in the LM.
> >
> >
> > Hope I'm clear enough ....
> >
> > Vincent
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
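The pre-processing route suggested in the reply above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the clitic whitelist `ENGLISH_CLITICS` and the function names are hypothetical, the input is assumed to be already tokenized with one sentence per line, and whole sentences are dropped rather than repaired.

```python
# Hypothetical whitelist of apostrophe-initial tokens that are expected
# in tokenized English text (an assumption, extend it for your tokenizer).
ENGLISH_CLITICS = {"'s", "'t", "'re", "'ve", "'ll", "'d", "'m", "'nt"}


def has_suspicious_token(line):
    """Return True if the line holds an apostrophe-initial token such as 'aime."""
    return any(
        tok.startswith("'") and len(tok) > 1 and tok.lower() not in ENGLISH_CLITICS
        for tok in line.split()
    )


def filter_corpus(lines):
    """Keep only the sentences without suspicious apostrophe-initial tokens."""
    return [line for line in lines if not has_suspicious_token(line)]


sample = ["it is 'nt true", "j 'aime le fromage", "that 's fine"]
kept = filter_corpus(sample)  # the French sentence is dropped
```

A filter like this runs before LM estimation, so the unwanted unigrams never reach the model. It would also discard sentences where a quotation mark stays glued to a word, so it is worth checking a sample of the rejected lines.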
> ------------------------------
>
>
> End of Moses-support Digest, Vol 109, Issue 70
> **********************************************
>
------------------------------
Message: 2
Date: Thu, 26 Nov 2015 18:48:01 +0000
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] different versions of moses yielding
different translations
To: moses-support@mit.edu, Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Message-ID: <565753E1.6070400@gmx.ch>
Content-Type: text/plain; charset="windows-1252"
Hi list, Marcin,
I've added a regtest that covers multimodel with compact phrase tables
(phrase.multimodel-compactptable), and I've identified the offending
commit with git bisect to be commit
a804894378b2695bde78bdbff10e9d0f0afb7cc7.
@Marcin: do you have an idea what could have caused the regression?
best wishes,
Rico
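For readers unfamiliar with the idea behind git bisect, here is a minimal Python sketch of the underlying binary search over an ordered history. It is not git's implementation: `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for building a commit and running a check such as the regression test mentioned above. The sketch assumes the first commit is known good, the last is known bad, and the behaviour flips only once.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered from oldest to newest.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression was introduced at mid or earlier
        else:
            lo = mid  # the regression was introduced after mid
    return commits[hi]


# Toy history: commits c0..c7, where the regression starts at c5.
history = ["c%d" % i for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
```

Each step halves the range, so a history of n commits needs about log2(n) builds and test runs.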
On 26/11/15 15:45, Hieu Hoang wrote:
> I don't know anything about multi-model. Michael Denkowski seemed to
> have made a few changes
> https://github.com/moses-smt/mosesdecoder/commits/master/moses/TranslationModel/PhraseDictionaryMultiModel.cpp
>
>
> On 26/11/2015 13:52, Barry Haddow wrote:
>> Hi Vito
>>
>> It's clear from your example that PhraseDictionaryMultiModel is
>> giving different scores in each version (compare the 1st hypothesis
>> of old with the 3rd of new), and that should not happen.
>>
>> I'm not familiar with the changes made to this class, so maybe
>> someone that is can suggest where to look. Hieu?
>>
>> cheers - Barry
>>
>> On 26/11/15 13:39, Vito Mandorino wrote:
>>> Yes, the moses.ini is the same in the two cases. I don't see any
>>> difference other than the moses version. Here is the 5-best list for
>>> the segment 'test' in the two cases. The phrase-table scores are
>>> different and the rankings change accordingly.
>>>
>>> ---
>>>
>>> echo 'test' | old_mosesdecoder/bin/moses -f ../moses.ini -mp
>>> -n-best-list nbest_oldMoses 5
>>>
>>> 0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -1.36401 -1.16642 -2.38112 -1.93671 ||| -1.72803
>>> 0 ||| épreuve ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0= -33.438 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.40188 -2.66496 -4.1693 -4.0067 ||| -2.04003
>>> 0 ||| test sur ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 0 LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -1.63217 -2.48226 -3.3935 -4.16375 ||| -2.2293
>>> 0 ||| test sur les ||| LexicalReordering0= -4.29043 0 0 0 0 0 Distortion0= 0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -1.71765 -2.88763 -4.30406 -7.17601 ||| -2.37806
>>> 0 ||| tester ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0= -47.6457 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.1744 -2.19974 -4.52128 -4.49808 ||| -2.49226
>>>
>>> ---
>>>
>>> echo 'test' | new_mosesdecoder/bin/moses -f ../moses.ini -mp
>>> -n-best-list nbest_newMoses 5
>>>
>>> 0 ||| test sur les ||| LexicalReordering0= -4.29043 0 0 0 0 0 Distortion0= 0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -0.968934 -1.36372 -1.54406 -1.60562 ||| -1.3334
>>> 0 ||| critère de la ||| LexicalReordering0= -4.14314 0 0 0 0 0 Distortion0= 0 LM0= -58.2007 WordPenalty0= -3 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -0.916291 -1.4366 -1.55314 -1.60891 ||| -1.35742
>>> 0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -1.17713 -1.00259 -1.42326 -1.24867 ||| -1.4286
>>> 0 ||| test sur ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 0 LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -0.92759 -1.26035 -1.45418 -1.53457 ||| -1.53375
>>> 0 ||| critère de ||| LexicalReordering0= -4.41886 0 0 0 0 0 Distortion0= 0 LM0= -57.5157 WordPenalty0= -2 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -1.09861 -1.4366 -1.53505 -1.59179 ||| -1.6079
>>>
>>>
>>> Vito
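Comparing n-best entries like these by eye is error-prone, so a small script can help. The following is a minimal Python sketch, assuming each entry has exactly the four `|||`-separated fields seen in this thread (sentence id, hypothesis, feature scores, total score). The function names are hypothetical and not part of Moses.

```python
def parse_nbest_line(line):
    """Split one n-best entry into (sentence id, hypothesis, features, total)."""
    sent_id, hyp, feats, total = [field.strip() for field in line.split("|||")]
    features = {}
    name = None
    for tok in feats.split():
        if tok.endswith("="):
            name = tok[:-1]  # e.g. "LM0=" starts the score list for LM0
            features[name] = []
        elif name is None:
            raise ValueError("score found before any feature name: %r" % tok)
        else:
            features[name].append(float(tok))
    return int(sent_id), hyp, features, float(total)


def diff_features(old_line, new_line, tol=1e-6):
    """Return the sorted names of features whose scores differ between entries."""
    old = parse_nbest_line(old_line)[2]
    new = parse_nbest_line(new_line)[2]
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, []), new.get(name, [])
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            changed.append(name)
    return changed


# The hypothesis "test" as scored by the old and the new decoder in this thread.
OLD = ("0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 "
       "LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 "
       "PhraseDictionaryMultiModel0= -1.36401 -1.16642 -2.38112 -1.93671 ||| -1.72803")
NEW = ("0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 "
       "LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 "
       "PhraseDictionaryMultiModel0= -1.17713 -1.00259 -1.42326 -1.24867 ||| -1.4286")

print(diff_features(OLD, NEW))  # → ['PhraseDictionaryMultiModel0']
```

For the same hypothesis, only the PhraseDictionaryMultiModel0 scores change between the two versions, which matches the observation that the phrase-table scores are the source of the different rankings.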
>>>
>>> 2015-11-26 12:16 GMT+01:00 Barry Haddow <bhaddow@inf.ed.ac.uk>:
>>>
>>> Hi Vito
>>>
>>> The tcmalloc message is normal.
>>>
>>> Are you absolutely sure you are using the same model (and same
>>> pre- and post-processing)? A difference of 5 or 14 bleu should
>>> be quite visible in the output. What do the outputs look like?
>>>
>>> cheers - Barry
>>>
>>>
>>> On 26/11/15 09:58, Vito Mandorino wrote:
>>>> Hi Barry,
>>>>
>>>> actually with the OnDisk table there is virtually no difference
>>>> (0.2 average difference, whether or not re-tuning has been done).
>>>> With the compact phrase table, however, the difference is larger. The
>>>> latest test this morning yields a loss of 14 BLEU points
>>>> without re-tuning. I don't know what the cause could be.
>>>> Sometimes there is this message when loading the phrase tables:
>>>> tcmalloc: large alloc 1149427712 bytes == 0x28a54000 @
>>>>
>>>> After re-tuning, however, the difference in BLEU score gets
>>>> smaller even with the compact phrase table.
>>>>
>>>> Best regards,
>>>> Vito
>>>>
>>>> 2015-11-25 21:23 GMT+01:00 Barry Haddow <bhaddow@inf.ed.ac.uk>:
>>>>
>>>> Hi Vito
>>>>
>>>> The 0.2 difference is after retuning? That's normal then.
>>>>
>>>> But a difference of 5 bleu without retuning suggests a bug.
>>>> Did you say that this only happens with
>>>> PhraseDictionaryMultiModel?
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> On 25/11/15 13:53, Vito Mandorino wrote:
>>>>> Thank you. In our tests it seems that with the OnDisk
>>>>> table the quality is basically the same between the two
>>>>> versions of Moses (average 0.2 difference in BLEU score),
>>>>> but for the CompactPhraseTable the difference is larger (a
>>>>> 2-point BLEU loss on average after re-tuning with the new
>>>>> version of Moses, and more than 5 BLEU points on average
>>>>> without re-tuning).
>>>>> Do you think better quality would be obtained by running
>>>>> a complete re-training of the model with the new version
>>>>> of Moses?
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vito
>>>>>
>>>>> 2015-11-24 16:31 GMT+01:00 Hieu Hoang <hieuhoang@gmail.com>:
>>>>>
>>>>> There was a change in the underlying data structure for
>>>>> stacks: it changed from std::set (ordered) to
>>>>> boost::unordered_set.
>>>>> https://github.com/moses-smt/mosesdecoder/commit/6b182ee5e987a5b2823aea7eaaa7ef0457c6a30d
>>>>> This gave some speed gains:
>>>>>
>>>>> Run                              Time  1          5          10         15         20         25         30         35
>>>>> 56 (13/10 baseline)              real  4m57.795s  1m19.005s  0m51.636s  0m49.624s  0m49.869s  0m52.475s  0m53.806s  0m54.957s
>>>>>                                  user  4m41.255s  5m45.086s  6m34.053s  8m12.430s  8m10.667s  8m16.486s  8m10.592s  8m13.859s
>>>>>                                  sys   0m16.514s  0m35.494s  0m54.513s  1m10.643s  1m18.449s  1m21.738s  1m23.133s  1m25.048s
>>>>> 57 ((56) + unordered set stack)  real  4m41.148s  1m16.002s  0m50.747s  0m48.711s  0m49.130s  0m51.473s  0m53.141s  0m54.513s
>>>>>                                  user  4m23.968s  5m30.356s  6m26.167s  7m39.286s  7m56.229s  7m52.669s  7m56.978s  7m56.216s
>>>>>                                  sys   0m17.231s  0m35.063s  0m54.081s  1m10.137s  1m17.194s  1m22.912s  1m25.948s  1m26.247s
>>>>>
>>>>>
>>>>> However, the hypotheses are now added to the stack in
>>>>> a different order, so there will be slight differences
>>>>> in the results.
>>>>>
>>>>>
>>>>> On 24/11/2015 13:53, Vito Mandorino wrote:
>>>>>> Hi,
>>>>>>
>>>>>> in some of our tests, a recent version of Moses
>>>>>> (pulled from GitHub last week) and an older one do
>>>>>> not give the same translations for the same source
>>>>>> segment (with the same moses.ini).
>>>>>> Here is the 5-best list for the translation of 'test'
>>>>>> with last week's version:
>>>>>>
>>>>>> 0 ||| test ||| LexicalReordering0= -1.1969 0 0 0 0 0 Distortion0= 0 LM0= -51.1788 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -3.03811 -2.5834 -2.08503 -1.83075 ||| -1.27754
>>>>>> 0 ||| testing ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0= -35.1495 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -5.21045 -5.04877 -4.71131 -4.66382 ||| -1.70337
>>>>>> 0 ||| funds ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -10.8209 -10.6835 -5.14555 -5.73388 ||| -1.77009
>>>>>> 0 ||| known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -4.42285 -11.9339 -5.14555 -18.0392 ||| -1.89152
>>>>>> 0 ||| as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -9.34698 -11.9339 -5.14555 -9.14874 ||| -1.89159
>>>>>>
>>>>>> and with the older version of Moses:
>>>>>>
>>>>>> 0 ||| funds ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.52548 -2.52544 -2.45544 -2.48609 ||| -0.815668
>>>>>> 0 ||| as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.52464 -2.52565 -2.45544 -2.5244 ||| -0.953799
>>>>>> 0 ||| as ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -34.1633 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.5256 -2.52565 -2.45544 -2.48609 ||| -1.07254
>>>>>> 0 ||| known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.38597 -2.52565 -2.45544 -2.52573 ||| -1.07536
>>>>>> 0 ||| is known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0= -80.8518 WordPenalty0= -4 PhrasePenalty0= 1 PhraseDictionaryMultiModel0= -2.37158 -2.52565 -2.45544 -2.52573 ||| -1.18753
>>>>>>
>>>>>> This looks very strange. The only difference is in
>>>>>> the phrase-table scores. Do you have any idea what
>>>>>> is going on? The only possibility that comes to mind
>>>>>> is a different handling of the
>>>>>> PhraseDictionaryMultiModel feature.
>>>>>> The moses.ini is attached.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Vito
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> M. Vito MANDORINO -- Chief Scientist
>>>>>> The Translation Trustee
>>>>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>>>>> Tel: +33 1 30 44 04 23 Mobile: +33 6 84 65 68 89
>>>>>> Email: vito.mandorino@linguacustodia.com
>>>>>> Website: www.linguacustodia.com - www.thetranslationtrustee.com
>>>>>
>>>>> --
>>>>> Hieu Hoang
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>
>>>>
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>>
>>>>
>>>
>>>
>>
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 109, Issue 71
**********************************************