Moses-support Digest, Vol 109, Issue 67

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Language model question (Vincent Nguyen)
2. Re: different versions of moses yielding different
translations (Vito Mandorino)


----------------------------------------------------------------------

Message: 1
Date: Thu, 26 Nov 2015 14:07:09 +0100
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: [Moses-support] Language model question
To: moses-support <moses-support@mit.edu>
Message-ID: <565703FD.8060004@neuf.fr>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi all,

I have a question regarding LMs.

Let's take the example of news.2014.shuffle.en

When we process it through punctuation normalization for the English
language, it will for instance put a space before an apostrophe:
"it is'nt" => "it is 'nt"

BUT the corpus contains some noise, for instance some French sentences,
for which this apostrophe handling is not suited:
"j'aime" => "j 'aime", which creates the token 'aime

My question is the following:

At the LM-building stage, how can we prune away tokens like "'aime" so
that they do not create wrong unigrams, bigrams, etc.?

The ngram -minprune option only takes 2 as a minimum, so wrong unigrams
will still end up in the LM.
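
To illustrate what I mean, this is the kind of pre-filter (a rough sketch
in Python; the script name and the apostrophe pattern are only examples)
that I could run over the tokenized corpus before building the LM, but I
was hoping the LM toolkit could do something equivalent at training time:

    import re
    import sys

    # Drop stray elision fragments such as "'aime" that English-style
    # apostrophe splitting produces on French sentences left in the corpus.
    # The pattern is only an example; adapt it to the noise in your data.
    BAD_TOKEN = re.compile(r"^'\w+$")

    for line in sys.stdin:
        tokens = [tok for tok in line.split() if not BAD_TOKEN.match(tok)]
        print(" ".join(tokens))

Running something like "python filter_tokens.py < corpus.tok.en >
corpus.clean.en" (placeholder file names) before LM training would remove
the bad unigrams, but pruning at LM-building time would be cleaner.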


Hope I'm clear enough ....

Vincent


------------------------------

Message: 2
Date: Thu, 26 Nov 2015 14:39:43 +0100
From: Vito Mandorino <vito.mandorino@linguacustodia.com>
Subject: Re: [Moses-support] different versions of moses yielding
different translations
To: Barry Haddow <bhaddow@inf.ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CA+8mSmGRJLB_eA3bVYBsi8JTN5aV1kTN0eXSt7Ao8RhOqH3maQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Yes, the moses.ini is the same in the two cases. I don't see any difference
other than the moses version. Here is the 5-best list for the segment
'test' in the two cases. The phrase-table scores are different and the
rankings change accordingly.

---

echo 'test' | old_mosesdecoder/bin/moses -f ../moses.ini -mp -n-best-list
nbest_oldMoses 5

0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 LM0=
-46.0739 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
-1.36401 -1.16642 -2.38112 -1.93671 ||| -1.72803
0 ||| épreuve ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0=
-33.438 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
-2.40188 -2.66496 -4.1693 -4.0067 ||| -2.04003
0 ||| test sur ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 0
LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -1.63217 -2.48226 -3.3935 -4.16375 ||| -2.2293
0 ||| test sur les ||| LexicalReordering0= -4.29043 0 0 0 0 0 Distortion0=
0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -1.71765 -2.88763 -4.30406 -7.17601 |||
-2.37806
0 ||| tester ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0=
-47.6457 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
-2.1744 -2.19974 -4.52128 -4.49808 ||| -2.49226

---

echo 'test' | new_mosesdecoder/bin/moses -f ../moses.ini -mp -n-best-list
nbest_newMoses 5

0 ||| test sur les ||| LexicalReordering0= -4.29043 0 0 0 0 0 Distortion0=
0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -0.968934 -1.36372 -1.54406 -1.60562 |||
-1.3334
0 ||| critère de la ||| LexicalReordering0= -4.14314 0 0 0 0 0
Distortion0= 0 LM0= -58.2007 WordPenalty0= -3 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -0.916291 -1.4366 -1.55314 -1.60891 |||
-1.35742
0 ||| test ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 LM0=
-46.0739 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
-1.17713 -1.00259 -1.42326 -1.24867 ||| -1.4286
0 ||| test sur ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 0
LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -0.92759 -1.26035 -1.45418 -1.53457 |||
-1.53375
0 ||| critère de ||| LexicalReordering0= -4.41886 0 0 0 0 0 Distortion0= 0
LM0= -57.5157 WordPenalty0= -2 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -1.09861 -1.4366 -1.53505 -1.59179 ||| -1.6079

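If it helps to check this on a longer n-best list, a small script along
these lines (just a rough sketch; it only assumes the standard Moses
n-best format shown above) can compare the total scores of the two runs:

    import sys

    def parse_nbest(path):
        # Each line: "id ||| translation ||| feature scores ||| total score"
        entries = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = [part.strip() for part in line.split("|||")]
                if len(fields) < 4:
                    continue
                entries.append((fields[1], float(fields[3])))
        return entries

    old = dict(parse_nbest(sys.argv[1]))   # e.g. nbest_oldMoses
    new = parse_nbest(sys.argv[2])         # e.g. nbest_newMoses

    for translation, score in new:
        if translation in old:
            print("%-20s old=%8.4f  new=%8.4f  diff=%+.4f"
                  % (translation, old[translation], score, score - old[translation]))
        else:
            print("%-20s only in the new n-best list (%.4f)" % (translation, score))
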

Vito

2015-11-26 12:16 GMT+01:00 Barry Haddow <bhaddow@inf.ed.ac.uk>:

> Hi Vito
>
> The tcmalloc message is normal.
>
> Are you absolutely sure you are using the same model (and same pre- and
> post-processing)? A difference of 5 or 14 bleu should be quite visible in
> the output. What do the outputs look like?
>
> cheers - Barry
>
>
> On 26/11/15 09:58, Vito Mandorino wrote:
>
> Hi Barry,
>
> actually with the OnDisk phrase-table there is virtually no difference (0.2
> BLEU difference on average, whether or not re-tuning has been done).
> With the compact phrase-table, however, the difference is larger. The latest
> test this morning shows a loss of 14 BLEU points without re-tuning.
> I don't know what could be the cause.
> Sometimes there is this message on loading the phrase-tables
> tcmalloc: large alloc 1149427712 bytes == 0x28a54000 @
>
> After re-tuning, however, the BLEU difference gets smaller even with the
> compact phrase-table.
>
> Best regards,
> Vito
>
> 2015-11-25 21:23 GMT+01:00 Barry Haddow <bhaddow@inf.ed.ac.uk>:
>
>> Hi Vito
>>
>> The 0.2 difference is after retuning? That's normal then.
>>
>> But a difference of 5 bleu without retuning suggests a bug. Did you say
>> that this only happens with PhraseDictionaryMultiModel?
>>
>> cheers - Barry
>>
>>
>> On 25/11/15 13:53, Vito Mandorino wrote:
>>
>> Thank you. In our tests it seems that with the OnDisk table the quality
>> is basically the same between the two versions of Moses (0.2 BLEU
>> difference on average), but for the CompactPhraseTable the difference is
>> larger (2 BLEU points lost on average after re-tuning with the new version
>> of Moses, and more than 5 BLEU points on average without re-tuning).
>> Do you think better quality would be obtained by running a complete
>> re-training of the model with the new version of Moses?
>>
>>
>> Best regards,
>> Vito
>>
>> 2015-11-24 16:31 GMT+01:00 Hieu Hoang <hieuhoang@gmail.com>:
>>
>>> There was a change in the underlying data structure for stacks: it
>>> changed from std::set (ordered) to boost::unordered_set.
>>>
>>> https://github.com/moses-smt/mosesdecoder/commit/6b182ee5e987a5b2823aea7eaaa7ef0457c6a30d
>>> This gave some speed gains:
>>>
>>>            1          5          10         15         20         25         30         35
>>> 56 (13/10 baseline)
>>>   real   4m57.795s  1m19.005s  0m51.636s  0m49.624s  0m49.869s  0m52.475s  0m53.806s  0m54.957s
>>>   user   4m41.255s  5m45.086s  6m34.053s  8m12.430s  8m10.667s  8m16.486s  8m10.592s  8m13.859s
>>>   sys    0m16.514s  0m35.494s  0m54.513s  1m10.643s  1m18.449s  1m21.738s  1m23.133s  1m25.048s
>>>
>>> 57 ((56) + unordered set stack)
>>>   real   4m41.148s  1m16.002s  0m50.747s  0m48.711s  0m49.130s  0m51.473s  0m53.141s  0m54.513s
>>>   user   4m23.968s  5m30.356s  6m26.167s  7m39.286s  7m56.229s  7m52.669s  7m56.978s  7m56.216s
>>>   sys    0m17.231s  0m35.063s  0m54.081s  1m10.137s  1m17.194s  1m22.912s  1m25.948s  1m26.247s
>>> However, the hypotheses are now added to the stack in a different order,
>>> so there will be slight differences in the results.
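>>>
>>> To make that concrete, here is a toy sketch in Python (not Moses code,
>>> just an illustration of the tie-breaking effect): when equally-scored
>>> hypotheses are pruned, the insertion order decides which one survives.
>>>
>>>     import heapq
>>>
>>>     def prune(hypotheses, beam_size):
>>>         # Keep the beam_size best-scoring hypotheses; among hypotheses
>>>         # with equal scores, the ones seen earlier in the input win.
>>>         return heapq.nlargest(beam_size, hypotheses, key=lambda h: h[1])
>>>
>>>     # The same set of hypotheses, inserted in a different order:
>>>     hyps_a = [("hyp1", -1.5), ("hyp2", -1.5), ("hyp3", -0.9)]
>>>     hyps_b = [("hyp2", -1.5), ("hyp1", -1.5), ("hyp3", -0.9)]
>>>
>>>     print(prune(hyps_a, 2))  # keeps hyp3 and hyp1
>>>     print(prune(hyps_b, 2))  # keeps hyp3 and hyp2
>>>
>>> Which of the equally-scored candidates is kept (and hence what appears
>>> in the n-best list) can therefore differ between the two container types.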
>>>
>>>
>>> On 24/11/2015 13:53, Vito Mandorino wrote:
>>>
>>> Hi,
>>>
>>> in some of our tests a recent version of Moses (pulled from github last
>>> week) and an older one do not give the same translations on the same source
>>> segment (with the same moses.ini).
>>> Here is the 5-best list for the translation of 'test' with last week's
>>> version:
>>>
>>> 0 ||| test ||| LexicalReordering0= -1.1969 0 0 0 0 0 Distortion0= 0
>>> LM0= -51.1788 WordPenalty0= -1 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -3.03811 -2.5834 -2.08503 -1.83075 ||| -1.27754
>>> 0 ||| testing ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0=
>>> -35.1495 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
>>> -5.21045 -5.04877 -4.71131 -4.66382 ||| -1.70337
>>> 0 ||| funds ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0
>>> LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -10.8209 -10.6835 -5.14555 -5.73388 |||
>>> -1.77009
>>> 0 ||| known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0=
>>> 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -4.42285 -11.9339 -5.14555 -18.0392 |||
>>> -1.89152
>>> 0 ||| as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0
>>> LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -9.34698 -11.9339 -5.14555 -9.14874 |||
>>> -1.89159
>>>
>>> and with the older version of Moses:
>>>
>>> 0 ||| funds ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0
>>> LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -2.52548 -2.52544 -2.45544 -2.48609 |||
>>> -0.815668
>>> 0 ||| as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0
>>> LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -2.52464 -2.52565 -2.45544 -2.5244 |||
>>> -0.953799
>>> 0 ||| as ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0= 0 LM0=
>>> -34.1633 WordPenalty0= -1 PhrasePenalty0= 1 PhraseDictionaryMultiModel0=
>>> -2.5256 -2.52565 -2.45544 -2.48609 ||| -1.07254
>>> 0 ||| known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0=
>>> 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -2.38597 -2.52565 -2.45544 -2.52573 |||
>>> -1.07536
>>> 0 ||| is known as a ||| LexicalReordering0= -3.1355 0 0 0 0 0
>>> Distortion0= 0 LM0= -80.8518 WordPenalty0= -4 PhrasePenalty0= 1
>>> PhraseDictionaryMultiModel0= -2.37158 -2.52565 -2.45544 -2.52573 |||
>>> -1.18753
>>>
>>> This looks very strange. The only difference is in the phrase-table
>>> scores. Do you have any idea of what is going on? The only possibility
>>> that comes to mind is a different handling of the
>>> PhraseDictionaryMultiModel feature.
>>> The moses.ini is attached.
>>>
>>> Best regards,
>>>
>>> Vito
>>>
>>>
>>>
>>> --
>>> M. Vito MANDORINO -- Chief Scientist, The Translation Trustee
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>> --
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>>
>>
>>
>> --
>> M. Vito MANDORINO -- Chief Scientist, The Translation Trustee
>>
>>
>
>
> --
> M. Vito MANDORINO -- Chief Scientist, The Translation Trustee
>
>


--
M. Vito MANDORINO -- Chief Scientist
The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel : +33 1 30 44 04 23  Mobile : +33 6 84 65 68 89
Email : vito.mandorino@linguacustodia.com
Website : www.linguacustodia.com - www.thetranslationtrustee.com

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 109, Issue 67
**********************************************
