Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: differences between moses and moses2 output (Hieu Hoang)
2. Re: differences between moses and moses2 output (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Wed, 28 Sep 2016 14:33:00 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] differences between moses and moses2
output
To: Vito Mandorino <vito.mandorino@linguacustodia.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjxjWNGYx2S1kuHirxSLCcmE9hbFt4PNyREAZBG4WGaiA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Ah ok. Do you have a moses.ini and an example input sentence to go with that?
pugixml.cpp is used to parse the input sentence for XML markup for
placeholders, forced translation, etc. You shouldn't change the code for
pugixml 'cos it's an imported library that we don't control, and we may
reimport it in future if there are new releases. The problem seems to be
Moses2's use of the library, so it should be fixed in Moses2.
Hieu Hoang
http://www.hoang.co.uk/hieu
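
A minimal sketch of the failure mode discussed in this thread, assuming the phrase table stores tokens in escaped form (e.g. &apos;) while the input parser resolves entities before lookup. The table entries and the helper names (PHRASE_TABLE, parse_input, translate) are hypothetical and only illustrate the mismatch; this is not Moses2 code.

```python
import html

# Toy phrase table keyed by escaped tokens (hypothetical entries).
PHRASE_TABLE = {
    "d&apos;": "of",
    "information": "information",
}


def parse_input(line, unescape):
    """Split a line on whitespace, optionally resolving entities first."""
    if unescape:
        line = html.unescape(line)  # "&apos;" becomes "'"
    return line.split()


def translate(tokens, table):
    """Look up each token; unknown tokens are copied through unchanged."""
    return [table.get(tok, tok) for tok in tokens]


source = "d&apos; information"
kept = translate(parse_input(source, unescape=False), PHRASE_TABLE)
lost = translate(parse_input(source, unescape=True), PHRASE_TABLE)
```

With unescaping switched on, the token no longer matches its escaped phrase-table key, so it is passed through untranslated, which resembles the stray d' and n' in the Moses2 output reported in the thread.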
On 28 September 2016 at 14:22, Vito Mandorino <
vito.mandorino@linguacustodia.com> wrote:
> We are able to replicate the issue with the probingPT version of this
> phrase-table:
>
> &apos; ||| &apos; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
> &amp; ||| &amp; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
> &gt; ||| &gt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
> &lt; ||| &lt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
> &quot; ||| &quot; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
> ||| ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>   |||   ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>
> If we understand correctly, the origin of the issue is in the function
> strconv_escape in ./contrib/moses2/pugixml.cpp which replaces some of
> these entities with the actual symbol. Commenting out that part seems to
> fix the problem, but we wonder if this may cause any issues elsewhere since
> we don't know the purpose of the entity replacement.
>
> Best regards,
> Vito
>
> 2016-09-28 11:19 GMT+02:00 Hieu Hoang <hieuhoang@gmail.com>:
>
>> Can you make your model files available for download?
>>
>> Moses and Moses2 aren't guaranteed to give exactly the same answer.
>> However, they should be the same quality overall
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 28 September 2016 at 09:53, Vito Mandorino <
>> vito.mandorino@linguacustodia.com> wrote:
>>
>>> Hi,
>>>
>>> we are testing moses2 and we find a decrease in quality which seems to
>>> be related to apostrophes. For instance:
>>>
>>> Source segment 1:
>>> mise à disposition des actionnaires des documents d' information
>>> relatifs à la sicav
>>>
>>> MT Moses:
>>> provision shareholders of the briefing material for the sicav
>>>
>>> MT Moses2:
>>> provision of shareholders documents d' information concerning the fund
>>>
>>>
>>> Source segment 2:
>>> tout titre qui deviendrait spéculatif à la suite d' une
>>> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
>>> moins que le conseiller en investissement n' estime qu' il y va
>>> de l' intérêt des actionnaires .
>>>
>>> MT Moses:
>>> any security that would become speculative following a downgrading after
>>> its takeover by the fund will not be liquidated , unless the investment
>>> adviser believes it is in the interest of shareholders .
>>>
>>> MT Moses2:
>>> any security that would become speculative following a possible
>>> downgrade d' by the fund after its acquisition will not be liquidated ,
>>> unless the investment advisor believes n' stake qu' l' interest of
>>> shareholders .
>>>
>>> It is actually strange that the raw MT output contains the apostrophe
>>> symbol instead of the &apos; entity. What could the reason be?
>>>
>>> Best regards,
>>> Vito
>>>
>>>
>>> --
>>> M. Vito MANDORINO -- Chief Scientist
>>> The Translation Trustee
>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>> Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89
>>> Email : vito.mandorino@linguacustodia.com
>>> Website : www.linguacustodia.finance
>>>
------------------------------
Message: 2
Date: Wed, 28 Sep 2016 15:12:51 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] differences between moses and moses2
output
To: Vito Mandorino <vito.mandorino@linguacustodia.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjEK4P457uqHw4VsD52kJmiBmAZVDOTsd1giR8TW_529A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
hi Vito,
please git pull and try decoding again. I've just pushed a fix:
https://github.com/hieuhoang/mosesdecoder/commit/0005e98b2674906162ce7945c5edd6a42c9ca418
Basically, I've changed the behaviour of the pugixml call so that it
doesn't unescape the &apos; words.
Hieu Hoang
http://www.hoang.co.uk/hieu
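
The idea behind the fix (handle the XML markup but leave entity references alone) can be sketched as follows. This is only an illustration under stated assumptions, not the contents of the commit: split_markup and the example <ne> tag are hypothetical, and the regex is a simplification that would not cope with a literal > inside an attribute value. pugixml itself exposes parse options that control entity expansion (parse_escapes), but which option the commit changes is not stated in this thread.

```python
import re

# Matches anything that looks like an XML tag (simplified on purpose).
TAG = re.compile(r"<[^>]+>")


def split_markup(line):
    """Separate markup tags from text tokens without resolving entities.

    Returns (tags, tokens). Entity references such as "&apos;" are left
    exactly as written, so they still match escaped phrase-table keys.
    """
    tags = TAG.findall(line)
    text = TAG.sub(" ", line)
    return tags, text.split()


tags, tokens = split_markup('d&apos; <ne translation="info">information</ne> .')
```

Here the markup is still recognised, while d&apos; reaches the lookup stage in the same escaped form that the phrase table uses.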
------------------------------
End of Moses-support Digest, Vol 119, Issue 41
**********************************************