Moses-support Digest, Vol 103, Issue 44

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Unexpected behaviour of placeables (Hieu Hoang)
2. Re: Unexpected behaviour of placeables (Carla Parra)
3. Re: Unexpected behaviour of placeables (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Tue, 19 May 2015 13:58:59 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Unexpected behaviour of placeables
To: carla.parra@hermestrans.com
Cc: Moses Support <moses-support@mit.edu>
Message-ID:
<CAEKMkbiZsCJ=wvcjJ0bUETVCr-9-_=PKw-8a-EdAqhxyyTRyGg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

It looks OK to me; I'm not sure what could be wrong.

I've added a daily test to ensure that the placeholder keeps working in future.
Perhaps you can have a look at the moses.ini and to-translate.txt files there
to see if there are any differences from yours:

https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 19 May 2015 at 11:53, Carla Parra <carla.parra@hermestrans.com> wrote:

> Dear Hieu,
>
> thanks for your reply. I attach the config file, my moses.ini (I think
> this is the one you want to get), and a few lines of our input file,
> already preprocessed. If you want the RAW lines I can also send them to you.
>
> I don't know if this is a related issue, but I tried the same
> strategy using the forced translations (<np
> translation="German">Deutsch</np>), and this morning I observed the
> same behaviour: some tags suddenly appear in the translation.
>
> Thank you very much for your support!
>
> Carla
>
>
> On 19.05.2015 09:13, Hieu Hoang wrote:
>
>> what is the exact command you used to decode? Can you please provide
>> the moses.ini file and a few lines of your input data for us to look
>> at.
>>
>> Hieu Hoang
>> Researcher
>>
>> New York University, Abu Dhabi
>>
>> http://www.hoang.co.uk/hieu [3]
>>
>>
>> On 18 May 2015 at 15:35, Carla Parra <carla.parra@hermestrans.com>
>> wrote:
>>
>>> Dear all,
>>>
>>> we just finished some experiments using placeables, and we have observed
>>> several issues that may be worth sharing. I don't know if someone has
>>> experienced the same, or if you were already aware of this, but just in
>>> case:
>>>
>>> (1) Special characters must be escaped in the "entity" value field.
>>> Otherwise, they cause XML parsing errors at tuning (not at training,
>>> though!), and wrong values are retrieved from the tags (e.g. we had text
>>> with additional quotation marks, and as a result the translation
>>> stopped at the first quotation mark, not yielding the complete "entity"
>>> value we had encoded).
>>>
>>> (2) <ne> tags are added to sentences as if they had been treated as
>>> tokens during training (i.e. not ignored, even though they just contain
>>> the placeables). As an example, the English sentence "Allow simple
>>> password" is translated as "Permitir simple contraseña <ne
>>> translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>>>
>>> While the first issue is our fault, we do not know what causes the
>>> second one. We have followed the instructions at the MOSES advanced
>>> features site and thus specified "extract-settings = "--Placeholder
>>> @tag@"" in training and "-placeholder-factor 1 -xml-input exclusive" in
>>> the decoder and evaluation. Has anyone experienced the same thing, and/or
>>> does anyone know how to solve this issue?
>>>
>>> Thank you very much. Best regards,
>>>
>>> Carla
>>>
>>> --
>>> Carla Parra Escartín
>>> Marie Curie Experienced Researcher - EXPERT ITN
>>> http://expert-itn.eu/ [1]
>>> Hermes Traducciones
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support [2]
>>>
>>
>>
>>
>> Links:
>> ------
>> [1] http://expert-itn.eu/
>> [2] http://mailman.mit.edu/mailman/listinfo/moses-support
>> [3] http://www.hoang.co.uk/hieu
>>
>
> --
> Carla Parra Escartín
> Marie Curie Experienced Researcher - EXPERT ITN
> http://expert-itn.eu/
> Hermes Traducciones
>

------------------------------

Message: 2
Date: Tue, 19 May 2015 12:39:27 +0200
From: Carla Parra <carla.parra@hermestrans.com>
Subject: Re: [Moses-support] Unexpected behaviour of placeables
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: Moses Support <moses-support@mit.edu>
Message-ID: <a9ad63d374bd7f383c977eda2c4236f4@hermestrans.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Dear Hieu,

thanks for looking into this! As far as I can see, your to-translate.txt
seems similar to mine (i.e. our tags look the same).

The moses.ini files however are a bit different. Ours was generated by
EMS. While we differ in the feature functions section, the xml-input and
the placeholder-factor are identical. I have an additional weight
section and in my mapping-steps I have "0 T 0", while you only have "T
0". Could any of this be the cause?

What I have observed is that the tags were correctly used where they
should be used: the right translations were retrieved and the markup was
removed. However, in some sentences a tag suddenly appears in the output,
as I illustrated yesterday in my example:

"Allow simple password" is translated as "Permitir simple contraseña
<ne translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."

The fact that in such cases the tags have not been removed by the script
doing so makes me think that they are somehow learnt in the training
process as individual tokens. I have checked the phrase table, and I
found things like:

"with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> |||
0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"

Sometimes the tag is not complete (as if it had been tokenized, which in
principle was prevented by using the protected tokenization and a list
of patterns):

"with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"

I am not an expert, so my guess might be totally wrong, but this makes
me think that somehow MOSES also used the text within the tags in
training. In a previous email I explained that I had encountered
problems when using Chris Dyer's FastAlign because it converted all
special characters to their corresponding codes, so I commented out that
loop. Now I wonder whether this might be the cause of MOSES using the
tags in training? How should I call the word aligner so that it ignores
the tags?
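
The phrase-table check described above can be automated. The following is only a rough sketch, assuming the usual "source ||| target ||| scores ||| ..." layout seen in the entries quoted above; the function name leaked_markup and the marker strings are made up for illustration and are not part of Moses.

```python
def leaked_markup(phrase_table_lines,
                  markers=("<ne", "</ne>", 'translation="', 'entity="')):
    """Yield (line_number, source, target) for entries containing XML markup.

    Assumes fields are separated by "|||", as in the entries quoted above.
    The marker strings are illustrative guesses; adapt them to your tags.
    """
    for number, line in enumerate(phrase_table_lines, start=1):
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue  # not a phrase-table entry
        source, target = fields[0], fields[1]
        if any(marker in source or marker in target for marker in markers):
            yield number, source, target
```

If the phrase table is gzip-compressed, the lines can be read with gzip.open(path, "rt", encoding="utf-8") and passed straight to the function. Any entry it reports suggests that markup was present in the training corpus.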


Best,
Carla


--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones


------------------------------

Message: 3
Date: Tue, 19 May 2015 14:45:43 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Unexpected behaviour of placeables
To: carla.parra@hermestrans.com
Cc: Moses Support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjVONCJaxibPZJn_WC3GKmHPdhhu_eCYknd=Um8ACJ9Pw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 19 May 2015 at 14:39, Carla Parra <carla.parra@hermestrans.com> wrote:

> Dear Hieu,
>
> thanks for looking into this! As far as I can see, your to-translate.txt
> seems similar to mine (i.e. our tags look the same).
>
> The moses.ini files however are a bit different. Ours was generated by
> EMS. While we differ in the feature functions section, the xml-input and
> the placeholder-factor are identical. I have an additional weight section
> and in my mapping-steps I have "0 T 0", while you only have "T 0". Could
> any of this be the cause?
>
> What I have observed is that the tags were correctly used where they
> should be used: the right translations were retrieved and the markup was
> removed. However, in some sentences a tag suddenly appears in the output,
> as I illustrated yesterday in my example:
>
> "Allow simple password" is translated as "Permitir simple contraseña <ne
> translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>
> The fact that in such cases the tags have not been removed by the script
> doing so makes me think that they are somehow learnt in the training
> process as individual tokens. I have checked the phrase table, and I found
> things like:
>
> "with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> |||
> 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"
>
Ah, I see. In the training data, you should only insert the @tag@
placeholder, not the XML markup around it.

To see how this is done, have a look at the example script
scripts/generic/ph_numbers.perl
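
The difference between the two kinds of data can be sketched roughly as follows. This is not taken from ph_numbers.perl; it is a minimal illustration that assumes the placeables are inline tags such as <1> or </1>. The PLACEABLE pattern and both function names are hypothetical. Training lines get the bare @tag@ token, while lines sent to the decoder get the <ne> markup with an escaped "entity" value.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern for inline tags such as <1>, </1> or <2/>; adapt to your data.
PLACEABLE = re.compile(r"</?\d+/?>")
PLACEHOLDER = "@tag@"


def for_training(line):
    """Training corpus: replace each placeable with the bare placeholder token."""
    return PLACEABLE.sub(PLACEHOLDER, line)


def for_decoding(line):
    """Decoder input: wrap the placeholder in <ne> markup that carries the original."""
    def wrap(match):
        # Escape &, <, > and double quotes so the attribute value stays valid XML.
        entity = escape(match.group(0), {'"': "&quot;"})
        return '<ne translation="{0}" entity="{1}">{0}</ne>'.format(PLACEHOLDER, entity)
    return PLACEABLE.sub(wrap, line)
```

With this split, the phrase table should only ever contain @tag@, and the XML markup is seen only by the decoder (run with -placeholder-factor 1 -xml-input exclusive, as in the settings quoted earlier in this thread).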


> Sometimes the tag is not complete (as if it had been tokenized, which in
> principle was prevented by using the protected tokenization and a list of
> patterns):
>
> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
> 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>
> I am not an expert, so my guess might be totally wrong, but this makes me
> think that somehow MOSES also used the text within the tags in training. In
> a previous email I explained that I had encountered problems when using
> Chris Dyer's FastAlign because it converted all special characters to their
> corresponding codes, so I commented out that loop. Now I wonder whether
> this might be the cause of MOSES using the tags in training? How should I
> call the word aligner so that it ignores the tags?
>
>
> Best,
> Carla

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 103, Issue 44
**********************************************
