Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Unexpected behaviour of placeables (Carla Parra)
2. Re: Stripping carriage returns in FilePiece? (Kenneth Heafield)
----------------------------------------------------------------------
Message: 1
Date: Tue, 19 May 2015 13:02:10 +0200
From: Carla Parra <carla.parra@hermestrans.com>
Subject: Re: [Moses-support] Unexpected behaviour of placeables
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: Moses Support <moses-support@mit.edu>
Message-ID: <c72b46e32f47218b7ca6e3e004b017b1@hermestrans.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
I see! Should have thought of that... I will modify my script and change
it accordingly. Hopefully next round it will work better. Thanks!
Carla
On 19.05.2015 12:45, Hieu Hoang wrote:
> Hieu Hoang
> Researcher
>
> New York University, Abu Dhabi
>
> http://www.hoang.co.uk/hieu [2]
>
> On 19 May 2015 at 14:39, Carla Parra <carla.parra@hermestrans.com>
> wrote:
>
>> Dear Hieu,
>>
>> thanks for looking into this! As far as I can see, your
>> to-translate.txt seems similar to mine (i.e. our tags look the
>> same).
>>
>> The moses.ini files however are a bit different. Ours was generated
>> by EMS. While we differ in the feature functions section, the
>> xml-input and the placeholder-factor are identical. I have an
>> additional weight section and in my mapping-steps I have "0 T 0",
>> while you only have "T 0". Could any of this be the cause?
>>
>> What I have observed is that the tags were correctly used where
>> they should be used, thus retrieving the right translations, and
>> the markup was removed. However, in some sentences a tag suddenly
>> appears, as I illustrated yesterday in my example:
>>
>> "Allow simple password", is translated as "Permitir simple
>> contraseña <ne translation="@tag@" entity="</1>">@tag@</ne>
>> ."
>>
>> The fact that in such cases the tags have not been removed by the
>> script doing so makes me think that they are somehow learnt in the
>> training process as individual tokens. I have checked the phrase
>> table, and I found things like:
>>
>> "with ||| con <ne translation="@tag@" entity="<2>">@tag@</ne>
>> ||| 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1
>> ||| |||"
>
> Ah, I see. In the training data, you should only add the @tag@, not
> the xml part.
>
> You should look at how the example script works:
>    scripts/generic/ph_numbers.perl
>
>> Sometimes the tag is not complete (as if it had been tokenized,
>> which in principle was prevented by using the protected tokenization
>> and a list of patterns):
>>
>> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
>> 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>>
>> I am not an expert, so my guess might be totally wrong, but this
>> makes me think that somehow MOSES also used the text within the tags
>> in training. In a previous email I explained that I had encountered
>> problems when using Chris Dyer's FastAlign because it converted all
>> special characters to their corresponding codes, so I commented out
>> that loop. Now I wonder whether this might be the cause of MOSES
>> using the tags in training? How should I call the word aligner so
>> that it ignores the tags?
>>
>> Best,
>> Carla
>>
>> On 19.05.2015 11:58, Hieu Hoang wrote:
>> It looks ok to me, not sure what could be wrong.
>>
>> I've added a daily test to ensure that the placeholder will work in
>> future. Perhaps you can have a look at the moses.ini and
>> to-translate.txt files to see if there are any differences with
>> yours?
>>
>> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder [1]
>>
>>
>> Hieu Hoang
>> Researcher
>>
>> New York University, Abu Dhabi
>>
>> http://www.hoang.co.uk/hieu [2]
>>
>> On 19 May 2015 at 11:53, Carla Parra <carla.parra@hermestrans.com>
>> wrote:
>>
>> Dear Hieu,
>>
>> thanks for your reply. I attach the config file, my moses.ini (I
>> think this is the one you want to get), and a few lines of our
>> input file, already preprocessed. If you want the RAW lines I can
>> also send them to you.
>>
>> I don't know if this will be a similar issue, but I tried the same
>> strategy using the forced translations (<np
>> translation="German">Deutsch</np>), and this morning I have
>> observed the same: some tags are suddenly appearing in the
>> translation.
>>
>> Thank you very much for your support!
>>
>> Carla
>>
>> On 19.05.2015 09:13, Hieu Hoang wrote:
>> What is the exact command you used to decode? Can you please
>> provide the moses.ini file and a few lines of your input data for
>> us to look at?
>>
>> Hieu Hoang
>> Researcher
>>
>> New York University, Abu Dhabi
>>
>> http://www.hoang.co.uk/hieu [2]
>>
>> On 18 May 2015 at 15:35, Carla Parra <carla.parra@hermestrans.com>
>> wrote:
>>
>> Dear all,
>>
>> we just finished some experiments using placeables, and we have
>> observed several issues that may be worth sharing. I don't know if
>> someone has experienced the same, or you were already aware of
>> this, but just in case:
>>
>> (1) Special characters must be escaped in the "entity" value field.
>> Otherwise, they cause XML parsing errors at tuning (not at training,
>> though!), and wrong values are retrieved from the tags (e.g. we had
>> text with additional quotation marks, which caused the translation
>> to stop at the first quotation mark, so the complete "entity" value
>> we had encoded was not returned).
>>
>> (2) <ne> tags are added to sentences as if they had been treated as
>> tokens during training (i.e. not ignored, even though they just
>> contain the placeables). As an example, the English sentence "Allow
>> simple password" is translated as "Permitir simple contraseña
>> <ne translation="@tag@" entity="</1>">@tag@</ne> ."
>>
>> While the first issue is our fault, we do not know what causes the
>> second one. We have followed the instructions at the MOSES advanced
>> features site and thus specified "extract-settings = "--Placeholder
>> @tag@"" in training and "-placeholder-factor 1 -xml-input
>> exclusive" in the decoder and evaluation. Has anyone experienced the
>> same thing and/or know how to solve this issue?
>>
>> Thank you very much. Best regards,
>>
>> Carla
>>
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/ [3]
>> Hermes Traducciones
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support [4]
>
> --
> Carla Parra Escartín
> Marie Curie Experienced Researcher - EXPERT ITN
> http://expert-itn.eu/ [3]
> Hermes Traducciones
>
>
>
> Links:
> ------
> [1]
> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
> [2] http://www.hoang.co.uk/hieu
> [3] http://expert-itn.eu/
> [4] http://mailman.mit.edu/mailman/listinfo/moses-support
--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
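The two points from this thread (only the bare @tag@ token belongs in the training data, and special characters in the "entity" value must be escaped in the decoder input) can be sketched roughly as follows. This is a minimal illustration, not the script used in the thread: the TAG pattern and the for_training / for_decoding helpers are hypothetical names, the inline markup is assumed to look like <1>, </1> or <2>, and standard XML entities are assumed for escaping. Check the result against scripts/generic/ph_numbers.perl and your own Moses escaping conventions.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern for inline markup such as <1>, </1> or <2>;
# adjust it to whatever your own pre-processing treats as a placeable.
TAG = re.compile(r"</?\d+>")


def for_training(line):
    """Training side: replace each placeable with the bare @tag@ token."""
    return TAG.sub("@tag@", line)


def for_decoding(line):
    """Decoder side: wrap each placeable in <ne>, escaping the entity value."""
    def wrap(match):
        # escape() handles &, < and >; the extra mapping also covers
        # double quotes, which would otherwise end the attribute early.
        entity = escape(match.group(0), {'"': "&quot;"})
        return '<ne translation="@tag@" entity="{}">@tag@</ne>'.format(entity)
    return TAG.sub(wrap, line)


source = "Allow <1> simple password </1> ."
print(for_training(source))
print(for_decoding(source))
```

With this split, the phrase table should only ever see @tag@ as a token, while the XML wrapper exists solely in the text passed to the decoder.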
------------------------------
Message: 2
Date: Tue, 19 May 2015 11:09:44 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Stripping carriage returns in FilePiece?
To: moses-support@mit.edu
Message-ID: <555B5238.80402@kheafield.com>
Content-Type: text/plain; charset=windows-1252
On 05/19/15 00:45, Jeroen Vermeulen wrote:
> On 19/05/15 09:22, Kenneth Heafield wrote:
>> There are non-traditional uses like ReadLine('\0') to read
>> null-delimited tokens.
>
> While exploring this change I didn't find a single use of that parameter
> in the Moses source tree!
It's in some new code in KenLM master that I wouldn't feel bad about
changing.
>
> But there may be uses outside the project of course. That's one of the
> dangers of duplicating code. Is there an overview somewhere of what
> code in Moses was copied in but is actually maintained elsewhere?
We tried git submodules and it was far worse :-\
>
>> But I'd support Jeroen here: the default ReadLine() with no argument
>> should swallow \r.
>
> To be clear though: with my change, FilePiece remains an exact binary
> representation of the file. It's just that ReadLine() returns a
> slightly shorter piece of it, just like it already swallows \n.
>
> (Side note: I've had a quick look at StringPiece now and it looks like a
> really useful abstraction for performance. And apparently that same
> abstraction is going to be in the C++17 standard library as string_view.)
Welcome! Or string_ref as the case may be. I will happily upgrade to
the standard version when the time comes.
This is also why I consider most instances of vector<string> to be a
performance bug, including util/tokenize.hh . Though I understand you
were just moving slow code around.
>
>> In any case if you're going to change code there, can you do it upstream
>> in github.com/kpu/kenlm ? I just gave you commit access.
>
> Will do, thanks. What's the procedure for "downstreaming" that into Moses?
Saw your commit, thanks. It gets copied on occasion and I include the
kenlm git commit it came from. If you want to copy it now that's fine.
A bit more formally one could squash and merge.
>
>
>> Also, how would you feel if I changed it to be FakeIFStream with
>> operator>> extraction, at least for integer/float types?
>
> Sorry, I haven't looked into FakeIFStream at all yet, and I may not
> fully understand the question.
It doesn't exist yet. I am contemplating refactoring FilePiece to have
operator>> and renaming it to FakeIFStream.
>
>
> Jeroen
>
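The ReadLine() behaviour discussed in this message can be illustrated with a small toy model. This is a sketch in Python, not the KenLM FilePiece code: the LineReader class and its read_line method are hypothetical, and it assumes that only the default call (no delimiter argument) swallows a trailing \r, while an explicit delimiter such as '\0' returns the piece untouched. The underlying buffer is never modified, matching the point that the file stays an exact binary representation.

```python
class LineReader:
    """Toy stand-in for a FilePiece-like reader over an in-memory buffer."""

    def __init__(self, data):
        self._data = data  # the underlying bytes are never modified
        self._pos = 0

    def read_line(self, delim=None):
        """Return the next piece up to the delimiter, excluding the delimiter."""
        if self._pos >= len(self._data):
            raise EOFError("no more data")
        swallow_cr = delim is None  # only the default mode drops a trailing \r
        if delim is None:
            delim = b"\n"
        end = self._data.find(delim, self._pos)
        if end == -1:  # last piece has no trailing delimiter
            end = len(self._data)
        piece = self._data[self._pos:end]
        self._pos = end + len(delim)
        if swallow_cr and piece.endswith(b"\r"):
            piece = piece[:-1]
        return piece


reader = LineReader(b"first\r\nsecond\nthird")
print([reader.read_line() for _ in range(3)])

# With an explicit delimiter, carriage returns are left alone.
tokens = LineReader(b"a\r\0b\0")
print(tokens.read_line(b"\0"), tokens.read_line(b"\0"))
```

Keying the \r handling to the default call keeps non-traditional uses, such as null-delimited tokens, byte-exact.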
------------------------------
End of Moses-support Digest, Vol 103, Issue 45
**********************************************