Moses-support Digest, Vol 92, Issue 3

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses-support Digest, Vol 91, Issue 52 (Hieu Hoang)
2. adding arbitrary feature functions on the command line (Hieu Hoang)
3. Re: Moses-support Digest, Vol 91, Issue 52 (Philipp Koehn)


----------------------------------------------------------------------

Message: 1
Date: Sun, 01 Jun 2014 20:00:26 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>, Hieu Hoang
<Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <538B784A.6050105@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

tokenizer.perl is getting too large and unwieldy.

It duplicates escape-special-chars, and I don't want it to duplicate this
standalone functionality as well.


On 01/06/14 06:02, Philipp Koehn wrote:
> Hi,
>
> should that be part of the tokenizer and/or the
> escape-special-characters script?
>
> -phi
>
> On Sat, May 31, 2014 at 8:04 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>> thanks everybody.
>>
>> I took Marcin's suggestion and wrote a wrapper script. It seems to be doing
>> ok. It's gotten past the previous step that it failed on, and BLEU scores
>> haven't been affected.
>>
>> i've added it to moses if anyone wants it
>>
>> https://github.com/moses-smt/mosesdecoder/commit/57235268323f97c53a9f214e3bec6e722437230f
>>
>>
>> On 30 May 2014 18:07, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>>> How's this?
>>>
>>> cat baa | perl -C -pe 'chomp; s/\p{C}/ /g; $_="$_\n"'
>>>
>>>
>>> W dniu 30.05.2014 18:01, Hieu Hoang pisze:
>>>
>>> in the attached file, there are 2 or more non-printing chars on the 1st
>>> line, between the words 'place' and 'binding'. They should be
>>> removed/replaced with a space. Those chars are deleted by parsers, making
>>> the word alignments incorrect and crashing extract
>>>
>>> The 2nd line is perfectly good utf8. It shouldn't be touched.
>>>
>>> just another friday nlp malaise
>>>
>>>
>>>
>>> On 30 May 2014 17:51, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>>>> it is trivial to change it to, say, a ? mark.
>>>>
>>>> but I'm not sure what you want as output now. the original request
>>>> was for removing non-printable characters, which the Perl does.
>>>>
>>>> Miles
>>>>
>>>> On 30 May 2014 12:43, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>>> forgot to say. The input is utf8. The snippet turns
>>>>> gonzález
>>>>> to
>>>>> gonz lez
>>>>>
>>>>>
>>>>> On 30 May 2014 17:22, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>>>>>> this perl snippet:
>>>>>>
>>>>>> $line =~ tr/\040-\176/ /c;
>>>>>>
>>>>>> On 30 May 2014 12:17, <moses-support-request@mit.edu> wrote:
>>>>>>> Today's Topics:
>>>>>>>
>>>>>>> 1. removing non-printing character (Hieu Hoang)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------
>>>>>>>
>>>>>>> Message: 1
>>>>>>> Date: Fri, 30 May 2014 16:24:30 +0100
>>>>>>> From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
>>>>>>> Subject: [Moses-support] removing non-printing character
>>>>>>> To: moses-support <moses-support@mit.edu>
>>>>>>> Message-ID:
>>>>>>>
>>>>>>> <CAEKMkbj4tEDZYVGeAStmg51+w-5SYE5YGRmibcYPC2j8YbKGfg@mail.gmail.com>
>>>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>>>
>>>>>>> does anyone have a script/program that can remove all non-printing
>>>>>>> characters?
>>>>>>>
>>>>>>> I don't care if it's fast or slow, as long as it ABSOLUTELY removes
>>>>>>> all non-printing chars
>>>>>>>
>>>>>>> --
>>>>>>> Hieu Hoang
>>>>>>> Research Associate
>>>>>>> University of Edinburgh
>>>>>>> http://www.hoang.co.uk/hieu
>>>>>>> -------------- next part --------------
>>>>>>> An HTML attachment was scrubbed...
>>>>>>> URL:
>>>>>>>
>>>>>>> http://mailman.mit.edu/mailman/private/moses-support/attachments/20140530/daee61ea/attachment-0001.htm
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> Moses-support@mit.edu
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>>
>>>>>>> End of Moses-support Digest, Vol 91, Issue 52
>>>>>>> *********************************************
>>>>>>
>>>>>>
>>>>>> --
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>
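
For comparison, the two Perl snippets quoted above can be sketched in Python. This is only an illustration, not the wrapper script from the commit: the function names and sample strings are made up here, and Unicode general categories starting with "C" are assumed to correspond to Perl's \p{C}. The first function works on raw bytes, which is why it damages UTF-8 input; the second works on decoded characters and leaves accented letters alone.

```python
import unicodedata


def replace_non_ascii_printable(line):
    # Byte-level analogue of tr/\040-\176/ /c:
    # every byte outside 0x20..0x7E becomes a space.
    raw = line.encode("utf-8")
    cleaned = bytes(b if 0x20 <= b <= 0x7E else 0x20 for b in raw)
    return cleaned.decode("ascii")


def replace_control_chars(line):
    # Character-level analogue of s/\p{C}/ /g:
    # only characters in a "C" (Other) general category become a space.
    return "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in line
    )


# Hypothetical sample: a control character (U+0001) and a zero-width
# space (U+200B) between two words, followed by a valid accented word.
sample = "place\u0001\u200bbinding gonz\u00e1lez"

print(repr(replace_non_ascii_printable(sample)))
print(repr(replace_control_chars(sample)))
```

The byte-level version replaces each of the two bytes that encode the accented letter, so the word is split. The category-based version touches only the two non-printing characters. Like the Perl one-liner, which chomps first, the second function should be given lines without their trailing newline, because a newline is itself a control character.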



------------------------------

Message: 2
Date: Sun, 01 Jun 2014 20:09:06 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: [Moses-support] adding arbitrary feature functions on the
command line
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>, moses-support@mit.edu
Message-ID: <538B7A52.2020405@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I've just added some more arguments to the decoder so that you can add
new feature functions (+weights) to the decoder without having to change
the ini file. It will work during tuning too.
./moses -feature-add "Feature function args..." -weight-add "FFName= 0.1"

To use it in the EMS:
[TUNING]
decoder-settings = "-threads 8 -feature-add \"Feature function args...\" -weight-add \"FFName= 0.1\""

[EVALUATION]
decoder-settings = "-threads 8 -feature-add "Feature function args..." -weight-add "FFName= 0.1""

Note the slight difference: the inner quotes are escaped under [TUNING] but not under [EVALUATION].
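
To make the quoting difference concrete, here is a small Python sketch that generates both decoder-settings lines from the same argument list. The helper names (decoder_args, ems_setting) are invented for this illustration and are not part of Moses or the EMS. "Feature function args..." and "FFName= 0.1" are the placeholders from the message above, not a real feature definition.

```python
def decoder_args(feature_spec, weight_spec, threads=8):
    # Build the decoder arguments as a list so that each spec,
    # which contains spaces, stays a single argument.
    return [
        "-threads", str(threads),
        "-feature-add", feature_spec,
        "-weight-add", weight_spec,
    ]


def ems_setting(args, escape_quotes):
    # Wrap arguments containing spaces in double quotes; the quotes
    # are backslash-escaped only when escape_quotes is True.
    quote = '\\"' if escape_quotes else '"'
    parts = [quote + a + quote if " " in a else a for a in args]
    return 'decoder-settings = "' + " ".join(parts) + '"'


args = decoder_args("Feature function args...", "FFName= 0.1")

print("[TUNING]")
print(ems_setting(args, escape_quotes=True))
print()
print("[EVALUATION]")
print(ems_setting(args, escape_quotes=False))
```

Generating both lines from one list keeps the two sections in sync, so only the quoting differs between them.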


------------------------------

Message: 3
Date: Sun, 1 Jun 2014 22:05:55 +0200
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>, Hieu Hoang
<hieu.hoang@ed.ac.uk>
Message-ID:
<CAAFADDDQdanf=Wqqx6pCPTt-yKGkeUXB0v565EBz7kkB7GHuAg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

Fair enough - one can always pipe through both.

-phi
On Jun 1, 2014 3:00 PM, "Hieu Hoang" <hieuhoang@gmail.com> wrote:

> tokenizer.perl is getting too large and unwieldy.
>
> It duplicates escape-special-chars, and I don't want it to duplicate this
> standalone functionality as well.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140601/74215202/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 92, Issue 3
********************************************
