Moses-support Digest, Vol 92, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses-support Digest, Vol 91, Issue 52 (Hieu Hoang)
2. Re: errors in training (Hieu Hoang)
3. Re: Moses-support Digest, Vol 91, Issue 52 (Philipp Koehn)


----------------------------------------------------------------------

Message: 1
Date: Sat, 31 May 2014 19:04:40 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgVgPm_O=KjkZoL9ahknis_OScv3mZ3495Kdb2kuM-ysg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

thanks everybody.

I took Marcin's suggestion and wrote a wrapper script. It seems to be doing
ok. It's gotten past the step it previously failed on, and BLEU scores
haven't been affected.

i've added it to moses if anyone wants it

https://github.com/moses-smt/mosesdecoder/commit/57235268323f97c53a9f214e3bec6e722437230f
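
For readers comparing the two Perl snippets quoted in this thread, here is a
minimal Python sketch of the difference. It is an illustration only, not the
wrapper script from the commit above, and the function names are made up for
this example. The first function works on decoded text and replaces characters
in the Unicode "Other" categories (what \p{C} matches) with a space, so
accented letters survive. The second works on raw bytes in the style of
tr/\040-\176/ /c, which is why UTF-8 letters such as á get blanked out.

```python
import unicodedata


def replace_control_chars(line: str) -> str:
    # Set the trailing newline aside first (the Perl one-liner chomps for the
    # same reason): "\n" is itself a control character and would otherwise
    # be replaced too.
    body = line.rstrip("\n")
    # Categories starting with "C" are Cc, Cf, Cs, Co and Cn, i.e. what
    # \p{C} matches in a Perl regex.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in body
    )
    return cleaned + "\n"


def replace_non_ascii_bytes(raw: bytes) -> bytes:
    # Byte-level analogue of tr/\040-\176/ /c: anything outside printable
    # ASCII (0x20-0x7E) becomes a space, including every byte of a
    # multi-byte UTF-8 character.
    return bytes(b if 0x20 <= b <= 0x7E else 0x20 for b in raw)
```

Given the word "gonzález", the first function leaves it unchanged, while the
second returns b"gonz  lez" (two spaces), because á is encoded as two bytes in
UTF-8 and both fall outside the printable ASCII range.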


On 30 May 2014 18:07, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:

> How's this?
>
> cat baa | perl -C -pe 'chomp; s/\p{C}/ /g; $_="$_\n"'
>
>
> On 30.05.2014 18:01, Hieu Hoang wrote:
>
> in the attached file, there are 2 or more non-printing chars on the 1st
> line, between the words 'place' and 'binding'. They should be
> removed/replaced with a space. Those chars are deleted by parsers, making
> the word alignments incorrect and crashing extract
>
> The 2nd line is perfectly good utf8. It shouldn't be touched.
>
> just another friday nlp malaise
>
>
>
> On 30 May 2014 17:51, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>
>> it is trivial to change it to say a ? mark.
>>
>> but I'm not sure what you want as output now. the original request
>> was for removing non-printable characters, which the Perl does,
>>
>> Miles
>>
>> On 30 May 2014 12:43, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>> > forgot to say. The input is utf8. The snippet turns
>> > gonzález
>> > to
>> > gonz lez
>> >
>> >
>> > On 30 May 2014 17:22, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>> >>
>> >> this perl snippet:
>> >>
>> >> $line =~ tr/\040-\176/ /c;
>> >>
>> >> On 30 May 2014 12:17, <moses-support-request@mit.edu> wrote:
>> >> >
>> >> > Today's Topics:
>> >> >
>> >> > 1. removing non-printing character (Hieu Hoang)
>> >> >
>> >> >
>> >> >
>> >> > ----------------------------------------------------------------------
>> >> >
>> >> > Message: 1
>> >> > Date: Fri, 30 May 2014 16:24:30 +0100
>> >> > From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
>> >> > Subject: [Moses-support] removing non-printing character
>> >> > To: moses-support <moses-support@mit.edu>
>> >> > Message-ID:
>> >> >
>> >> > <CAEKMkbj4tEDZYVGeAStmg51+w-5SYE5YGRmibcYPC2j8YbKGfg@mail.gmail.com>
>> >> > Content-Type: text/plain; charset="utf-8"
>> >> >
>> >> > does anyone have a script/program that can remove all non-printing
>> >> > characters?
>> >> >
>> >> > I don't care if it's fast or slow, as long as it ABSOLUTELY removes all
>> >> > non-printing chars
>> >> >
>> >> > --
>> >> > Hieu Hoang
>> >> > Research Associate
>> >> > University of Edinburgh
>> >> > http://www.hoang.co.uk/hieu
>> >> >
>> >> > End of Moses-support Digest, Vol 91, Issue 52
>> >> > *********************************************
>> >>
>> >>
>> >>
>> >> --
>> >> The University of Edinburgh is a charitable body, registered in
>> >> Scotland, with registration number SC005336.


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Sat, 31 May 2014 19:06:08 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] errors in training
To: charmaine ponay <csponay@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgaus0K_gQ=FB6vU=ODCq7Zyn=feS+P+BX_Nrp=M7qLdg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

there are many possible errors. Look through the mailing list archives to
see what other people have reported
https://www.mail-archive.com/moses-support@mit.edu/


On 31 May 2014 08:04, charmaine ponay <csponay@gmail.com> wrote:

> hi, what could be possible errors when there is an error in the training?
>
> thanks
>
> Regards,
>
> *Charmaine Salvador - Ponay*
> Instructor
> Information and Computer Studies Dept.
> University of Santo Tomas
>


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Sun, 1 Jun 2014 07:02:12 +0200
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDCUv2BhHVH4TnLFrr6yyPj68j_pHcYMFmOr1YybBdbnsQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

should that be part of the tokenizer and/or the
escape-special-characters script?

-phi




------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 92, Issue 1
********************************************
