Moses-support Digest, Vol 91, Issue 55

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses-support Digest, Vol 91, Issue 52 (Miles Osborne)
2. errors in training (charmaine ponay)


----------------------------------------------------------------------

Message: 1
Date: Fri, 30 May 2014 13:40:17 -0400
From: Miles Osborne <miles@inf.ed.ac.uk>
Subject: Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
To: Lane Schwartz <dowobeha@gmail.com>
Cc: Hieu Hoang <hieu.hoang@ed.ac.uk>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID:
<CAPRfTYrEp_J33z9tR0LTeybJKL6Uass_erPu6FP8-n4Dy7+fFQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

for those specific characters:

perl -C -pe 's/\x{200B}//g' < tmp/baa

but as Lane mentions, you probably need to somehow specify the set of
naughty characters you need to deal with.

Miles
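Miles's one-liner deletes only U+200B ZERO WIDTH SPACE. As he and Lane point out, the hard part is deciding which characters belong in the set. A minimal Python sketch of the same explicit-set approach follows. The set below is an assumption for illustration, not a definitive list, and the names strip_invisible and main are hypothetical. ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER (U+200C, U+200D) are deliberately left out because they carry meaning in scripts such as Persian and Devanagari.

```python
import sys

# Assumed set of invisible characters to delete (extend as needed).
# U+200C/U+200D are left out on purpose: they matter in e.g. Persian.
INVISIBLE = (
    "\u200b"  # ZERO WIDTH SPACE (the character in the Perl one-liner)
    "\u2060"  # WORD JOINER
    "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as a byte-order mark
)

# str.maketrans with a third argument maps each listed character to None,
# so str.translate deletes them.
DELETE_TABLE = str.maketrans("", "", INVISIBLE)


def strip_invisible(text):
    """Delete every character in INVISIBLE; all other text is left as-is."""
    return text.translate(DELETE_TABLE)


def main(paths):
    # Hypothetical usage: python strip_invisible.py corpus.txt > corpus.clean.txt
    for path in paths:
        # newline="\n": no newline translation, so line endings pass through.
        with open(path, encoding="utf-8", newline="\n") as f:
            for line in f:
                sys.stdout.buffer.write(strip_invisible(line).encode("utf-8"))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Like the Perl version, this deletes the characters rather than replacing them with a space, so "place\u200bbinding" becomes "placebinding".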

On 30 May 2014 13:23, Lane Schwartz <dowobeha@gmail.com> wrote:
> We also used charlint. It might do what you want.
>
> On Fri, May 30, 2014 at 1:21 PM, Lane Schwartz <dowobeha@gmail.com> wrote:
>> As far as I know, no such general purpose tool exists. We wrote a
>> custom in-house script that removes many, but not all, possible
>> non-printing Unicode characters as part of our WMT submission.
>>
>> I am interested in writing one, though.
>>
>> I think the right way to do this would be to parse the Unicode
>> character database for all characters of certain classes, and build
>> the tool from that data.
>>
>> Lane
>>
>>
>> On Fri, May 30, 2014 at 1:01 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>> in the attached file, there are 2 or more non-printing chars on the 1st
>>> line, between the words 'place' and 'binding'. They should be
>>> removed/replaced with a space. Those chars are deleted by parsers, which
>>> makes the word alignments incorrect and crashes extract.
>>>
>>> The 2nd line is perfectly good utf8. It shouldn't be touched.
>>>
>>> just another friday nlp malaise
>>>
>>>
>>>
>>> On 30 May 2014 17:51, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>>>>
>>>> it is trivial to change it to, say, a ? mark.
>>>>
>>>> but I'm not sure what you want as output now. The original request
>>>> was for removing non-printable characters, which the Perl does.
>>>>
>>>> Miles
>>>>
>>>> On 30 May 2014 12:43, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>> > forgot to say. The input is utf8. The snippet turns
>>>> > gonzález
>>>> > to
>>>> > gonz lez
>>>> >
>>>> >
>>>> > On 30 May 2014 17:22, Miles Osborne <miles@inf.ed.ac.uk> wrote:
>>>> >>
>>>> >> this perl snippet:
>>>> >>
>>>> >> $line =~ tr/\040-\176/ /c;
>>>> >>
>>>> >> On 30 May 2014 12:17, <moses-support-request@mit.edu> wrote:
>>>> >> >
>>>> >> >
>>>> >> > Today's Topics:
>>>> >> >
>>>> >> > 1. removing non-printing character (Hieu Hoang)
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > ----------------------------------------------------------------------
>>>> >> >
>>>> >> > Message: 1
>>>> >> > Date: Fri, 30 May 2014 16:24:30 +0100
>>>> >> > From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
>>>> >> > Subject: [Moses-support] removing non-printing character
>>>> >> > To: moses-support <moses-support@mit.edu>
>>>> >> > Message-ID:
>>>> >> > <CAEKMkbj4tEDZYVGeAStmg51+w-5SYE5YGRmibcYPC2j8YbKGfg@mail.gmail.com>
>>>> >> > Content-Type: text/plain; charset="utf-8"
>>>> >> >
>>>> >> > does anyone have a script/program that can remove all non-printing
>>>> >> > characters?
>>>> >> >
>>>> >> > I don't care if it's fast or slow, as long as it ABSOLUTELY removes
>>>> >> > all non-printing chars.
>>>> >> >
>>>> >> > --
>>>> >> > Hieu Hoang
>>>> >> > Research Associate
>>>> >> > University of Edinburgh
>>>> >> > http://www.hoang.co.uk/hieu
>>>> >> > -------------- next part --------------
>>>> >> > An HTML attachment was scrubbed...
>>>> >> > URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140530/daee61ea/attachment-0001.htm
>>>> >> >
>>>> >> > ------------------------------
>>>> >> >
>>>> >> > _______________________________________________
>>>> >> > Moses-support mailing list
>>>> >> > Moses-support@mit.edu
>>>> >> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>>> >> >
>>>> >> >
>>>> >> > End of Moses-support Digest, Vol 91, Issue 52
>>>> >> > *********************************************
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> The University of Edinburgh is a charitable body, registered in
>>>> >> Scotland, with registration number SC005336.
>>>> >
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away. It is time to go elsewhere. The best thing about space travel
>> is that it made it possible to go elsewhere.
>> -- R.A. Heinlein, "Time Enough For Love"
>





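Lane's suggestion of deriving the character set from the Unicode character database can be approximated with Python's standard unicodedata module, which exposes each character's general category. The sketch below is one possible reading of that idea, not the in-house script Lane mentions. Which categories count as "non-printing" is an assumption (here Cc, Cf, Cs, Co, Cn, Zl and Zp), and the names clean_line and main are hypothetical. Following Hieu's requirement, matched characters are replaced with a space rather than deleted, and printable non-ASCII text such as "gonzález" passes through unchanged, unlike with the ASCII-only tr snippet. Two caveats: Cf also contains U+200C/U+200D, so this version would damage Persian or Indic text, and what counts as Cn (unassigned) depends on the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata

# Unicode general categories treated as "non-printing" (an assumption):
#   Cc control, Cf format (includes U+200B, U+2060, U+FEFF),
#   Cs surrogate, Co private use, Cn unassigned,
#   Zl line separator, Zp paragraph separator.
NON_PRINTING = frozenset({"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"})


def clean_line(line):
    """Replace non-printing characters with spaces, then collapse whitespace."""
    replaced = "".join(
        " " if unicodedata.category(ch) in NON_PRINTING else ch for ch in line
    )
    # str.split() with no argument splits on any run of whitespace (including
    # U+00A0 NO-BREAK SPACE) and drops leading/trailing whitespace, so the
    # trailing newline (category Cc, already turned into a space) goes too.
    return " ".join(replaced.split())


def main(paths):
    # Hypothetical usage: python clean_unicode.py corpus.en > corpus.clean.en
    for path in paths:
        # newline="\n": split only on LF, so a stray CR cannot split one
        # sentence into two lines and misalign a parallel corpus.
        with open(path, encoding="utf-8", newline="\n") as f:
            for line in f:
                out = clean_line(line) + "\n"
                sys.stdout.buffer.write(out.encode("utf-8"))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Each input line yields exactly one output line, so source and target files stay line-aligned for training. Collapsing whitespace also trims leading and trailing spaces and squeezes runs of spaces into one; that is usually harmless for tokenized training data, but it does alter lines that were already valid, so drop the final join if those must stay byte-identical.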

------------------------------

Message: 2
Date: Sat, 31 May 2014 00:04:41 -0700
From: charmaine ponay <csponay@gmail.com>
Subject: [Moses-support] errors in training
To: moses-support@mit.edu
Message-ID:
<CAB0nykdRbrkPPmv=JAijFwA2ij7HAAAs5s0S1GUQme+Bb=6bAg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

hi, what are the possible causes when the training fails with an error?

thanks

Regards,

*Charmaine Salvador - Ponay*
Instructor
Information and Computer Studies Dept.
University of Santo Tomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140531/fcec4984/attachment-0001.htm

------------------------------



End of Moses-support Digest, Vol 91, Issue 55
*********************************************
