Send Moses-support mailing list submissions to
	moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
	moses-support-request@mit.edu
You can reach the person managing the list at
	moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
   1. Translating words with apostrophies (Shani Shalgi)
   2. Re: Translating words with apostrophies (Vincent Nguyen)
----------------------------------------------------------------------
Message: 1
Date: Sun, 3 Apr 2016 12:42:39 +0300
From: Shani Shalgi <shanishalgi@gmail.com>
Subject: [Moses-support] Translating words with apostrophies
To: Moses-support@mit.edu
Message-ID:
	<CAJ5=UKTjdLE8b_5Avv__XvqM-KeE2kowLEyyViUgBcVzJ100PQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
  I'm new to Moses, have several questions, but I'll start with asking what
happens to apostrophes.
  What I see is that the tokenizer transforms words with apostrophes like
this :
it's --> it ' s
c'est une belle journée --> c ' est une belle journée
How does this work in translating the meaning of the word it's (English) /
c'est (French)?
I'm using the baseline model which I trained according to the tutorial.
I assume I need to send text tokenized in the same way to be translated
(it's --> it ' s; Otherwise any word that has an apostrophe is not
translated.)
I assumed it ' s (or c ' est) would be considered a phrase; however, I
notice that only the word est is translated (and this is just one example:
in l'équipe only the word equipe is translated, and so forth...)
Am I doing something wrong or misunderstanding something? Or do I need to
change the tokenizer to accept ' in the middle of words?
Thanks in advance,
Shani
------------------------------
Message: 2
Date: Sun, 3 Apr 2016 17:06:43 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] Translating words with apostrophies
To: moses-support@mit.edu
Message-ID: <57013183.5090908@neuf.fr>
Content-Type: text/plain; charset="windows-1252"
Apostrophes are tricky to handle properly.
The tokenizer is language-sensitive (see the -l option).
in French :
l'été => l' été [with a space between ' and é]
in English :
today's story => today 's story
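
The two conventions above can be sketched in a few lines of Python. This is
only a toy illustration, not the Moses tokenizer itself: the function name
split_apostrophes and its regular expressions are assumptions made for this
example. The fallback branch reproduces the "it ' s" output reported in the
question; whether a missing or different -l value is what caused that output
is also an assumption.

```python
import re

def split_apostrophes(text: str, lang: str) -> str:
    """Toy illustration of language-dependent apostrophe splitting.

    Hypothetical helper: it only mimics the examples quoted in this thread
    and is not the logic of the real tokenizer.
    """
    if lang == "fr":
        # French: the apostrophe stays attached to the elided word on the left.
        return re.sub(r"(\w)'(\w)", r"\1' \2", text)
    if lang == "en":
        # English: the apostrophe moves to the clitic on the right.
        return re.sub(r"(\w)'(\w)", r"\1 '\2", text)
    # Any other language code: isolate the apostrophe as its own token.
    return re.sub(r"(\w)'(\w)", r"\1 ' \2", text)

print(split_apostrophes("l'été", "fr"))
print(split_apostrophes("today's story", "en"))
print(split_apostrophes("it's", "xx"))
```

With the language-specific branches, the apostrophe stays glued to one of its
neighbours, so the phrase table sees tokens such as l' or 's instead of a bare
apostrophe.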
BUT
the issue is that corpora sometimes contain misplaced spaces
before or after the apostrophe,
so you may get ' as an individual token.
The other issue is that corpora contain various kinds of
apostrophes, encoded as different UTF-8 sequences.
You can use the normalize-punctuation.perl script to correct these.
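
As a rough idea of what such a clean-up involves, here is a minimal Python
sketch to run on raw text before tokenization. It is not a port of
normalize-punctuation.perl: the helper name normalize_apostrophes and the list
of code points are assumptions chosen for illustration. It also assumes the
text does not use single quotes as quotation marks, because the space-removal
step cannot tell a quote from an apostrophe.

```python
import re

# Code points that commonly stand in for an apostrophe. Illustrative list only,
# not the one used by normalize-punctuation.perl.
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc\u0060\u00b4"

def normalize_apostrophes(text: str) -> str:
    """Hypothetical helper: unify apostrophe characters and drop stray spaces."""
    # Map every variant to the ASCII apostrophe U+0027.
    text = text.translate({ord(ch): "'" for ch in APOSTROPHE_VARIANTS})
    # Remove whitespace around an apostrophe that sits between two word
    # characters. Caveat: this also glues single-quoted words to the
    # preceding word, so it is only safe on text without such quotes.
    return re.sub(r"(\w)\s*'\s*(\w)", r"\1'\2", text)

print(normalize_apostrophes("l\u2019 été"))
print(normalize_apostrophes("c ' est une belle journée"))
```

Applying the same normalization to the training corpus and to the text sent
for translation keeps both sides consistent.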
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 114, Issue 4
*********************************************