Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Tokenization problem (Ihab Ramadan)
2. Re: Tokenization problem (Tom Hoar)
----------------------------------------------------------------------
Message: 1
Date: Wed, 14 Jan 2015 11:37:22 +0200
From: "Ihab Ramadan" <i.ramadan@saudisoft.com>
Subject: Re: [Moses-support] Tokenization problem
To: <moses-support@mit.edu>
Message-ID: <005201d02fdd$b552e150$1ff8a3f0$@saudisoft.com>
Content-Type: text/plain; charset="iso-8859-1"
Dears,
I found the problem.
At line 289 of the tokenizer.perl script, just add a space after the
apostrophe in the replacement, like this.
The original code:
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 '$2/g;
The modified one:
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;
With this modification, tokenizing a file gives the same result as
tokenizing a single segment.
Thanks
From: Ihab Ramadan [mailto:i.ramadan@saudisoft.com]
Sent: Wednesday, January 14, 2015 11:14 AM
To: moses-support@mit.edu
Subject: RE: Tokenization problem
Dears,
I still have this problem. To avoid confusing the decoder I used the
"-no-escape" parameter of the tokenizer.perl script, but the tokenizer
still adds an extra space after the apostrophe when tokenizing files,
whereas tokenizing a single segment produces no extra space.
For example
In the file
"which will guide you through connecting and configuring your printer's
wireless connection." -> "which will guide you through connecting and
configuring your printer ' s wireless connection ."
As a segment
"which will guide you through connecting and configuring your printer's
wireless connection." -> "which will guide you through connecting and
configuring your printer 's wireless connection ."
If it is the same script, I wonder why it generates two different outputs.
I have no experience in Perl, so I could not find the line of code that
distinguishes a segment in a file from a single segment passed as a
parameter to the script.
Please help
From: Ihab Ramadan [mailto:i.ramadan@saudisoft.com]
Sent: Monday, January 5, 2015 10:09 AM
To: moses-support@mit.edu
Subject: Tokenization problem
Dears,
Using the tokenizer on the training files replaces the apostrophes with
"' s" (with a space), but if I use the same script to tokenize a single
sentence it turns the apostrophes into "'s" (without a space).
This problem confuses the decoder during translation.
How can I solve this problem?
Thanks
Best Regards
Ihab Ramadan | Senior Developer | Saudisoft - Egypt <http://www.saudisoft.com/> |
Tel +2 02 330 320 37 Ext- 0 | Mob +201007570826 | Fax +20233032036
------------------------------
Message: 2
Date: Wed, 14 Jan 2015 18:48:07 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Tokenization problem
To: moses-support@mit.edu
Message-ID: <54B65777.1050006@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"
I don't see the problem. I get the same results from the original
tokenizer.perl script whether I "echo" the sentence on the command line or
pipe it from a file, i.e. no space between the apostrophe and "s".
tahoar@asus-notebook:~$ echo "which will guide you through connecting and configuring your printer's wireless connection." | tokenizer.perl -q -l en
which will guide you through connecting and configuring your printer 's wireless connection .
tahoar@asus-notebook:~$ tokenizer.perl -q -l en < test.txt
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
(five copies of your sentence in test.txt)
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 99, Issue 26
*********************************************