Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. how to align some new parallel sentences using a trained
model (iamzcy_hit iamzcy_hit)
2. Re: Tokenization problem (Tom Hoar)
3. Re: Tokenization problem (Kenneth Heafield)
----------------------------------------------------------------------
Message: 1
Date: Thu, 15 Jan 2015 08:54:06 +0800
From: iamzcy_hit iamzcy_hit <iamzcyhit@gmail.com>
Subject: [Moses-support] how to align some new parallel sentences
using a trained model
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAGLowvLWHXb_J+=vZqMeOVCOD7Z=Uzyz_Sn=yjv+PTsfSyvn3A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi all,
If I have trained an alignment model on a huge parallel corpus with GIZA++,
MGIZA or fast_align, and I am now given some new sentence pairs whose words
I want to align, how should I do it?
Best regards
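One frequently used workaround is to append the new pairs to the original training corpus and re-run the aligner. The core idea of aligning new pairs with an already trained model can be sketched as follows. This sketch assumes that only an IBM Model 1 style lexical table t(e|f) is kept from training. The dictionary format, the `<null>` token and the function names are hypothetical and are not the GIZA++ or fast_align file formats. Under Model 1, the most probable (Viterbi) alignment lets each target word independently pick the source word, or NULL, with the highest t(e|f):

```python
from typing import Dict, List, Tuple

NULL = "<null>"  # hypothetical token standing for the empty source word

# Hypothetical lexical table: (source_word, target_word) -> t(target | source)
LexTable = Dict[Tuple[str, str], float]


def viterbi_align_model1(src: List[str], tgt: List[str], t: LexTable,
                         floor: float = 1e-9) -> List[Tuple[int, int]]:
    """Viterbi alignment under IBM Model 1: each target word independently
    picks the source word (or NULL) with the highest t(e|f).
    Returns (source_index, target_index) links; NULL links are dropped."""
    candidates = [NULL] + src
    links = []
    for j, e in enumerate(tgt):
        # Unseen pairs get a small floor probability; max() keeps the first
        # candidate (NULL) on ties, so completely unknown words stay unaligned.
        best = max(range(len(candidates)),
                   key=lambda i: t.get((candidates[i], e), floor))
        if best > 0:
            links.append((best - 1, j))
    return links


def to_pharaoh(links: List[Tuple[int, int]]) -> str:
    """Format links as source-target 'i-j' pairs, as in Moses alignment files."""
    return " ".join(f"{i}-{j}" for i, j in links)


if __name__ == "__main__":
    table = {("la", "the"): 0.7, (NULL, "the"): 0.2, ("maison", "the"): 0.1,
             ("maison", "house"): 0.8, ("la", "house"): 0.05}
    links = viterbi_align_model1(["la", "maison"], ["the", "house"], table)
    print(to_pharaoh(links))  # 0-0 1-1
```

This deliberately ignores the distortion and fertility components that GIZA++ (HMM, Models 3/4) and fast_align (a reparameterized Model 2) use. It therefore illustrates the principle and will not reproduce those tools' alignments.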
------------------------------
Message: 2
Date: Thu, 15 Jan 2015 08:33:17 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Tokenization problem
To: moses-support@mit.edu
Message-ID: <54B718DD.4030109@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"
I just ran the same sentence through the newest GitHub clone (today):
corporamgr@domt-v2:~/Public/src/mosesdecoder/scripts/tokenizer$ ./tokenizer.perl -no-escape -q -l en < test.txt
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
which will guide you through connecting and configuring your printer 's wireless connection .
This is not a Perl script problem. What shell and command line are you
using for your "in the file" results? You'll find the problem in either
your shell or your custom tool chain(s) before you run tokenizer.perl.
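One way to follow up on this is to compare, character by character, the text that actually reaches tokenizer.perl in the two cases. The following is a minimal Python sketch, and the function names are hypothetical. It reports the first position where two strings differ, along with the Unicode code point and name on each side. This makes it possible to tell apart, for example, an ASCII apostrophe (U+0027) and a typographic one (U+2019), which can look identical on screen. The sample strings are assumed examples, not Ihab's data:

```python
import unicodedata
from typing import Optional, Tuple


def describe(ch: str) -> str:
    """Return e.g. 'U+2019 RIGHT SINGLE QUOTATION MARK' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def first_difference(a: str, b: str) -> Optional[Tuple[int, str, str]]:
    """Index of the first differing character plus both descriptions,
    or None if the strings are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, describe(x), describe(y)
    if len(a) != len(b):
        # One string is a prefix of the other; report where it runs out.
        i = min(len(a), len(b))
        extra_a = describe(a[i]) if i < len(a) else "<end>"
        extra_b = describe(b[i]) if i < len(b) else "<end>"
        return i, extra_a, extra_b
    return None


if __name__ == "__main__":
    from_file = "your printer\u2019s wireless connection."  # assumed sample
    as_segment = "your printer's wireless connection."
    print(first_difference(from_file, as_segment))
    # (12, 'U+2019 RIGHT SINGLE QUOTATION MARK', 'U+0027 APOSTROPHE')
```

In practice `from_file` would be a line read with `open(path, encoding="utf-8")` from the file that is piped into the tokenizer, and `as_segment` the exact text passed in the single-segment case.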
On 01/14/2015 04:13 PM, Ihab Ramadan wrote:
>
> Dears,
>
> I still have this problem. To avoid confusing the decoder I used the
> "-no-escape" parameter of the tokenizer.perl script, but tokenizing a
> file still adds an extra space after the apostrophe, whereas tokenizing
> a single segment does not.
>
> For example:
>
> In the file:
>
> "which will guide you through connecting and configuring your
> printer's wireless connection." -> "which will guide you through
> connecting and configuring your printer ' s wireless connection ."
>
> As a segment:
>
> "which will guide you through connecting and configuring your
> printer's wireless connection." -> "which will guide you through
> connecting and configuring your printer 's wireless connection ."
>
> I wonder why the same script generates two different outputs.
>
> I have no experience with Perl, so I could not find the line of code
> that behaves differently when the segment is in a file versus passed
> alone as a parameter to the script.
>
> Please help
>
> From: Ihab Ramadan [mailto:i.ramadan@saudisoft.com]
> Sent: Monday, January 5, 2015 10:09 AM
> To: moses-support@mit.edu
> Subject: Tokenization problem
>
> Dears,
>
> Using the tokenizer on the training files turns the apostrophe into
> "' s" (with a space), but if I use the same script to tokenize a single
> sentence, it becomes "'s" (without a space).
>
> This problem confuses the decoder during translation.
>
> How can I solve this problem?
>
> Thanks
>
> Best Regards
>
> Ihab Ramadan | Senior Developer | Saudisoft <http://www.saudisoft.com/> - Egypt
> Tel +2 02 330 320 37 Ext. 0 | Mob +201007570826 | Fax +20233032036
> Follow us on LinkedIn
> <http://www.linkedin.com/company/77017?trk=vsrp_companies_res_name&trkInfo=VSRPsearchId%3A1489659901402995947155%2CVSRPtargetId%3A77017%2CVSRPcmpt%3Aprimary>
> | Facebook <https://www.facebook.com/pages/Saudisoft-Co-Ltd/289968997768973?ref_type=bookmark>
> | Twitter <https://twitter.com/Saudisoft>
------------------------------
Message: 3
Date: Wed, 14 Jan 2015 20:39:14 -0500
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Tokenization problem
To: moses-support@mit.edu
Message-ID: <54B71A42.7040703@kheafield.com>
Content-Type: text/plain; charset=windows-1252
I'll inject that it is plausible there is some weird Unicode going on
there, and copy-paste on Linux sometimes canonicalizes graphemes. Whilst
I'm inclined to side with Tom, the only way to sort this out is with the
raw file from Ihab, e.g. as a gzipped attachment.
Kenneth
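The following short Python sketch illustrates the Unicode point above. The helper name is hypothetical. It lists every non-ASCII character in a line, and it shows that two strings which render identically can differ until they are canonicalized with `unicodedata.normalize`. Running such a report directly over the raw file should avoid the copy-paste canonicalization mentioned above, because no clipboard is involved. The sample strings are assumed examples:

```python
import unicodedata
from typing import List, Tuple


def non_ascii_report(line: str) -> List[Tuple[int, str, str]]:
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(line) if ord(ch) > 127]


if __name__ == "__main__":
    composed = "caf\u00e9"      # e-acute as a single code point
    decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT
    # Visually identical, but different code point sequences...
    print(composed == decomposed)                                # False
    # ...until both are canonicalized to NFC.
    print(unicodedata.normalize("NFC", decomposed) == composed)  # True
    print(non_ascii_report(decomposed))  # [(4, 'U+0301', 'COMBINING ACUTE ACCENT')]
```

Applied line by line to a file opened with `encoding="utf-8"`, the report would show whether any apostrophe or quote lookalikes are present before tokenization.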
On 01/14/2015 08:33 PM, Tom Hoar wrote:
> This is not a Perl script problem. What shell and command line are you
> using for your "in the file" results? You'll find the problem in either
> your shell or your custom tool chain(s) before you run tokenizer.perl.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 99, Issue 28
*********************************************