Moses-support Digest, Vol 93, Issue 18

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Detokenizer (Philipp Koehn)
2. Re: Fwd: about the moses code (amir haghighi)
3. Fwd: about the moses code (Arefeh Kazemi)
4. Re: Detokenizer (Barry Haddow)
5. Re: Detokenizer (Judah Schvimer)


----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Jul 2014 13:07:37 -0400
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Detokenizer
To: Judah Schvimer <judah.schvimer@mongodb.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDB3Jy41Njy55ZHMp0w3BYr0j-56QWd7GWq16ecH5WBbeA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

The tokenizer and detokenizer are indeed not able to fully
restore the original string. It is possible to write such
a tokenizer (though not easy), but the one that ships with
Moses does not do the job.

-phi
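
To illustrate what a reversible tokenizer could look like, here is a minimal Python sketch. It is not part of Moses. The JOINER marker and both function names are hypothetical, and the sketch assumes the input uses single spaces between words, has no leading or trailing whitespace, and never contains the marker character itself.

```python
import re

# Hypothetical marker meaning "this token was glued to the previous one".
JOINER = "\uffed"

# A token is either a run of word characters or one punctuation character.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, marking tokens that had no space before them."""
    tokens = []
    prev_end = None
    for match in _TOKEN_RE.finditer(text):
        token = match.group()
        if prev_end is not None and match.start() == prev_end:
            # No whitespace separated this token from the previous one.
            token = JOINER + token
        tokens.append(token)
        prev_end = match.end()
    return tokens


def detokenize(tokens):
    """Rebuild the original string from tokens produced by tokenize()."""
    pieces = []
    for index, token in enumerate(tokens):
        if token.startswith(JOINER):
            pieces.append(token[len(JOINER):])
        else:
            if index > 0:
                pieces.append(" ")
            pieces.append(token)
    return "".join(pieces)


source = ":doc:`/tutorial/aggregation-zip-code-data-set`"
assert detokenize(tokenize(source)) == source
```

The round trip only works while the markers survive, so in a translation pipeline the decoder would also have to carry them through to the output, which is a large part of why this is not easy.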

On Mon, Jul 14, 2014 at 11:53 AM, Judah Schvimer
<judah.schvimer@mongodb.com> wrote:
> Hi,
>
> When I'm using the decoder I have to tokenize my target sentences before I
> translate them. However, when I detokenize them it leaves awkward spaces
> around what was tokenized. Is there any way to fix this? It seems to be
> mainly around slashes and colons.
>
> Source: :doc:`/tutorial/aggregation-zip-code-data-set`
> Target: : Doc: '/ tutorial / aggregation-zip-code-data-set'
>
> Thanks,
> Judah


------------------------------

Message: 2
Date: Mon, 14 Jul 2014 16:28:18 -0800
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] Fwd: about the moses code
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>, moses-support
<moses-support@mit.edu>
Message-ID:
<CA+UVbEi44p5Ms7=vAzN9fd9SQMquaBoj8Vb1UEeBqBHnfohmsQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thank you very much, Mr Hieu, for uploading this helpful video.
Unfortunately, some parts of it are corrupted (especially the first
minutes), and what you are typing in the terminal cannot be seen.
It would be great if you could upload a better one.
Thank you again.

Amir


On Mon, Jul 14, 2014 at 2:46 AM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> I've just uploaded a YouTube video about this:
> https://www.youtube.com/watch?v=P43h827uLac&feature=youtu.be
> Hope that's useful to you.
>
>
>
> On 13 July 2014 22:53, amir haghighi <amir.haghighi.64@gmail.com> wrote:
>
>>
>> Hello all,
>>
>> I have been trying for a week to open the Moses code in the NetBeans or
>> Eclipse IDE, and I can't.
>> Given that the Moses code does not have any make or configure file, could
>> you please tell me how I can open and run the code in those IDEs?
>> I need to add my feature function to the Moses code, but I can't even open
>> the code with an IDE.
>>
>> I would be very grateful if you could help me.
>>
>> Thank you
>> Amir
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>

------------------------------

Message: 3
Date: Mon, 14 Jul 2014 18:55:03 -0700
From: Arefeh Kazemi <arefeh_kazemi@yahoo.com>
Subject: [Moses-support] Fwd: about the moses code
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1405389303.14776.YahooMailNeo@web121702.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="us-ascii"

Thank you Hieu
It was very useful.

------------------------------

Message: 4
Date: Tue, 15 Jul 2014 09:02:43 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Detokenizer
To: Judah Schvimer <judah.schvimer@mongodb.com>, moses-support
<moses-support@mit.edu>
Message-ID: <53C4E023.8020608@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Judah

The actual problem here is that you do not want path names split by the
tokeniser. It's only really set up to deal with regular text, but what
you can do is ask it to "protect" certain patterns by using the

-protected <filename>

argument. The file <filename> should contain a list of regular
expressions (one per line), and the tokeniser will not split apart any
tokens which match these REs. I'm guessing that in the example below you
don't want "tutorial" translated into the target language, and if the
tokeniser doesn't split the path then the whole thing will pass through
as an OOV (out-of-vocabulary) token.

cheers - Barry
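
The protection idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the Moses tokeniser's actual implementation. The function names and the example pattern are assumptions made for the sketch.

```python
import re


def _split(chunk):
    """Naive tokenisation: word runs and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", chunk)


def tokenize(text, protected_patterns=()):
    """Tokenise text, keeping any span that matches a protected pattern whole."""
    # Collect the character spans matched by any protected pattern.
    spans = []
    for pattern in protected_patterns:
        spans.extend(m.span() for m in re.finditer(pattern, text))
    spans.sort()

    tokens = []
    pos = 0
    for start, end in spans:
        if start < pos or start == end:
            # Skip matches that overlap an earlier one, and empty matches.
            continue
        tokens.extend(_split(text[pos:start]))  # ordinary text before the span
        tokens.append(text[start:end])          # protected span, left intact
        pos = end
    tokens.extend(_split(text[pos:]))
    return tokens


# Hypothetical pattern: a slash followed by path-like characters.
PROTECTED = [r"/[\w/-]+"]

text = "See :doc:`/tutorial/aggregation-zip-code-data-set` for details."
print(tokenize(text, PROTECTED))
```

With the pattern in place the path comes out as one token, while the surrounding punctuation is still split off. With the real tokeniser, an equivalent regular expression would presumably go on its own line in the file passed to -protected.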



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 5
Date: Tue, 15 Jul 2014 08:59:24 -0400
From: Judah Schvimer <judah.schvimer@mongodb.com>
Subject: Re: [Moses-support] Detokenizer
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CALF9aB6pAyedSiHKNMyfc9H_oBzCvYy1sF1+ABAxigCZSNhgdQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

Thank you very much! That's incredibly helpful. My one concern is that
the decoder was crashing before I started tokenizing its input. Do you know
what tokens would cause that behavior if left in? Would you recommend just
not tokenizing path names and URLs, and tokenizing everything else?

Judah



------------------------------



End of Moses-support Digest, Vol 93, Issue 18
*********************************************
