Moses-support Digest, Vol 162, Issue 7

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Tokenization (Justin Cunningham)
2. Re: Tokenization (Hieu Hoang)
3. Re: Tokenization (Justin Cunningham)


----------------------------------------------------------------------

Message: 1
Date: Sun, 12 Apr 2020 17:23:06 +0000
From: Justin Cunningham <just1brill@outlook.com>
Subject: [Moses-support] Tokenization
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<DB6PR0601MB2149847F821248D5B88BC64B8CDC0@DB6PR0601MB2149.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

Hi,

I'm currently working on a Neural Machine Translator but I am quite new to it all. I am trying to tokenise my files in Linux using the following shell script (https://github.com/JustCunn/IrishNMT/blob/master/GaeilgePrepare.sh) and these files:

http://opus.nlpl.eu/download.php?f=EUbookshop/v2/moses/en-ga.txt.zip
http://opus.nlpl.eu/download.php?f=QED/v2.0a/moses/en-ga.txt.zip

But it just won't work. Sometimes it skips the tokenization step; other times it gets stuck at the "Tokenizer... number of threads..." output. For context, they are all plain text files. Am I not formatting the text correctly?

I'd appreciate it if someone could help me with this, as it would be a huge help in my understanding of it all.

Thanks,
Justin

------------------------------

Message: 2
Date: Sun, 12 Apr 2020 13:20:43 -0700
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Tokenization
To: Justin Cunningham <just1brill@outlook.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEKMkbhNP6rcRC9+HvSai8FE_vFb0jO5oYkweCzqD7FDG9KtyQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

The Moses tokenizer expects its input on standard input (stdin).

Hieu Hoang
http://statmt.org/hieu



------------------------------

Message: 3
Date: Sun, 12 Apr 2020 20:39:03 +0000
From: Justin Cunningham <just1brill@outlook.com>
Subject: Re: [Moses-support] Tokenization
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<DB6PR0601MB2149FDD259C5BC003DB579AF8CDC0@DB6PR0601MB2149.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="us-ascii"

Thanks for replying! It actually ended up being a spelling error in the code.

Thanks,
Justin




------------------------------



End of Moses-support Digest, Vol 162, Issue 7
*********************************************
