Moses-support Digest, Vol 84, Issue 24

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: How to tell EMS to use an existing LM (Lane Schwartz)
2. Re: How to tell EMS to use an existing LM (Lane Schwartz)
3. Re: tokenizer.perl to not tokenize exclude URLs
(Eleftherios Avramidis)


----------------------------------------------------------------------

Message: 1
Date: Tue, 15 Oct 2013 10:12:01 -0400
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] How to tell EMS to use an existing LM
To: Eleftherios Avramidis <eleftherios.avramidis@dfki.de>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZm6zZert-T93Y3VfVMM8ML+oLJNAUurxAvKxJs_Qcn6JA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Thanks!

It also appears that if you do that, EMS will automatically binarize
the LM. If your file has already been binarized, or if you just want
to skip binarization, then use binlm=/path/to/my/lm instead of
lm=/path/to/my/lm

Cheers,
Lane


On Tue, Oct 15, 2013 at 10:04 AM, Eleftherios Avramidis
<eleftherios.avramidis@dfki.de> wrote:
> Hi Lane,
>
> In the section that describes the particular language model,
> uncomment #lm = and point it directly to your existing model. The LM
> type can be set with the parameter 'type'.
>
> From the example configuration:
>
> [LM:europarl]
>
> ### command to run to get raw corpus files
> #
> #get-corpus-script = ""
>
> ### raw corpus (untokenized)
> #
> raw-corpus = $wmt12-data/training/europarl-v7.$output-extension
>
> ### tokenized corpus files (may contain long sentences)
> #
> #tokenized-corpus =
>
> ### if corpus preparation should be skipped,
> # point to the prepared language model
> #
> #lm =
>
> best
> Lefteris
>
> On 15/10/13 15:40, Lane Schwartz wrote:
>> I'm running Moses v1, and I have some existing already-trained LM
>> files that I'd like to use.
>>
>> How can I tell EMS to use an existing LM file (presumably at the same
>> time telling EMS what LM type it is)?
>>
>> Thanks,
>> Lane
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> MSc. Inf. Eleftherios Avramidis
> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
> Tel. +49-30 238 95-1806
>
> Fax. +49-30 238 95-1810
>
> -------------------------------------------------------------------------------------------
> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>
> Geschaeftsfuehrung:
> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> Dr. Walter Olthoff
>
> Vorsitzender des Aufsichtsrats:
> Prof. Dr. h.c. Hans A. Aukes
>
> Amtsgericht Kaiserslautern, HRB 2313
> -------------------------------------------------------------------------------------------
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


------------------------------

Message: 2
Date: Tue, 15 Oct 2013 10:13:23 -0400
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] How to tell EMS to use an existing LM
To: Eleftherios Avramidis <eleftherios.avramidis@dfki.de>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZ=ftJ6AHQ0no-WmTz-iF=cXw1pzM3v706fLgB4Uh-YjLA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Also, because I'll probably forget how to do this and need to search
the mailing list archives: the type of the existing LM can be
specified with type=???, where ??? is the appropriate LM type.
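
As a rough illustration of the settings discussed in this thread, the
Python sketch below renders a minimal [LM:...] section that points EMS
at an existing model. The helper name lm_section and its parameters are
hypothetical and not part of Moses or EMS; it only writes out the lm,
binlm and type keys mentioned above, and the type value is left as the
??? placeholder from the message.

```python
def lm_section(name, path, binarized=False, lm_type=None):
    """Render an EMS-style [LM:name] section for an existing model.

    Hypothetical helper for illustration only:
    - 'lm' is used for a model that EMS may still binarize,
    - 'binlm' is used for a model that is already binarized
      (or when binarization should be skipped),
    - 'type' is added only if a type value is given.
    """
    key = "binlm" if binarized else "lm"
    lines = [f"[LM:{name}]", f"{key} = {path}"]
    if lm_type is not None:
        lines.append(f"type = {lm_type}")
    return "\n".join(lines)


# "???" is kept from the thread; substitute the type for your LM.
print(lm_section("europarl", "/path/to/my/lm", binarized=True, lm_type="???"))
```

Switching binarized to False would emit lm = /path/to/my/lm instead,
which, according to the thread, lets EMS binarize the model itself.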




--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


------------------------------

Message: 3
Date: Tue, 15 Oct 2013 16:17:56 +0200
From: Eleftherios Avramidis <eleftherios.avramidis@dfki.de>
Subject: Re: [Moses-support] tokenizer.perl to not tokenize exclude
URLs
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: Moses-support <moses-support@mit.edu>
Message-ID: <525D4E94.6060907@dfki.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Barry,

Thanks, both the tokenizer and the detokenizer work as you said. Problem solved.

best
Lefteris


On 14/10/13 21:38, Barry Haddow wrote:
> Hi Lefty
>
> For the 'protect' option, the format is one regular expression per
> line. For example if you use a file with one line like this:
>
> http://\S+
>
> then it should protect some URLs from tokenisation. It works for me.
> If you have problems then send me the file.
>
> For the -a option, I think the detokeniser should put the hyphens back
> together again, but I have not checked.
>
> cheers - Barry
>
> On 14/10/13 19:22, Eleftherios Avramidis wrote:
>> Hi,
>>
>> I see tokenizer.perl now offers an option for excluding URLs and other
>> expressions: " -protect FILE ... specify file with patters to be
>> protected in tokenisation." Unfortunately there is no explanation of how
>> this optional file should be formatted. I tried several ways of writing
>> regular expressions for URLs, but URLs still come out tokenized. Could
>> you provide an example?
>>
>> My second question concerns the -a option, for aggressive hyphen
>> splitting. Does the detokenizer offer a similar option, to rejoin the
>> separated hyphens?
>>
>> cheers
>> Lefteris
>>
>
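
To make the 'protect' idea concrete, here is a minimal Python sketch of
one way such protection can work: spans matching the patterns (one
regular expression per line, as Barry describes) are swapped for
placeholders before tokenisation and restored afterwards. This is a toy
illustration, not the code of tokenizer.perl; the function names, the
placeholder scheme and the simplistic punctuation-splitting rule are all
assumptions made for the example, and Python's regex syntax is not
identical to Perl's.

```python
import re


def compile_protect(lines):
    """Combine one regex per non-empty line into a single pattern."""
    patterns = [line.strip() for line in lines if line.strip()]
    if not patterns:
        return None
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def toy_tokenize(text, protect=None):
    """Split punctuation from words, leaving protected spans intact."""
    saved = []

    def mask(match):
        # Store the protected span and return an alphanumeric placeholder
        # that the punctuation rule below cannot split.
        saved.append(match.group(0))
        return f"PROTECTED{len(saved) - 1}END"

    if protect is not None:
        text = protect.sub(mask, text)

    # Toy tokenisation: put spaces around every non-word, non-space char.
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    text = " ".join(text.split())

    # Restore the protected spans (assumes the input itself never
    # contains a string that looks like a placeholder).
    return re.sub(r"PROTECTED(\d+)END", lambda m: saved[int(m.group(1))], text)


# Contents of a protect file with the single line from the thread.
protect = compile_protect([r"http://\S+"])
print(toy_tokenize("See http://www.statmt.org/moses/ for details.", protect))
# prints: See http://www.statmt.org/moses/ for details .
```

Without the protect pattern the same toy tokenizer breaks the URL apart
at every ':', '/' and '.', which matches the symptom described in the
original question.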


--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30 238 95-1806

Fax. +49-30 238 95-1810

-------------------------------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern

Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff

Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes

Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------------------------------



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 84, Issue 24
*********************************************
