Moses-support Digest, Vol 103, Issue 52

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. How can I change LM binarization in EMS without re-tuning?
(Lane Schwartz)
2. Re: When to truecase (Philipp Koehn)
3. Re: How can I change LM binarization in EMS without
re-tuning? (Philipp Koehn)
4. Re: How can I change LM binarization in EMS without
re-tuning? (Matthias Huck)

----------------------------------------------------------------------

Message: 1
Date: Wed, 20 May 2015 14:01:52 -0500
From: Lane Schwartz <dowobeha@gmail.com>
Subject: [Moses-support] How can I change LM binarization in EMS
without re-tuning?
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZmbZmns5uHb6kfKD4M2QDuQAavrSX9V_widGUJszFU1WQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I've got a system that I trained using EMS. I'd like to change the
binarization of my LM (for example, the original used KenLM probing, and
now I want KenLM trie with quantization).

If I simply change the lm-binarizer line in my config, EMS assumes that it
should re-run tuning. Is there a way that I can force it to not re-tune in
this case?

Thanks,
Lane

------------------------------

Message: 2
Date: Wed, 20 May 2015 15:07:52 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] When to truecase
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDBJtSNpEGvZ+7gWQyB3xsn+ttjNc_JTnCeX-Qj=EHxopA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

yes, this is what the RECASER section in EMS enables.

-phi

On Wed, May 20, 2015 at 2:50 PM, Lane Schwartz <dowobeha@gmail.com> wrote:

> Got it. So then, how was casing handled in the "mbr/mp" column? Was all
> of the data lowercased, then models trained, then recasing applied after
> decoding? Or something else?
>
> On Wed, May 20, 2015 at 1:30 PM, Philipp Koehn <phi@jhu.edu> wrote:
>
>> Hi,
>>
>> no, the changes are made incrementally.
>>
>> So the recased "baseline" is the previous "mbr/mp" column.
>>
>> -phi
>>
>> On Wed, May 20, 2015 at 2:01 PM, Lane Schwartz <dowobeha@gmail.com>
>> wrote:
>>
>>> Philipp,
>>>
>>> In Table 2 of the WMT 2009 paper, are the "baseline" and "truecased"
>>> columns directly comparable? In other words, do the two columns indicate
>>> identical conditions other than a single variable (how and/or when casing
>>> was handled)?
>>>
>>> In the baseline condition, how and when was casing handled?
>>>
>>> Thanks,
>>> Lane
>>>
>>>
>>> On Wed, May 20, 2015 at 12:43 PM, Philipp Koehn <phi@jhu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> see Section 2.2 in our WMT 2009 submission:
>>>> http://www.statmt.org/wmt09/pdf/WMT-0929.pdf
>>>>
>>>> One practical reason to avoid recasing is the need
>>>> for a second large cased language model.
>>>>
>>>> But there is of course also the practical issue of
>>>> having a unique truecasing scheme for each data
>>>> condition, handling of headlines, all-caps emphasis,
>>>> etc.
>>>>
>>>> It would be worth revisiting this issue under
>>>> different data conditions / language pairs. Both
>>>> options are readily available in EMS.
>>>>
>>>> Each of the two alternative methods could be
>>>> improved as well. See for instance:
>>>> http://www.aclweb.org/anthology/N06-1001
>>>>
>>>> -phi
>>>>
>>>>
>>>> On Wed, May 20, 2015 at 12:31 PM, Lane Schwartz <dowobeha@gmail.com>
>>>> wrote:
>>>>
>>>>> Philipp (and others),
>>>>>
>>>>> I'm wondering what people's experience is regarding when truecasing
>>>>> is applied.
>>>>>
>>>>> One option is to truecase the training data, then train your TM and
>>>>> LM using that truecased data. Another option would be to lowercase the
>>>>> data, train TM and LM on the lowercased data, and then perform truecasing
>>>>> after decoding.
>>>>>
>>>>> I assume that the former gives better results, but the latter
>>>>> approach has an advantage in terms of extensibility (namely if you get more
>>>>> data and update your truecase model, you don't have to re-train all of your
>>>>> TMs and LMs).
>>>>>
>>>>> Does anyone have any insights they would care to share on this?
>>>>>
>>>>> Thanks,
>>>>> Lane
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> When a place gets crowded enough to require ID's, social collapse is not
>>> far away. It is time to go elsewhere. The best thing about space travel
>>> is that it made it possible to go elsewhere.
>>> -- R.A. Heinlein, "Time Enough For Love"
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>
>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away. It is time to go elsewhere. The best thing about space travel
> is that it made it possible to go elsewhere.
> -- R.A. Heinlein, "Time Enough For Love"
>

------------------------------

Message: 3
Date: Wed, 20 May 2015 15:11:47 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] How can I change LM binarization in EMS
without re-tuning?
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDCZxv=a5m4Gsoq90LE3Zyao76Eds4O4kbewu2RjJo7=zQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

you can point to the previous configuration file with the old weights:

[TUNING]

### instead of tuning with this setting, old weights may be recycled
# specify here an old configuration file with matching weights
#
weight-config = $toy-data/weight.ini

-phi

On Wed, May 20, 2015 at 3:01 PM, Lane Schwartz <dowobeha@gmail.com> wrote:

> I've got a system that I trained using EMS. I'd like to change the
> binarization of my LM (for example, the original used KenLM probing, and
> now I want KenLM trie with quantization).
>
> If I simply change the lm-binarizer line in my config, EMS assumes that it
> should re-run tuning. Is there a way that I can force it to not re-tune in
> this case?
>
> Thanks,
> Lane
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

Message: 4
Date: Wed, 20 May 2015 20:14:17 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] How can I change LM binarization in EMS
without re-tuning?
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <1432149257.30904.728.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi Lane,

Just do the LM binarization manually, edit the LM feature line in your
tuned moses.ini to point to your new binary LM, and tell the EMS where
to look for the tuned moses.ini:

[TUNING]
config-with-reused-weights = $working-dir/tuning/moses.tuned.ini.10

It won't run tuning if you set config-with-reused-weights.
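
A minimal sketch of the "edit the LM feature line" step, assuming the tuned
moses.ini declares the LM with a feature line of the form
"KENLM name=LM0 factor=0 path=... order=5". The function name repoint_kenlm,
the feature name LM0, and all file paths are hypothetical and chosen only for
illustration; the new binary LM is assumed to have been built beforehand
(for example with KenLM's build_binary).

```python
def repoint_kenlm(ini_text: str, new_lm_path: str, name: str = "LM0") -> str:
    """Return ini_text with path= of the KENLM feature called `name` replaced.

    Assumption: the feature line is whitespace-separated key=value fields
    that start with the token KENLM. All other lines, including the tuned
    weights, are passed through unchanged.
    """
    out = []
    for line in ini_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "KENLM" and f"name={name}" in fields:
            # Swap only the path field; keep name, factor, order, etc.
            fields = [
                f"path={new_lm_path}" if field.startswith("path=") else field
                for field in fields
            ]
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt of a tuned moses.ini.
    sample = (
        "[feature]\n"
        "KENLM name=LM0 factor=0 path=/old/lm.probing order=5\n"
        "[weight]\n"
        "LM0= 0.5\n"
    )
    print(repoint_kenlm(sample, "/new/lm.trie"))
```

Only the path changes, so the weights stored in the tuned file stay valid for
the re-binarized model; the edited file is the one that
config-with-reused-weights would point to.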

Cheers,
Matthias

On Wed, 2015-05-20 at 14:01 -0500, Lane Schwartz wrote:
> I've got a system that I trained using EMS. I'd like to change the
> binarization of my LM (for example, the original used KenLM probing, and
> now I want KenLM trie with quantization).
>
> If I simply change the lm-binarizer line in my config, EMS assumes that it
> should re-run tuning. Is there a way that I can force it to not re-tune in
> this case?
>
> Thanks,
> Lane
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 103, Issue 52
**********************************************
