Moses-support Digest, Vol 111, Issue 45

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Skip OOV when computing Language Model score (Jie Jiang)
2. Re: Skip OOV when computing Language Model score (Ergun Bicici)

----------------------------------------------------------------------

Message: 1
Date: Fri, 15 Jan 2016 15:20:08 +0000
From: Jie Jiang <mail.jie.jiang@gmail.com>
Subject: Re: [Moses-support] Skip OOV when computing Language Model
score
To: Ergun Bicici <ergun.bicici@dfki.de>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAGWrrwNOKuo2Umu188z8s69EnaLtznt1Tc-UF3opsbsFf3p-hg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Ergun:

I think the -skipoovs option would just drop all n-gram scores that have an
OOV in them, rather than using a skip-n-gram LM.

An easy way to test this is to run it with that option to compute the log
probability of a sentence containing OOVs; it should yield a rather high score.
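
To make the difference concrete, here is a minimal sketch with a toy bigram backoff model in Python. Everything in it is an assumption made for illustration: the probabilities are invented, and the helper names (map_oov, score_backoff, score_drop_oov_ngrams, score_skip_oov) are hypothetical. It is not SRILM or Moses code, and the second scoring function only mirrors the reading of -skipoovs suggested above, which has not been verified against SRILM.

```python
# Toy bigram backoff LM in log10. All numbers are made up for illustration.
UNK = "<unk>"

# word -> (log10 unigram probability, log10 backoff weight)
UNIGRAMS = {
    "the": (-1.0, -0.3),
    "house": (-2.0, -0.4),
    "in": (-1.5, -0.2),
    UNK: (-4.0, 0.0),
}
# (previous word, word) -> log10 bigram probability
BIGRAMS = {("the", "house"): -0.5, ("house", "in"): -0.7}


def map_oov(tokens):
    """Replace every token outside the toy vocabulary with <unk>."""
    return [t if t in UNIGRAMS else UNK for t in tokens]


def logprob(word, prev):
    """log10 p(word | prev), backing off to the unigram when needed."""
    if prev is None:
        return UNIGRAMS[word][0]
    if (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]
    # No bigram found: charge the backoff weight of the context.
    return UNIGRAMS[prev][1] + UNIGRAMS[word][0]


def score_backoff(tokens):
    """Standard scoring: <unk> is scored and also breaks the context."""
    tokens = map_oov(tokens)
    prevs = [None] + tokens[:-1]
    return sum(logprob(w, p) for w, p in zip(tokens, prevs))


def score_drop_oov_ngrams(tokens):
    """Drop every term whose word or context is <unk> (hypothesised reading)."""
    tokens = map_oov(tokens)
    prevs = [None] + tokens[:-1]
    return sum(logprob(w, p) for w, p in zip(tokens, prevs)
               if w != UNK and p != UNK)


def score_skip_oov(tokens):
    """The original request: delete OOVs first, then score what is left."""
    kept = [t for t in map_oov(tokens) if t != UNK]
    return score_backoff(kept)


if __name__ == "__main__":
    # "foo" and "bar" are outside the toy vocabulary, so they become <unk>.
    sentence = "the foo house bar in".split()
    for fn in (score_backoff, score_drop_oov_ngrams, score_skip_oov):
        print(f"{fn.__name__}: {fn(sentence):.1f}")
```

With these invented numbers, dropping every n-gram that touches an OOV leaves only the first term, p(the), which is why the score comes out high. The behaviour Quang asked for still uses p(house | the) and p(in | house), and plain backoff additionally pays for each <unk> and for the backoff weights b(the) and b(house).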

Please correct me if I'm wrong...

2016-01-15 14:07 GMT+00:00 Ergun Bicici <ergun.bicici@dfki.de>:

>
> Dear Jie,
>
> There may be some option from SRILM:
> - http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
> - http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
> * -skipoovs*
> Instruct the LM to skip over contexts that contain out-of-vocabulary
> words, instead of using a backoff strategy in these cases.
>
> If it is not there, maybe that is for a reason...
>
> Bing appears to have indexed this thread quickly:
> http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
>
>
> *Best Regards,*
> Ergun
>
> Ergun Biçici
> DFKI Projektbüro Berlin
>
>
> On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.jiang@gmail.com>
> wrote:
>
>> Hi Ergun:
>>
>> The original request in Quang's post was:
>>
>> *For instance, with the n-gram: "the <unk> house <unk> in", I would like
>> the decoder to assign it the probability of the phrase: "the house in"
>> (existing in the LM).*
>>
>> So each time a <unk> appears while computing the LM score, you need to
>> look one word further.
>>
>> I believe that this cannot be achieved with current LM tools without
>> modifying the source code, as Kenneth has already clarified.
>>
>>
>> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bicici@dfki.de>:
>>
>>>
>>> Dear Kenneth,
>>>
>>> In the Moses manual, the -drop-unknown switch is mentioned:
>>>
>>> 4.7.2 Handling Unknown Words
>>> Unknown words are copied verbatim to the output. They are also scored by
>>> the language model, and may be placed out of order. Alternatively, you
>>> may want to drop unknown words. To do so add the switch -drop-unknown.
>>>
>>> Alternatively, you can write a script that replaces all OOV tokens
>>> with some OOV-token identifier such as <unk> before sending them for
>>> translation.
>>>
>>>
>>> *Best Regards,*
>>> Ergun
>>>
>>> Ergun Biçici
>>> DFKI Projektbüro Berlin
>>>
>>>
>>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <moses@kheafield.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think oov-feature=1 just activates the OOV count feature while
>>>> leaving LM score unchanged. So it would still include p(<unk> | in).
>>>>
>>>> One might try setting the OOV feature weight to -weight_LM *
>>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
>>>> the log p(<unk>) terms. However that won't work either because:
>>>>
>>>> 1) It will still charge backoff penalties, b(the)b(house) in the
>>>> example.
>>>>
>>>> 2) The context will be lost each time so it's p(house) not p(house |
>>>> the).
>>>>
>>>> If the <unk>s follow a pattern, such as appearing every other word, one
>>>> could insert them into the ARPA file though that would waste memory.
>>>>
>>>> I don't think there's any way to accomplish exactly what OP asked for
>>>> without coding (though it wouldn't be that hard once one understands how
>>>> the LM infrastructure works).
>>>>
>>>> Kenneth
>>>>
>>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>>> > Hi,
>>>> >
>>>> > You may get the behavior you want by adding
>>>> > "oov-feature=1"
>>>> > to your LM specification line in moses.ini
>>>> > and also add a second weight with value "0" to the corresponding LM
>>>> > weight setting.
>>>> >
>>>> > This will then only use the scores
>>>> > p(the|<s>)
>>>> > p(house|<s>,the,<unk>) ---> backoff to p(house)
>>>> > p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>>> >
>>>> > -phi
>>>> >
>>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>>> > <quangngocluong@gmail.com> wrote:
>>>> >
>>>> > Dear All,
>>>> >
>>>> > I am currently using an SRILM language model (LM) in my Moses
>>>> > decoder. Does anyone know how I can make the decoder skip all
>>>> > out-of-vocabulary words at decoding time when computing the LM score
>>>> > (instead of backing off)?
>>>> >
>>>> > For instance, with the n-gram: "the <unk> house <unk> in", I would
>>>> > like the decoder to assign it the probability of the phrase: "the
>>>> > house in" (existing in the LM).
>>>> >
>>>> > Do I need more options/declarations in moses.ini file?
>>>> >
>>>> > Any help is very much appreciated,
>>>> >
>>>> > Best,
>>>> > Quang
>>>> >
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > Moses-support mailing list
>>>> > Moses-support@mit.edu
>>>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> --
>>
>> Best regards!
>>
>> Jie Jiang
>>
>>
>

--

Best regards!

Jie Jiang

------------------------------

Message: 2
Date: Fri, 15 Jan 2016 16:41:16 +0100
From: Ergun Bicici <ergun.bicici@dfki.de>
Subject: Re: [Moses-support] Skip OOV when computing Language Model
score
To: Jiang Jie <mail.jie.jiang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAB2pGnfXzUKctCfdoFquMbdJgCnw5U9kk96W829YnRZ4qqTY6Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

No comment.

*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 111, Issue 45
**********************************************
