Moses-support Digest, Vol 111, Issue 44

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Skip OOV when computing Language Model score
(Kenneth Heafield)
2. Re: Mgiza Debian Packages? (Hieu Hoang)
3. Re: Skip OOV when computing Language Model score (Ergun Bicici)

----------------------------------------------------------------------

Message: 1
Date: Fri, 15 Jan 2016 13:49:53 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Skip OOV when computing Language Model score
To: moses-support@mit.edu
Message-ID: <5698F901.2040508@kheafield.com>
Content-Type: text/plain; charset=UTF-8

It depends on what the OP meant by OOV. If it's phrase-table OOV then
-drop-unknown will work. If it's language model OOV then it won't.

However, if the target language model(s) contain the target side of the
phrase table, then language model OOV implies phrase table OOV.

Kenneth
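
A toy sketch of the implication above, assuming made-up vocabularies (the function names are hypothetical and nothing here is Moses code). If every target-side word of the phrase table is in the LM vocabulary, an output token the LM does not know cannot have come from the phrase table, so it must be a copied-through unknown word, which -drop-unknown would remove.

```python
def lm_oov_implies_pt_oov(pt_target_vocab, lm_vocab):
    """True when the LM vocabulary covers the phrase table's target side.

    In that case any LM-unknown output token is also absent from the
    phrase table's target side (the situation described above).
    """
    return set(pt_target_vocab) <= set(lm_vocab)


def classify_oov(tokens, pt_target_vocab, lm_vocab):
    """Label each output token by which vocabulary it is missing from."""
    return {
        tok: {
            "pt_oov": tok not in pt_target_vocab,  # not producible by the phrase table
            "lm_oov": tok not in lm_vocab,         # scored as <unk> by the LM
        }
        for tok in tokens
    }


# Hypothetical vocabularies, for illustration only.
pt_target_vocab = {"the", "house", "in"}
lm_vocab = {"the", "house", "in", "a"}

covered = lm_oov_implies_pt_oov(pt_target_vocab, lm_vocab)
labels = classify_oov(["the", "Haus", "house"], pt_target_vocab, lm_vocab)
```

If the phrase table can produce a word the LM has never seen, the check returns False and -drop-unknown alone will not remove every LM OOV.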

On 01/15/2016 01:37 PM, Jie Jiang wrote:
> Hi Ergun:
>
> The original request in Quang's post was:
>
> */For instance, with the n-gram: "the <unk> house <unk> in", I would
> like the decoder to assign it the probability of the phrase: "the house
> in" (existing in the LM)./*
>
> so each time there is a <unk> when calculating the LM score, you need to
> look another word further.
>
> I believe that it cannot be achieved on current LM tools without
> modifying the source code, which has already been clarified by Kenneth.
>
>
> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bicici@dfki.de>:
>
>
> Dear Kenneth,
>
> In the Moses manual, -drop-unknown switch is mentioned:
>
> 4.7.2 Handling Unknown Words
> Unknown words are copied verbatim to the output. They are also scored by
> the language model, and may be placed out of order. Alternatively, you may
> want to drop unknown words. To do so add the switch -drop-unknown.
>
> Alternatively, you can write a script that replaces all OOV tokens with
> some OOV-token-identifier such as <unk> before sending for translation.
>
>
> /Best Regards,/
> Ergun
>
> Ergun Biçici
> DFKI Projektbüro Berlin
>
>
> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <moses@kheafield.com> wrote:
>
> Hi,
>
> I think oov-feature=1 just activates the OOV count feature while
> leaving LM score unchanged. So it would still include p(<unk> | in).
>
> One might try setting the OOV feature weight to -weight_LM *
> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
> the log p(<unk>) terms. However that won't work either because:
>
> 1) It will still charge backoff penalties, b(the)b(house) in the example.
>
> 2) The context will be lost each time so it's p(house) not p(house | the).
>
> If the <unk>s follow a pattern, such as appearing every other word, one
> could insert them into the ARPA file though that would waste memory.
>
> I don't think there's any way to accomplish exactly what OP asked for
> without coding (though it wouldn't be that hard once one understands how
> the LM infrastructure works).
>
> Kenneth
>
> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
> > Hi,
> >
> > You may get the behavior you want by adding
> > "oov-feature=1"
> > to your LM specification line in moses.ini
> > and also add a second weight with value "0" to the corresponding LM
> > weight setting.
> >
> > This will then only use the scores
> > p(the|<s>)
> > p(house|<s>,the,<unk>) ---> backoff to p(house)
> > p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
> >
> > -phi
> >
> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
> > <quangngocluong@gmail.com> wrote:
> >
> > Dear All,
> >
> > I am currently using an SRILM Language Model (LM) in my Moses
> > decoder. Does anyone know how I can ask the decoder, at decoding
> > time, to skip all out-of-vocabulary words when computing the LM score
> > (instead of doing back-off)?
> >
> > For instance, with the n-gram: "the <unk> house <unk> in", I would
> > like the decoder to assign it the probability of the phrase: "the
> > house in" (existing in the LM).
> >
> > Do I need more options/declarations in moses.ini file?
> >
> > Any help is very much appreciated,
> >
> > Best,
> > Quang
> >
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> <mailto:Moses-support@mit.edu <mailto:Moses-support@mit.edu>>
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> --
>
> Best regards!
>
> Jie Jiang

------------------------------

Message: 2
Date: Fri, 15 Jan 2016 13:59:33 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Mgiza Debian Packages?
To: Bren Briggs <briggs.brenton@gmail.com>, moses-support@mit.edu
Message-ID: <5698FB45.7050405@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

There are no Debian packages for mgiza that I know of; it would be good to
have them. You're welcome to add the packaging script to the repository to
keep it in a safe place and so that other people can update it afterwards.

One thing to note is that the program
snt2cooc
can require a lot of memory to run, which makes it unsuitable for people
working on laptops or using a large bitext.

There is a drop-in replacement for this program which is slower but uses
a tiny amount of memory:
http://www.statmt.org/moses/?n=Moses.Optimize#ntoc10

On 14/01/16 16:38, Bren Briggs wrote:
> Hello all,
>
> I'd like to either obtain or roll debian packages for mgiza. Has anyone
> attempted this yet?
>
> Regards,
> Bren Briggs

--
Hieu Hoang
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Fri, 15 Jan 2016 15:07:44 +0100
From: Ergun Bicici <ergun.bicici@dfki.de>
Subject: Re: [Moses-support] Skip OOV when computing Language Model score
To: mail.jie.jiang@gmail.com
Cc: moses-support <moses-support@mit.edu>
Message-ID: <CAB2pGndMyo46zWDjE=oy2wq+UOrC0FP+6QKveGv9mcVMEbnRQQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Jie,

There may be some option from SRILM:
- http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
- http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
  -skipoovs
  Instruct the LM to skip over contexts that contain out-of-vocabulary words,
  instead of using a backoff strategy in these cases.

If it is not there, it may be for a reason...

Bing appears to be fast at indexing this thread:
http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
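
For the example discussed in this thread, "the <unk> house <unk> in", the gap between ordinary back-off scoring and the requested behaviour can be illustrated with a toy bigram model. This is only a sketch under stated assumptions: the probabilities and back-off weights are invented, the function names are hypothetical, sentence markers are ignored, and it does not reproduce how SRILM's -skipoovs or the Moses LM code is implemented.

```python
UNK = "<unk>"

# Toy bigram model in ARPA style (log10 values). All numbers are invented.
# word -> (log10 p(word), log10 back-off weight b(word))
UNIGRAMS = {
    "the": (-1.0, -0.3),
    "house": (-2.0, -0.4),
    "in": (-1.5, -0.2),
    UNK: (-3.0, 0.0),
}
# (w1, w2) -> log10 p(w2 | w1)
BIGRAMS = {("the", "house"): -0.5, ("house", "in"): -0.7}


def word_score(word, prev):
    """log10 p(word | prev), backing off to b(prev) + p(word) if needed."""
    if prev is None:
        return UNIGRAMS[word][0]
    if (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]
    return UNIGRAMS[prev][1] + UNIGRAMS[word][0]


def backoff_score(tokens):
    """Ordinary scoring: unknown tokens are mapped to <unk> and scored."""
    total, prev = 0.0, None
    for tok in tokens:
        word = tok if tok in UNIGRAMS else UNK
        total += word_score(word, prev)
        prev = word
    return total


def skip_oov_score(tokens):
    """Requested behaviour: drop OOV tokens, then score what remains."""
    known = [t for t in tokens if t in UNIGRAMS and t != UNK]
    return backoff_score(known)


sentence = "the <unk> house <unk> in".split()
standard = backoff_score(sentence)
skipped = skip_oov_score(sentence)
# Attempt to cancel only the two log p(<unk>) terms, as discussed in the thread.
cancelled = standard - 2 * UNIGRAMS[UNK][0]
```

On these invented numbers the back-off score is -11.2 and the skip score is -2.2. Removing the two log p(<unk>) terms only gets to -5.2, because the back-off weights b(the) and b(house) are still charged and "house" and "in" are scored without their left context.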

*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin

On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.jiang@gmail.com> wrote:

> Hi Ergun:
>
> The original request in Quang's post was:
>
> *For instance, with the n-gram: "the <unk> house <unk> in", I would like
> the decoder to assign it the probability of the phrase: "the house in"
> (existing in the LM).*
>
> so each time there is a <unk> when calculating the LM score, you need to
> look another word further.
>
> I believe that it cannot be achieved on current LM tools without modifying
> the source code, which has already been clarified by Kenneth.
>
>
> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bicici@dfki.de>:
>
>>
>> Dear Kenneth,
>>
>> In the Moses manual, -drop-unknown switch is mentioned:
>>
>> 4.7.2 Handling Unknown Words
>> Unknown words are copied verbatim to the output. They are also scored by
>> the language model, and may be placed out of order. Alternatively, you may
>> want to drop unknown words. To do so add the switch -drop-unknown.
>>
>> Alternatively, you can write a script that replaces all OOV tokens with
>> some OOV-token-identifier such as <unk> before sending for translation.
>>
>>
>> *Best Regards,*
>> Ergun
>>
>> Ergun Biçici
>> DFKI Projektbüro Berlin
>>
>>
>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <moses@kheafield.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I think oov-feature=1 just activates the OOV count feature while
>>> leaving LM score unchanged. So it would still include p(<unk> | in).
>>>
>>> One might try setting the OOV feature weight to -weight_LM *
>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
>>> the log p(<unk>) terms. However that won't work either because:
>>>
>>> 1) It will still charge backoff penalties, b(the)b(house) in the example.
>>>
>>> 2) The context will be lost each time so it's p(house) not p(house |
>>> the).
>>>
>>> If the <unk>s follow a pattern, such as appearing every other word, one
>>> could insert them into the ARPA file though that would waste memory.
>>>
>>> I don't think there's any way to accomplish exactly what OP asked for
>>> without coding (though it wouldn't be that hard once one understands how
>>> the LM infrastructure works).
>>>
>>> Kenneth
>>>
>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>> > Hi,
>>> >
>>> > You may get the behavior you want by adding
>>> > "oov-feature=1"
>>> > to your LM specification line in moses.ini
>>> > and also add a second weight with value "0" to the corresponding LM
>>> > weight setting.
>>> >
>>> > This will then only use the scores
>>> > p(the|<s>)
>>> > p(house|<s>,the,<unk>) ---> backoff to p(house)
>>> > p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>> >
>>> > -phi
>>> >
>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>> > <quangngocluong@gmail.com> wrote:
>>> >
>>> > Dear All,
>>> >
>>> > I am currently using an SRILM Language Model (LM) in my Moses
>>> > decoder. Does anyone know how I can ask the decoder, at decoding
>>> > time, to skip all out-of-vocabulary words when computing the LM score
>>> > (instead of doing back-off)?
>>> >
>>> > For instance, with the n-gram: "the <unk> house <unk> in", I would
>>> > like the decoder to assign it the probability of the phrase: "the
>>> > house in" (existing in the LM).
>>> >
>>> > Do I need more options/declarations in moses.ini file?
>>> >
>>> > Any help is very much appreciated,
>>> >
>>> > Best,
>>> > Quang
>>> >
>
> --
>
> Best regards!
>
> Jie Jiang
>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 111, Issue 44
**********************************************
