Moses-support Digest, Vol 127, Issue 9

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Computing Perplexity with KenLM (Python API) (liling tan)
2. Re: (no subject) (G/her G/libanos)


----------------------------------------------------------------------

Message: 1
Date: Mon, 8 May 2017 14:37:53 +0800
From: liling tan <alvations@gmail.com>
Subject: [Moses-support] Computing Perplexity with KenLM (Python API)
To: moses-support <moses-support@mit.edu>
Cc: Ilia Kurenkov <ilia.kurenkov@gmail.com>
Message-ID:
<CAKzPaJK8jsF-WTGkD7qwp+9pxO4Zh2A1NYn-L09_1Dmp1nKuPA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses Community,

Does anyone know how to compute sentence perplexity with a KenLM model?

Let's say we build a model on this:

$ wget https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt
$ lmplz -o 5 < something.txt > something.arpa


From the perplexity formula (
https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf)

Summing the negated log scores to get the inner variable and then taking
the nth root, the resulting perplexity number is unusually small:

>>> import math
>>> import kenlm
>>> m = kenlm.Model('something.arpa')
# Sentence seen in data.
>>> s = 'The development of a forward-looking and comprehensive European migration policy,'
>>> list(m.full_scores(s))
[(-0.8502398729324341, 2, False), (-3.0185394287109375, 3, False),
 (-0.3004383146762848, 4, False), (-1.0249041318893433, 5, False),
 (-0.6545327305793762, 5, False), (-0.29304179549217224, 5, False),
 (-0.4497605562210083, 5, False), (-0.49850910902023315, 5, False),
 (-0.3856896460056305, 5, False), (-0.3572353720664978, 5, False),
 (-1.7523181438446045, 1, False)]
>>> n = len(s.split())
>>> sum_inv_logs = -1 * sum(score for score, _, _ in m.full_scores(s))
>>> math.pow(sum_inv_logs, 1.0/n)
1.2536033936438895


Trying again with a sentence not found in the data:

# Sentence not seen in data.
>>> s = 'The European developement of a forward-looking and comphrensive society is doh.'
>>> sum_inv_logs = -1 * sum(score for score, _, _ in m.full_scores(s))
>>> sum_inv_logs
35.59524390101433
>>> n = len(s.split())
>>> math.pow(sum_inv_logs, 1.0/n)
1.383679905428275


And trying again with totally out of domain data:

>>> s = """On the evening of 5 May 2017, just before the French Presidential Election on 7 May, it was reported that nine gigabytes of Macron's campaign emails had been anonymously posted to Pastebin, a document-sharing site. In a statement on the same evening, Macron's political movement, En Marche!, said: "The En Marche! Movement has been the victim of a massive and co-ordinated hack this evening which has given rise to the diffusion on social media of various internal information"""
>>> sum_inv_logs = -1 * sum(score for score, _, _ in m.full_scores(s))
>>> sum_inv_logs
282.61719834804535
>>> n = len(list(m.full_scores(s)))
>>> n
79
>>> math.pow(sum_inv_logs, 1.0/n)
1.0740582373271952



Although it is expected that the longer sentence has lower perplexity, it
is strange that the differences are all less than 1.0 and lie only in the
decimal range.

Is the above the right way to compute perplexity with KenLM? If not, does
anyone know how to compute perplexity with KenLM through the Python
API?
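For comparison, my understanding is that KenLM reports log10 probabilities,
so the textbook perplexity would be PPL = 10 ** (-(1/n) * sum(log10 p_i)),
i.e. the nth root of the inverse product of the probabilities, not the nth
root of the summed logs. A minimal sketch in plain Python, using made-up
log10 scores rather than real model output:

```python
import math

def perplexity(log10_probs):
    """Perplexity from per-token log10 probabilities:
    PPL = 10 ** (-(1/n) * sum(log10 p_i))."""
    n = len(log10_probs)
    return math.pow(10.0, -sum(log10_probs) / n)

# Made-up per-token log10 scores for a 3-token example
# (illustrative only, not output of the model above).
scores = [-1.0, -2.0, -3.0]
print(perplexity(scores))  # 10 ** (6 / 3) = 100.0
```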

Thanks in advance for the help!

Regards,
Liling
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20170508/71d10b68/attachment-0001.html

------------------------------

Message: 2
Date: Mon, 8 May 2017 10:18:10 +0300
From: "G/her G/libanos" <gerizaba@gmail.com>
Subject: Re: [Moses-support] (no subject)
To: Mathias Müller <mathias.mueller@uzh.ch>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CALR6RoQdJxiUeFnn5BWOzuZjNyjfYLANe56dH5Bp9h4Ehb5qSg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

hello there
how is everything going? I hope all is well

I would like to know which IBM alignment model is used in Moses, since I
want to write my documentation

On Tue, May 2, 2017 at 10:58 AM, G/her G/libanos <gerizaba@gmail.com> wrote:

> hello there
>
> I am doing well, and your contributions have been a great help
>
> my adviser asked me to align the training data using phrase-based rather
> than word-based alignment with Giza++
>
> what package can I use to do phrase-based alignment?
>
> thanks...
>
> On Thu, Apr 13, 2017 at 1:37 AM, G/her G/libanos <gerizaba@gmail.com>
> wrote:
>
>> hello there ....
>>
>> how is everything?
>>
>> I want to use m4loc to split the corpus into training, testing, and
>> tuning sets randomly
>>
>> but I did not understand how to apply it.
>>
>> I got the idea from this paper:
>>
>> Phoneme-based English-Amharic Statistical Machine Translation
>>
>> thanks..
>>
>>
>>
>> On Mon, Mar 27, 2017 at 3:42 AM, G/her G/libanos <gerizaba@gmail.com>
>> wrote:
>>
>>> hello there...
>>>
>>> I want to ask you something
>>>
>>> 1. in language modeling of the monolingual data using IRSTLM, which
>>> value of n for the n-gram is most useful?
>>>
>>> 2. when training on the parallel corpus, how can I randomly separate the
>>> testing and tuning data from the total corpus, since choosing them myself
>>> is not recommended?
>>>
>>> thanks....
>>>
>>> G/her from Bahir Dar University
>>>
>>> On Sat, Mar 11, 2017 at 1:06 AM, G/her G/libanos <gerizaba@gmail.com>
>>> wrote:
>>>
>>>> hello there, I would like to thank you for your contribution to my work...
>>>> now I have one problem creating a web-based translation system
>>>> when I check using Netcat, following the manual at
>>>> https://github.com/moses-smt/mosesdecoder/tree/master/contrib/iSenWeb/Introduction
>>>> it displays this result and the web site doesn't work
>>>>
>>>> thanks...
>>>>
>>>> On Sat, Feb 25, 2017 at 1:09 PM, Mathias Müller <mathias.mueller@uzh.ch
>>>> > wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Did you perhaps forget the leading "/" in the file path to an existing
>>>>> installation of XML-RPC?
>>>>>
>>>>> Regards
>>>>> Mathias
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 25, 2017 at 10:46 AM, G/her G/libanos <gerizaba@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> hello there, I am working on the Moses server, but installing
>>>>>> xmlrpc produces an error
>>>>>> what can I do?
>>>>>>
>>>>>> thanks for everything you are all doing
>>>>>>
>>>>>> On Wed, Feb 8, 2017 at 3:07 AM, Marwa Refaie <basmallah@hotmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> For a web server, dig a little into these links; I think they could help
>>>>>>>
>>>>>>> http://www.statmt.org/moses/?n=Moses.WebTranslation
>>>>>>>
>>>>>>> https://github.com/moses-smt/mosesdecoder/tree/master/contrib/iSenWeb/Introduction
>>>>>>>
>>>>>>> *Marwa N Refaie*
>>>>>>> On 7 Feb 2017, at 21:05, G/her G/libanos <gerizaba@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> the Moses decoder works for our system on Ubuntu using the terminal,
>>>>>>>> but we want to make it user-interactive, whether in the form of a web
>>>>>>>> page or an application
>>>>>>>> could you give me any information on how to get that done?
>>>>>>>> thank you...
>>>>>>>>
>>>>>>>> On Mon, Dec 12, 2016 at 8:43 AM, G/her G/libanos <
>>>>>>>> gerizaba@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> we created the translation model and we need to use the Windows
>>>>>>>>> application from
>>>>>>>>> http://www.statmt.org/moses/?n=Moses.Packages
>>>>>>>>> but when we import our model into the mainWindow application it
>>>>>>>>> produces an error
>>>>>>>>> what can we do?
>>>>>>>>> it works for EN into FR
>>>>>>>>> thank you for ...
>>>>>>>>>
>>>>>>>>> On Mon, Dec 12, 2016 at 7:17 PM, G/her G/libanos <
>>>>>>>>> gerizaba@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> hello there...
>>>>>>>>>> First of all, I would like to thank you for your responses to the
>>>>>>>>>> previous questions I raised
>>>>>>>>>>
>>>>>>>>>> next, my system is trained on 1000 parallel sentences, and I tried
>>>>>>>>>> to calculate the BLEU score of the translation system using the
>>>>>>>>>> video on TAUS, and I got this one
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 9, 2016 at 5:17 PM, G/her G/libanos <
>>>>>>>>>> gerizaba@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> hello there....
>>>>>>>>>>> 1. we did our work using the baseline, and the system works on the
>>>>>>>>>>> training data
>>>>>>>>>>> but we need to localize the source code of the tokenization to
>>>>>>>>>>> leave abbreviated words ?/? intact, the way English Adm. is kept
>>>>>>>>>>> via nonbreaking_prefix.en, that is, words that use the dot (.)
>>>>>>>>>>>
>>>>>>>>>>> 2. where do we change the code, in the C++ part or in the Perl
>>>>>>>>>>> part of the codebase?
>>>>>>>>>>>
>>>>>>>>>>> 3. we need to use the translation system on Windows, since the end
>>>>>>>>>>> users are not experts in Ubuntu; we have seen the Moses GUI, and we
>>>>>>>>>>> would like to know which of our files will be loaded into the model
>>>>>>>>>>> only, or everything from training, including the training data, and
>>>>>>>>>>> whether we can modify the GUI of Moses.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> thanks for everything
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> .......education is door of one's life.......
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> Moses-support@mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>



--

.......education is door of one's life.......
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20170508/7478b9c2/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 127, Issue 9
*********************************************
