Moses-support Digest, Vol 100, Issue 64

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: installing giza with moses (mohamed hasanien)
2. KN and modified-KN implementation in SRILM/KenLM (koormoosh)
3. Re: Tuning with mert-moses.perl error (Hacksawhawk .)


----------------------------------------------------------------------

Message: 1
Date: Wed, 18 Feb 2015 18:31:54 +0000 (UTC)
From: mohamed hasanien <mhmd_hasnen@yahoo.com>
Subject: Re: [Moses-support] installing giza with moses
To: Dimitris Mavroeidis <dmavroeidis@csri.gr>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID:
<911497512.1298214.1424284314540.JavaMail.yahoo@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Thank you.

Mohammed Hassanien Mohammed
Egyptian Programmers Vice-captain
01000121556
Egyptian Programmers Syndicate


On Wednesday, February 18, 2015 1:38 PM, Dimitris Mavroeidis <dmavroeidis@csri.gr> wrote:


This is your problem. Install "make" on your system.


On 17/02/2015 10:02 PM, mhmd hassnen wrote:

-sh: make: command not found


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




------------------------------

Message: 2
Date: Thu, 19 Feb 2015 10:59:00 +1100
From: koormoosh <koormoosh@gmail.com>
Subject: [Moses-support] KN and modified-KN implementation in
SRILM/KenLM
To: moses-support@mit.edu, Kenneth Heafield <moses@kheafield.com>,
srilm-user@speech.sri.com
Message-ID:
<CAN3_CDgdwxxopC2vOvgS7qzkPN7ifpqRGkwAe+zds3t++pYfZQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I am puzzled by the way SRI calculates the Kneser-Ney and modified
Kneser-Ney probabilities. I would appreciate it if anyone who has used
these packages "carefully", or developed them, could help me figure this
out. Please note that I have spent more than 50 hours on this and keep
getting a mix of expected and unexpected perplexity scores. I have read the
Chen and Goodman paper a few times and checked the SRILM FAQ, the discount
pages, and the code, and it is still not clear to me how the probabilities
are calculated. Following the computation steps SRILM published on its web
page (
http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html)
does not always produce the same result as the package itself (sometimes it
does, and sometimes it doesn't). For simplicity, assume that the test data
contains no OOV words and that no pruning or refinement happens at training
time. What I am looking for is the math behind these computations, not a
hack. Below I write out one possible way of calculating KN and m-KN. I
would appreciate your comments wherever you know that an assumption I made
is inconsistent with SRI/KenLM.

*For Kneser-Ney:*

- SRI uses the actual counts, not the continuation counts, for the
highest-order n-grams and for any n-grams that start with <s>.
- For Kneser-Ney, SRI uses one single discount D = n_1 / (n_1 + 2*n_2),
where n_1 and n_2 are the numbers of n-grams of the model's order that
occur exactly once and exactly twice. For example, in a 3-gram model, D is
calculated from the 3-grams of frequency 1 and 2. More importantly, if I
understood correctly, the same D (calculated from the 3-grams) is used even
when backing off to 2-grams. The formulation then becomes:

P(c|ab) = max{c(abc) - D, 0} / c(ab) + D * N1+(ab.) / c(ab) * P(c|b)
gamma(ab) = N_1(ab.) * D_1 + N_2(ab.) * D_2 + N_3+(ab.) * D_3+
D is calculated *based on the 3-gram order*

P(c|b) = max{N1+(.bc) - D, 0} / N1+(.b.) + D * N1+(b.) / N1+(.b.) * P(c)
D is calculated *based on the 3-gram order* *(same discount as the highest
order)*

P(c) = N1+(.c) / N1+(..)

Is this correct?
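
The Kneser-Ney recursion written out above can be checked numerically with a few lines of Python. This is only a minimal sketch of the formulas as stated in this message, not of what SRILM or KenLM actually do. The class name `TrigramKN`, the toy corpus, the choice to ignore `<s>` handling, and the choice to derive every lower-order continuation count from the trigram table are all assumptions made for illustration.

```python
from collections import Counter


def kn_discount(ngram_counts):
    """Single discount D = n1 / (n1 + 2*n2) from the count-of-counts."""
    n1 = sum(1 for c in ngram_counts.values() if c == 1)
    n2 = sum(1 for c in ngram_counts.values() if c == 2)
    return n1 / (n1 + 2 * n2)


class TrigramKN:
    """Interpolated KN with one discount tied across orders (assumption)."""

    def __init__(self, tokens):
        self.tri = Counter(zip(tokens, tokens[1:], tokens[2:]))  # c(abc)
        self.D = kn_discount(self.tri)   # computed from 3-grams only
        self.ctx = Counter()             # c(ab)
        self.follow_ab = Counter()       # N1+(ab.)
        self.cont_bc = Counter()         # N1+(.bc)
        for (a, b, c), n in self.tri.items():
            self.ctx[(a, b)] += n
            self.follow_ab[(a, b)] += 1
            self.cont_bc[(b, c)] += 1
        self.cont_b = Counter()          # N1+(.b.)
        self.follow_b = Counter()        # N1+(b.)
        self.cont_c = Counter()          # N1+(.c)
        for (b, c), n in self.cont_bc.items():
            self.cont_b[b] += n
            self.follow_b[b] += 1
            self.cont_c[c] += 1
        self.bigram_types = len(self.cont_bc)  # N1+(..)
        self.vocab = set(self.cont_c)

    def p_uni(self, c):
        return self.cont_c[c] / self.bigram_types

    def p_bi(self, c, b):
        if self.cont_b[b] == 0:          # unseen context: use P(c)
            return self.p_uni(c)
        disc = max(self.cont_bc[(b, c)] - self.D, 0) / self.cont_b[b]
        backoff = self.D * self.follow_b[b] / self.cont_b[b]
        return disc + backoff * self.p_uni(c)

    def p_tri(self, c, a, b):
        if self.ctx[(a, b)] == 0:        # unseen context: use P(c|b)
            return self.p_bi(c, b)
        disc = max(self.tri[(a, b, c)] - self.D, 0) / self.ctx[(a, b)]
        backoff = self.D * self.follow_ab[(a, b)] / self.ctx[(a, b)]
        return disc + backoff * self.p_bi(c, b)


# Toy corpus (hypothetical), used only to exercise the recursion.
tokens = "a b c a b d a b c b c a".split()
model = TrigramKN(tokens)
total = sum(model.p_tri(w, "a", "b") for w in model.vocab)
print(model.D, total)
```

If the recursion is self-consistent, the probabilities over the vocabulary sum to 1 for any seen context. That is a quick sanity check to run before comparing individual values against the toolkit output.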

*For modified Kneser-Ney:*
I am making the following assumptions about the modified-KN implementation,
and I would appreciate your comments:

- Similar to SRI, the actual counts are used for the highest-order n-grams
and for those that start with <s>. For the lower orders, the counts are
continuation counts.
- Discounts are no longer tied together; each level of backoff has its own
discounts. Those discounts are calculated from the actual counts (for the
highest order, and for n-grams starting with <s>) or the continuation counts
(lower orders) of the n-grams at that level. For example, for the 3-gram
case we can write the following:

P(c|ab) = max{c(abc) - D(c(abc)), 0} / c(ab) + gamma(ab) / c(ab) * P(c|b)
gamma(ab) = N_1(ab.) * D_1 + N_2(ab.) * D_2 + N_3+(ab.) * D_3+
D_1, D_2, D_3+ are calculated *based on the 3-gram order*

P(c|b) = max{N1+(.bc) - D(N1+(.bc)), 0} / N1+(.b.)
         + gamma(b) / N1+(.b.) * P(c)
gamma(b) = N_1(b.) * D_1 + N_2(b.) * D_2 + N_3+(b.) * D_3+
D_1, D_2, D_3+ are calculated *based on the 2-gram order*

P(c) = N1+(.c) / N1+(..)

Is this how SRI does it?
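
For the modified Kneser-Ney case, the three discounts and gamma can be sketched as follows. The discount formulas are the closed-form estimates from the Chen and Goodman paper (Y = n_1 / (n_1 + 2*n_2), D_1 = 1 - 2Y*n_2/n_1, D_2 = 2 - 3Y*n_3/n_2, D_3+ = 3 - 4Y*n_4/n_3). Whether SRILM or KenLM deviate from them in edge cases is exactly the open question here, so treat this as a sketch under that assumption. The function names and the toy counts are hypothetical.

```python
def mkn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3plus) from the count-of-counts of one n-gram order.

    Assumes n1, n2 and n3 are all non-zero; otherwise a division fails.
    """
    y = n1 / (n1 + 2 * n2)
    return (1 - 2 * y * n2 / n1,
            2 - 3 * y * n3 / n2,
            3 - 4 * y * n4 / n3)


def pick_discount(count, discounts):
    """D(count): D1 for count 1, D2 for count 2, D3plus for 3 or more."""
    if count <= 0:
        return 0.0
    return discounts[min(count, 3) - 1]


def gamma(context, ngram_counts, discounts):
    """N_1(ctx .)*D1 + N_2(ctx .)*D2 + N_3+(ctx .)*D3plus."""
    return sum(pick_discount(c, discounts)
               for ngram, c in ngram_counts.items()
               if ngram[:-1] == context)


def p_interpolated(ngram, ngram_counts, discounts, p_lower):
    """max{c - D(c), 0} / c(ctx) + gamma(ctx) / c(ctx) * p_lower."""
    context = ngram[:-1]
    c_ctx = sum(c for g, c in ngram_counts.items() if g[:-1] == context)
    c = ngram_counts.get(ngram, 0)
    discounted = max(c - pick_discount(c, discounts), 0) / c_ctx
    return discounted + gamma(context, ngram_counts, discounts) / c_ctx * p_lower


# Hypothetical count-of-counts and trigram counts, for illustration only.
discounts = mkn_discounts(n1=8, n2=4, n3=2, n4=1)
counts = {("a", "b", "c"): 1, ("a", "b", "d"): 1, ("a", "b", "e"): 2,
          ("a", "b", "f"): 5, ("b", "c", "a"): 1}
g_ab = gamma(("a", "b"), counts, discounts)
print(discounts, g_ab)
```

The discounted mass removed from the seen continuations of a context equals gamma for that context, which is what makes the interpolated distribution sum to 1 when the lower-order distribution does.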

Thanks,
Koorm

------------------------------

Message: 3
Date: Thu, 19 Feb 2015 09:12:34 +0100
From: "Hacksawhawk ." <hayo.ce@gmail.com>
Subject: Re: [Moses-support] Tuning with mert-moses.perl error
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID:
<CABYw3nQo=LgeWyt425XnKer=9vMN38ZYYMBRjsXLh5xt_fnTQg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

My apologies for not being able to get back to you immediately.

I have attached the filtered/moses.ini file. As for the Git commit ID, I
didn't exactly know where to find that but the latest change in the git log
was the following:

commit 755bd609f506fa6ce68a935f72499e055a6a4b6c
Author: Hieu Hoang <hieuhoang@gmail.com>
Date:   Fri Feb 6 15:52:25 2015 +0000

    Using boost for prefix/suffix checks /Jeroen Vermeulen

Hopefully this helps.

kind regards,
Hayo


2015-02-17 17:19 GMT+01:00 Matthias Huck <mhuck@inf.ed.ac.uk>:

> Hi Hayo,
>
> Can you please do two things:
>
> 1.) Send me the file filtered/moses.ini so that I can have a look at the
> feature functions and scaling factors in there.
>
> 2.) Tell me the Git commit ID of the Moses version you're working with.
> A bug was put into master with commit 70e8eb5. It's been fixed a couple
> of days later (commit 0de206f). If you've checked out Moses from GitHub
> with the bug, you need to update to the most recent code base and the
> error most likely will be gone.
>
> Cheers,
> Matthias
>
>
> On Tue, 2015-02-17 at 17:01 +0100, Hacksawhawk . wrote:
> > Hi,
> >
> >
> > While trying to tune the translation system I created, I ran into the
> > following error:
> >
> > The following weights have no feature function. Maybe incorrectly
> > spelt weights: ,Exit code: 1
> > The decoder died. CONFIG WAS -weight-overwrite 'PhrasePenalty0=
> > 0.043478 WordPenalty0= -0.217391 TranslationModel0= 0.043478 0.043478
> > 0.043478 0.043478 Distortion0= 0.065217 LM0= 0.108696
> > LexicalReordering0= 0.065217 0.065217 0.065217 0.065217 0.065217
> > 0.065217'
> >
> >
> > It seems that mert-moses.pl is rearranging the weight features and
> > then trying to overwrite the weight features in the moses.ini file, but
> > in the wrong order. Is this the cause of the error?
> >
> > I have also attached the mert.out file, hopefully this will provide
> > more information.
> >
> >
> > thanks in advance,
> >
> > Hayo
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
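
When comparing the decoder's complaint against moses.ini, it can help to split the `-weight-overwrite` string from the quoted error into one entry per feature. The following is a minimal sketch: the function name `parse_weight_overwrite` is hypothetical, and the format assumption (a name ending in `=` followed by its space-separated weights) is taken only from the string shown in the error message, not from the Moses source.

```python
def parse_weight_overwrite(spec):
    """Split 'Name0= w1 w2 Name1= w3' into {'Name0': [w1, w2], 'Name1': [w3]}."""
    weights = {}
    current = None
    for token in spec.split():
        if token.endswith("="):
            # A token such as 'LM0=' starts the weights of a new feature.
            current = token[:-1]
            weights[current] = []
        elif current is None:
            raise ValueError(f"weight {token!r} appears before any feature name")
        else:
            weights[current].append(float(token))
    return weights


# The string reported by the decoder in the message above.
spec = ("PhrasePenalty0= 0.043478 WordPenalty0= -0.217391 "
        "TranslationModel0= 0.043478 0.043478 0.043478 0.043478 "
        "Distortion0= 0.065217 LM0= 0.108696 "
        "LexicalReordering0= 0.065217 0.065217 0.065217 0.065217 "
        "0.065217 0.065217")

weights = parse_weight_overwrite(spec)
for name, values in weights.items():
    print(f"{name}: {len(values)} weight(s)")
```

The per-feature names and weight counts can then be checked by hand against the feature functions declared in filtered/moses.ini.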
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moses.ini
Type: application/octet-stream
Size: 915 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20150219/6ca4ee33/attachment.obj

------------------------------


End of Moses-support Digest, Vol 100, Issue 64
**********************************************
