Moses-support Digest, Vol 105, Issue 20

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Getting counts in Moses instead of probabilities (Hieu Hoang)
2. Re: Getting counts in Moses instead of probabilities (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Thu, 9 Jul 2015 16:38:56 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Getting counts in Moses instead of
probabilities
To: Harshit Gupta <harshitgupta165@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <559E6B60.5090000@gmail.com>
Content-Type: text/plain; charset="utf-8"

Consider the 2nd line '33 7 2'.
count(target) = 33
count(source) = 7
count(source, target) = 2

p(source|target) = count(source, target) / count(target) = 2/33 = 0.0606
p(target|source) = count(source, target) / count(source) = 2/7 = 0.2857

As you can see, the probabilities match the 1st and 3rd numbers in the
probabilities column. The probabilities column is described here
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
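
For anyone who wants to check this on their own phrase table, here is a minimal Python sketch of the arithmetic above. It assumes the column layout shown in this thread (source ||| target ||| scores ||| alignment ||| counts) and that the counts are ordered count(target), count(source), count(source, target). The function names are made up for illustration and are not part of Moses, and the '??' target is the placeholder as it appears in this digest.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into source, target, scores and counts.

    Assumes the layout: source ||| target ||| scores ||| alignment ||| counts
    """
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    counts = [float(x) for x in fields[4].split()]
    return source, target, scores, counts


def probabilities_from_counts(counts):
    """Recompute the two phrase translation probabilities from the counts.

    Assumed order of counts: count(target), count(source), count(source, target).
    """
    count_target, count_source, count_joint = counts[:3]
    p_source_given_target = count_joint / count_target
    p_target_given_source = count_joint / count_source
    return p_source_given_target, p_target_given_source


line = "be ||| ?? ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||"
source, target, scores, counts = parse_phrase_table_line(line)
p_st, p_ts = probabilities_from_counts(counts)

# Compare against the 1st and 3rd numbers of the scores column.
print(p_st, scores[0])
print(p_ts, scores[2])
```

The two recomputed values should agree with the 1st and 3rd scores up to the rounding used in the phrase table.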

On 09/07/2015 15:49, Harshit Gupta wrote:
> Hi Hieu, sorry but I didn't get the exact meaning of counts. As an
> example, I am considering few lines from my png file which have same
> English phrase (be) as
>
> be ||| ?? ??? ?? ||| 1 0.1 0.142857 2.63535e-05 ||| 0-2 ||| 1 7 1 ||| |||
> be ||| ?? ||| 0.0606061 0.1 0.285714 0.375 ||| 0-0 ||| 33 7 2 ||| |||
> be ||| ??? ?? ||| 1 0.0238095 0.142857 0.00162337 ||| 0-1 ||| 1 7 1 ||| |||
> be ||| ???? ??? ?? ||| 1 0.0238095 0.142857 1.40552e-05 ||| 0-2 ||| 1 7 1 ||| |||
> be ||| ??? ?? ||| 1 0.1 0.142857 0.000811687 ||| 0-1 ||| 1 7 1 ||| |||
> be ||| ?? ||| 0.0196078 0.0238095 0.142857 0.125 ||| 0-0 ||| 51 7 1 ||| |||
>
> The column after the alignment column shows the counts. Why are these
> counts different for the same English phrase? And what do the three
> numbers '1 7 1', '51 7 1' or '33 7 2' represent? Do they represent
> the number of times the source/target phrase occurs in the corpus,
> or are they calculated using some rule/function in Moses?
>
> Thanks
>
> Regards
> Harshit
>
> On Thu, Jul 9, 2015 at 4:13 PM, Hieu Hoang <hieuhoang@gmail.com
> <mailto:hieuhoang@gmail.com>> wrote:
>
>
>
> On 09/07/2015 14:19, Harshit Gupta wrote:
>> Hi Hieu, Thanks for the reply. However, I have some further
>> doubts in this.
>> By count of a phrase, I want to know how many times a phrase is
>> repeated in the corpora. So, can I get this counts from the cpp
>> source file you have mentioned ?
>> Also, in the phrase tables, the first four columns are for
>> lexical weighting and phrase translation probabilities and then
>> there are alignments between the source and target language. Here
>> also, is it possible to get the counts of the phrases ?
> yes, the next column (after the alignments) are the counts. In
> your png file, the column '1 3 1' are the counts for the 1st
> translation rule
>
>>
>> Regards
>> Harshit
>>
>> On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang <hieuhoang@gmail.com
>> <mailto:hieuhoang@gmail.com>> wrote:
>>
>> The counts are written in the 5th column in the phrase table.
>> http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
>> This is for debugging purposes only; they don't influence
>> decoding in any way.
>>
>> If you want to know more about how it works - the counts are
>> stored in the file extract.*.sorted.gz and
>> extract.*.inv.sorted.gz. The counts are summed and the
>> probability is calculated by the score program. The source
>> code for the score program is in
>> phrase-extract/score-main.cpp
>>
>>
>> On 08/07/2015 18:05, Harshit Gupta wrote:
>>> Hi, I am currently working on Moses platform and in the
>>> phrase tables, I am interested in the counts of phrases
>>> instead of phrase translation probabilities. Can I get to
>>> know this counts ?
>>> In the Moses manual, it is mentioned that in training
>>> process in calculating phrase scores that
>>> "To estimate the phrase translation probability φ(e|f) we
>>> proceed as follows: First, the extract file is sorted. This
>>> ensures that all English phrase translations for an foreign
>>> phrase are next to each other in the file. Thus, we can
>>> process the file, one foreign phrase at a time, *collect
>>> counts* and compute φ(e|f) for that foreign phrase f."
>>>
>>> Where are these counts collected ? Where can I get these
>>> counts ?
>>>
>>> Regards
>>> Harshit
>>>
>>> --
>>> Harshit Gupta
>>> Third Year Undergraduate
>>> Electrical Engineering
>>> IIT Madras
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu

--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu


------------------------------

Message: 2
Date: Thu, 9 Jul 2015 18:26:00 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Getting counts in Moses instead of
probabilities
To: Harshit Gupta <harshitgupta165@gmail.com>, moses-support
<moses-support@mit.edu>
Message-ID:
<CAEKMkbjsT3oL=hStyOq-gnaV_eRw-dRJ-djOai=Bc-kqF4idNg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 9 July 2015 at 18:21, Harshit Gupta <harshitgupta165@gmail.com> wrote:

> Thanks a lot for the help.
> I just have one more doubt about phrase tables. How are the lexical
> weighting probabilities in the 2nd and 4th columns calculated? Are
> they also calculated from the same counts, or is a different function
> used to calculate them?
>
They are calculated slightly differently, using the word alignment.


> And when producing the best translation as output, which of these 4
> probabilities in the phrase table does Moses refer to?
>
All 4 probabilities are given weights during tuning. The best translation
is the translation with the best weighted score.
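
As a rough illustration of "best weighted score", the sketch below combines the 4 phrase table scores as a weighted sum of log probabilities. The weight values are hypothetical (real ones come out of tuning), the function name is made up for illustration, and the decoder also takes other features into account, such as the language model, which this leaves out.

```python
import math


def weighted_score(probabilities, weights):
    """Weighted sum of log probabilities (higher is better)."""
    return sum(w * math.log(p) for w, p in zip(weights, probabilities))


# Hypothetical weights, one per phrase table score; tuning would set these.
weights = [0.2, 0.2, 0.2, 0.2]

# Scores of two translation options for 'be' from the phrase table in this
# thread, keyed by their counts column because the target text is lost here.
candidates = {
    "33 7 2": [0.0606061, 0.1, 0.285714, 0.375],
    "51 7 1": [0.0196078, 0.0238095, 0.142857, 0.125],
}

best = max(candidates, key=lambda key: weighted_score(candidates[key], weights))
print(best)
```

With any positive weights the first option wins here, because each of its 4 scores is higher than the corresponding score of the second option.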

>
> Thanks
>
> Regards
> Harshit
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 105, Issue 20
**********************************************
