Moses-support Digest, Vol 93, Issue 27

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Philipp Koehn)
2. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Hieu Hoang)
3. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jul 2014 11:20:09 -0400
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDB3ytaJimvq03veXhDtQH-nvVDRAJNch3QCBW98M_Mxvg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

the sentence ID is being used for the domain indicator features.

If you run phrase-extract's score with a domain file specified,
it uses the sentence IDs to find out which domain the
phrase pair was found in.

This has been a standard feature in Edinburgh's phrase-based system
for the last 1-2 years, so if you want to make changes, make
sure that this functionality still works (see [1381-5] for an example
with extract* files still in place).
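
A minimal sketch of the lookup described above, mapping a sentence ID to the domain it falls in. The domain boundaries, the domain names, and the helper names `domain_of` and `domains_for_phrase_pair` are all hypothetical, chosen only for illustration; the actual domain file format and the logic inside score may differ.

```python
from bisect import bisect_left

# Hypothetical domain spec: (last sentence ID in the domain, domain name).
# This is an assumption for illustration, not the real domain file format.
DOMAINS = [(2000000, "domain-a"), (5000000, "domain-b")]


def domain_of(sentence_id, domains=DOMAINS):
    """Return the name of the domain whose ID range contains sentence_id."""
    ends = [end for end, _ in domains]
    # First domain whose last sentence ID is >= sentence_id.
    idx = bisect_left(ends, sentence_id)
    if idx == len(domains):
        raise ValueError(f"sentence ID {sentence_id} is past the last domain")
    return domains[idx][1]


def domains_for_phrase_pair(sentence_ids, domains=DOMAINS):
    """Collect the set of domains in which a phrase pair was extracted."""
    return {domain_of(sid, domains) for sid in sentence_ids}
```

With these made-up boundaries, the sentence IDs quoted further down in this thread (1374618 through 1374622, and 4587318) would place the phrase pair in both domains.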

-phi


On Wed, Jul 23, 2014 at 7:15 AM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
wrote:

> Key-value format would actually be fine.
>
> On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
>
> I was planning to use it for a custom feature function later.
>
> On 23.07.2014 13:11, Hieu Hoang wrote:
>
> i can change it so that the sentence id is put into a key-value field in
> the last column.
>
> what is the sentence id used for? is it just for debugging purposes?
>
>
> On 23 July 2014 11:36, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
>> Hi,
>> I am using train-model.perl with
>>
>> --extract-options="--IncludeSentenceId"
>>
>> and it seems that the sentence id is somehow getting into the phrase
>> table as a count and later used for phrase translation weight
>> calculation, for instance the extract (last column is the Id):
>>
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
>>
>> results in a phrase table entry like this:
>>
>> #c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||
>>
>> The count is equal to the sum of sentence ids, which of course makes the
>> phrase probability useless.
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Wed, 23 Jul 2014 16:32:23 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>, Marcin Junczys-Dowmunt
<junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <53CFD587.3020501@gmail.com>
Content-Type: text/plain; charset="windows-1252"

ah ok.

I thought it was just for debugging. I'm not gonna change it since it's
gonna involve months of debugging.

Ideally, the extract format should be fixed like the phrase-table, with
the last column being key-value pairs. Also, the way the key-value pairs are
processed should be automatic, like in the decoder.
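
A rough sketch of what reading such an extract line could look like. Both the `{{key value}}` syntax in the last column and the key name `SentenceId` are assumptions made for this illustration, as is the helper `parse_extract_line`; the thread does not define the format, and this is not existing Moses code.

```python
import re

# Assumed property syntax: {{Key value}}; hypothetical, not a confirmed format.
PROPERTY_RE = re.compile(r"\{\{(\S+)\s+(.*?)\}\}")


def parse_extract_line(line):
    """Split an extract line on '|||' and read key-value pairs from column 4."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, alignment = fields[0], fields[1], fields[2]
    # Any properties live in the last column; a missing column means none.
    properties = dict(PROPERTY_RE.findall(fields[3])) if len(fields) > 3 else {}
    return source, target, alignment, properties


line = ("#c the compound or process ||| #c verbindung oder verfahren ||| "
        "0-0 2-1 3-2 4-3 ||| {{SentenceId 1374618}}")
source, target, alignment, properties = parse_extract_line(line)
```

Because the ID is stored under a named key, a scorer reading the line this way could not mistake it for a count: a consumer that does not know the key simply ignores it.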

marcin - sorry mate. you're on your own



------------------------------

Message: 3
Date: Wed, 23 Jul 2014 17:34:04 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Hieu Hoang <hieuhoang@gmail.com>, Philipp Koehn
<pkoehn@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <53CFD5EC.3070308@amu.edu.pl>
Content-Type: text/plain; charset="windows-1252"


So, how come this is not damaging the Edinburgh system?
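
The symptom reported in the original post (quoted in Message 1) can be checked with simple arithmetic. The sketch below sums the six sentence IDs from the extract lines and prints the result with six significant digits, the precision at which the value appears in the phrase-table entry; the variable names are illustrative only.

```python
# Sentence IDs from the last column of the six extract lines in the report.
sentence_ids = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]

occurrences = len(sentence_ids)   # the phrase pair was extracted 6 times
total = sum(sentence_ids)         # what ends up in the middle count field

print(occurrences)        # 6
print(total)              # 11460418
print(f"{total:.5e}")     # 1.14604e+07
```

The sum matches the 1.14604e+07 in the counts column of the reported phrase-table entry, while the two 6s on either side of it equal the number of extract lines.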



------------------------------



End of Moses-support Digest, Vol 93, Issue 27
*********************************************
