Moses-support Digest, Vol 93, Issue 26

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Phrase extraction with --IncludeSentenceId messes up phrase
table counts (Marcin Junczys-Dowmunt)
2. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Barry Haddow)
3. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Hieu Hoang)
4. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Marcin Junczys-Dowmunt)
5. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jul 2014 12:36:44 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: [Moses-support] Phrase extraction with --IncludeSentenceId
messes up phrase table counts
To: moses-support <moses-support@mit.edu>
Message-ID: <53CF903C.8020305@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi,
I am using train-model.perl with

--extract-options="--IncludeSentenceId"

and it seems that the sentence id ends up in the phrase table as a
count and is later used to compute the phrase translation weights.
For instance, this extract (the last column is the sentence id):

#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318

results in a phrase table entry like this:

#c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||

The middle count (1.14604e+07) is equal to the sum of the sentence ids
(11460418), which of course makes the phrase probability useless.
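
The arithmetic can be checked with a short Python sketch. This is not
Moses code: it only assumes the extract layout shown above (fields
separated by "|||", with the sentence id last) and compares counting
each extracted line once with summing the last field as if it were a
weight. The function names are made up for illustration.

```python
# Illustrative sketch only; not part of Moses.
# Assumes each extract line is "source ||| target ||| alignment ||| last-field".
PREFIX = ("#c the compound or process ||| #c verbindung oder verfahren "
          "||| 0-0 2-1 3-2 4-3 ||| ")
SENTENCE_IDS = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]
EXTRACT_LINES = [PREFIX + str(sid) for sid in SENTENCE_IDS]


def last_field(line):
    """Return the last '|||'-separated field of an extract line as a float."""
    return float(line.split("|||")[-1].strip())


def occurrence_count(lines):
    """Count every extracted phrase pair once (the intended behaviour)."""
    return len(lines)


def weighted_count(lines):
    """Sum the last field as if it were a per-line weight (the observed behaviour)."""
    return sum(last_field(line) for line in lines)


if __name__ == "__main__":
    print(occurrence_count(EXTRACT_LINES))       # 6
    print("%g" % weighted_count(EXTRACT_LINES))  # 1.14604e+07
```

The second value matches the middle count in the phrase table entry,
which is consistent with the sentence id being read as a weight.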



------------------------------

Message: 2
Date: Wed, 23 Jul 2014 12:01:16 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>, moses-support
<moses-support@mit.edu>
Message-ID: <53CF95FC.6090505@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Marcin

There's a facility to include a weight in the extract file, which is
then used in phrase scoring. Somehow this appears to have got mixed up
with the sentence id. That's the problem with not having metadata.

cheers - Barry



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 3
Date: Wed, 23 Jul 2014 12:11:16 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhmEwVUUp9e8dzDtMmvqHn=EkU1ii=Y3Mq8373Dvu0uFg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I can change it so that the sentence id is put into a key-value field in
the last column.

What is the sentence id used for? Is it just for debugging purposes?
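
A minimal Python sketch of how a key-value field could keep the sentence
id apart from the weight. The "SentenceId=..." syntax and the function
name are hypothetical, chosen only for illustration; the format actually
used in Moses may differ.

```python
# Illustrative sketch only; the key=value syntax below is an assumption,
# not the actual Moses extract format.
def parse_extract_line(line):
    """Split an extract line into source, target, weight and properties.

    A bare number after the alignment field is read as a weight;
    tokens of the form key=value are stored as properties instead.
    """
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    weight = 1.0        # default: each extracted line counts once
    properties = {}
    for token in " ".join(fields[3:]).split():
        if "=" in token:
            key, value = token.split("=", 1)
            properties[key] = value
        else:
            weight = float(token)
    return source, target, weight, properties


if __name__ == "__main__":
    prefix = ("#c the compound or process ||| #c verbindung oder verfahren "
              "||| 0-0 2-1 3-2 4-3 ||| ")
    # Bare number: indistinguishable from a weight.
    print(parse_extract_line(prefix + "1374618")[2:])
    # Tagged value: the weight stays at its default.
    print(parse_extract_line(prefix + "SentenceId=1374618")[2:])
```

With the tagged form, the scorer would have no reason to add the id to
the counts, and a feature function could still read it back by key.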





--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 4
Date: Wed, 23 Jul 2014 13:12:18 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <53CF9892.50309@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"

I was planning to use it for a custom feature function later.



------------------------------

Message: 5
Date: Wed, 23 Jul 2014 13:15:37 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: moses-support@mit.edu
Message-ID: <53CF9959.1030202@amu.edu.pl>
Content-Type: text/plain; charset="iso-8859-1"

Key-value format would actually be fine.



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 93, Issue 26
*********************************************
