Moses-support Digest, Vol 95, Issue 14

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Incremental retraining (Sandipan Dandapat)
2. Replicating Och & Ney (2003) with GIZA++ (Robert Östling)

----------------------------------------------------------------------

Message: 1
Date: Tue, 9 Sep 2014 10:45:13 +0100
From: Sandipan Dandapat <sandipandandapat@gmail.com>
Subject: Re: [Moses-support] Incremental retraining
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>, Ulrich Germann
<ulrich.germann@gmail.com>
Message-ID:
<CAGr2oZTS_pTBb_Cs3q-jmBDtxF45E1y8AMmbTDATMZEa=1kQwQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Hieu,
I am at the last step of the incremental training.
1. I have produced the alignment file for the incremental data.
2. I then appended the new alignment file and the incremental data to
the original alignment file and data.
3. Finally, I created the memory-mapped suffix array phrase table
(mmsapt).

When I use the new mmsapt files during decoding, I get the following
error (decoding worked fine before I added the incremental data):

Created input-output object : [0.086] seconds
this is a test sentence
Translating line 0 in thread id 139739320297216
Translating: this is a test sentence
binary file loaded, default OFF_T: -1
Line 0: Initialize search took 0.064 seconds total
Alignment range error at sentence 53994!
4/7 6/5

Alignment range error at sentence 54530!
17/19 18/18

Alignment range error at sentence 50292!
0/13 9/9

Alignment range error at sentence 50120!
25/36 31/31

Alignment range error at sentence 55089!
11/27 19/19

terminate called recursively
terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
util::Exception'
terminate called recursively
Aborted

Am I doing something wrong here?
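
Reading each pair in the log as index/length for the source side and then the target side (an assumption about the log format, not something confirmed in this thread), every reported target index is at or beyond the target sentence length. That would point at alignment lines that no longer match their sentence pairs after the append. A minimal Python sketch of a check for this, assuming whitespace-tokenized text and alignments in the `srcIndex-tgtIndex` format with 0-based indices; the function name and sample data are hypothetical:

```python
def find_alignment_range_errors(src_lines, tgt_lines, aln_lines):
    """Return (sentence, s, src_len, t, tgt_len) for every out-of-range link.

    Assumes one sentence per line, whitespace tokenization, and links
    written as 'srcIndex-tgtIndex' with 0-based indices.
    """
    if not (len(src_lines) == len(tgt_lines) == len(aln_lines)):
        raise ValueError(
            "line counts differ: %d source, %d target, %d alignment"
            % (len(src_lines), len(tgt_lines), len(aln_lines))
        )
    errors = []
    for sid, (src, tgt, aln) in enumerate(zip(src_lines, tgt_lines, aln_lines)):
        slen, tlen = len(src.split()), len(tgt.split())
        for link in aln.split():
            s, t = (int(i) for i in link.split("-"))
            if not (0 <= s < slen and 0 <= t < tlen):
                errors.append((sid, s, slen, t, tlen))
    return errors


# Toy data (hypothetical): the second alignment line points past the
# end of its two-word target sentence.
src = ["this is a test", "hello world"]
tgt = ["to jest test", "witaj swiecie"]
aln = ["0-0 1-1 3-2", "0-0 1-2"]
for sid, s, slen, t, tlen in find_alignment_range_errors(src, tgt, aln):
    print("sentence %d: %d/%d %d/%d" % (sid, s, slen, t, tlen))
```

Running such a check over the combined files before building the mmsapt would show whether the problem starts at the first appended sentence, which is what a shifted or mismatched append would look like.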

Thanks and regards,
sandipan

On 8 September 2014 10:11, Sandipan Dandapat <sandipandandapat@gmail.com>
wrote:

> Hi,
> This worked.
> Thanks and regards,
> sandipan
>
> On 7 September 2014 16:09, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>
>> sorry, i meant
>>
>>
>> On 7 September 2014 16:08, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>
>>> you HAVE to change both
>>> num-features=7
>>> AND
>>> [weight]
>>> PT0= 0.1 0.2 0.3 0.4 0.5 0.6 0.7
>>>
>>>
>>> On 7 September 2014 16:06, Sandipan Dandapat <sandipandandapat@gmail.com
>>> > wrote:
>>>
>>>> Hi Hieu,
>>>> I also tried with '7', and it fails with this error message:
>>>>
>>>> Exception: moses/ScoreComponentCollection.cpp:248 in void
>>>> Moses::ScoreComponentCollection::Assign(const Moses::FeatureFunction*,
>>>> const std::vector<float>&) threw util::Exception'.
>>>> Feature function PT0 specified 7 dense scores or weights. Actually has 4
>>>>
>>>> In contrast, when I use the binarised phrase table, I set
>>>> num-features=4 and this works fine. I am attaching the moses.ini file in
>>>> case I am doing anything wrong there.
>>>>
>>>> Thanks and regards,
>>>> sandipan
>>>>
>>>>
>>>> On 7 September 2014 15:46, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>>
>>>>> maybe it's 7 scores
>>>>>
>>>>>
>>>>> On 7 September 2014 14:59, Sandipan Dandapat <
>>>>> sandipandandapat@gmail.com> wrote:
>>>>>
>>>>>> Hi Hieu,
>>>>>> I tried that as well, but it generates the error below:
>>>>>>
>>>>>> Exception: moses/TranslationModel/UG/mmsapt.cpp:381 in virtual void
>>>>>> Moses::Mmsapt::Load() threw util::Exception because
>>>>>> `this->m_feature_names.size() != this->m_numScoreComponents'.
>>>>>> At moses/TranslationModel/UG/mmsapt.cpp:381: number of feature values
>>>>>> provided by Phrase table (7) does not match number specified in Moses
>>>>>> config file (4)!
>>>>>>
>>>>>> Thanks and regards,
>>>>>> sandipan
>>>>>>
>>>>>>
>>>>>> On 6 September 2014 09:50, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>>>>
>>>>>>> I'm not sure how many scores there are in the phrase table
>>>>>>> PhraseDictionaryBitextSampling
>>>>>>> It may be 4. In which case you must specify
>>>>>>>
>>>>>>> [feature]
>>>>>>> PhraseDictionaryBitextSampling name=PT0 num-features=4 ...
>>>>>>>
>>>>>>> [weight]
>>>>>>> PT0= 0.1 0.2 0.3 0.4
>>>>>>>
>>>>>>>
>>>>>>> On 05/09/14 14:12, Sandipan Dandapat wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> During incremental retraining I specified the following line in
>>>>>>> moses.ini:
>>>>>>> PhraseDictionaryBitextSampling name=PT0 output-factor=0
>>>>>>> num-features=9
>>>>>>> path=/home/sandipan/inc_retrain/MT_sys/EnPl/mtdata_pro/train. L1=en L2=pl
>>>>>>> pfwd=g pbwd=g smooth=0 sample=1000 workers=1
>>>>>>>
>>>>>>> this generates the error:
>>>>>>> Feature function PT0 specified 9 dense scores or weights. Actually
>>>>>>> has 0.
>>>>>>>
>>>>>>> That error goes away when num-features is changed to '0',
>>>>>>> but then I get the error below:
>>>>>>>
>>>>>>> Exception: moses/TranslationModel/UG/mmsapt.cpp:381 in virtual
>>>>>>> void Moses::Mmsapt::Load() threw util::Exception because
>>>>>>> `this->m_feature_names.size() != this->m_numScoreComponents'.
>>>>>>> At moses/TranslationModel/UG/mmsapt.cpp:381: number of feature
>>>>>>> values provided by Phrase table (7) does not match number specified in
>>>>>>> Moses config file (0)!
>>>>>>> Changing it to 7 also does not help.
>>>>>>>
>>>>>>> I have tried with
>>>>>>> Mmsapt name=PT0 output-factor=0 num-features=0
>>>>>>> base=/home/sandipan/inc_retrain/MT_sys/EnPl/mtdata_pro/train. L1=en L2=pl
>>>>>>>
>>>>>>> but that does not work either.
>>>>>>> What do I need to do at this stage of retraining with Moses?
>>>>>>>
>>>>>>> Thanks and regards,
>>>>>>> sandipan
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> Moses-support@mit.edu
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hieu Hoang
>>>>> Research Associate
>>>>> University of Edinburgh
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>>
>>>>
>>>
>
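
The constraint from the quoted exchange above, that num-features and the number of values on the PT0= line under [weight] must agree, can be checked before starting the decoder. A minimal Python sketch follows. It is not part of Moses, the function name is hypothetical, it handles only features that carry an explicit name= and num-features=, and the sample config is abbreviated for illustration:

```python
def check_feature_weights(ini_text):
    """Return {name: (declared, given)} for features whose counts disagree.

    declared = num-features on the [feature] line,
    given    = number of values on the matching [weight] line.
    """
    section = None
    declared, given = {}, {}
    for raw in ini_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "feature":
            # First token is the feature class; the rest are key=value options.
            opts = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
            if "name" in opts and "num-features" in opts:
                declared[opts["name"]] = int(opts["num-features"])
        elif section == "weight" and "=" in line:
            name, values = line.split("=", 1)
            given[name.strip()] = len(values.split())
    return {
        name: (count, given.get(name, 0))
        for name, count in declared.items()
        if count != given.get(name, 0)
    }


# Abbreviated, hypothetical config: 7 features declared, 4 weights given.
sample = """
[feature]
PhraseDictionaryBitextSampling name=PT0 output-factor=0 num-features=7 L1=en L2=pl

[weight]
PT0= 0.1 0.2 0.3 0.4
"""
print(check_feature_weights(sample))
```

An empty result means the two counts agree for every named feature; it says nothing about whether the count matches what the phrase table itself provides, which only the decoder reports.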

------------------------------

Message: 2
Date: Tue, 09 Sep 2014 12:01:33 +0200
From: Robert Östling <robert@ling.su.se>
Subject: [Moses-support] Replicating Och & Ney (2003) with GIZA++
To: moses-support@mit.edu
Message-ID: <540ECFFD.6030006@ling.su.se>
Content-Type: text/plain; charset=ISO-8859-1

Hello,

I'm working on word alignment, currently trying to replicate some
results from Och & Ney (2003) using GIZA++, as a sanity check before
using GIZA++ as a baseline for my own experiments.

The problem is that I am getting suspiciously poor results with GIZA++,
and I am unable to figure out why.

Below is a summary of what I did; if anyone wants more detailed
information, please ask.

Regards,
Robert Östling

Data:

WPT-03 version of the English-French Hansards corpus, using the first
128k sentences as training data, and the test/trial sets for evaluation.
The text is lower-cased before being fed to GIZA++, and I keep the original
tokenization. I did not filter out any sentence pairs, but GIZA++
reports only a handful of discarded sentences.

GIZA++ chain (performed separately in both directions):

plain2snt.out XXX XXX

snt2cooc.out XXX XXX XXX

mkcls -c50 -n3 -pXXX -VXXX.classes
mkcls -c50 -n3 -pXXX -VXXX.classes

GIZA++ -compactalignmentformat 1 -s XXX -t XXX -c XXX -CoocurrenceFile
XXX -m1 5 -m2 0 -mh 5 -m3 3 -m4 10 -o XXX

Then I read (source index, target index) pairs from XXX.A3.final.

Results:

Here I use 128k training sentences, and intersection symmetrization,
although the trend is the same in other experiments.

1. Och & Ney (2003) report an AER of 6.3%.

2. My own implementation gets 6.1% (test), which is unsurprising and
indicates that the test setup is comparable to Och & Ney.

3. GIZA++ gets 8.9% (test).
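
For readers comparing numbers: intersection symmetrization and AER as defined by Och & Ney (2003), AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), can be sketched in Python as below. The function names and toy links are hypothetical, links are assumed to be 0-based (source, target) pairs, and this is not the evaluation script used for the figures above:

```python
def intersect(forward, backward):
    """Intersection symmetrization.

    forward:  set of (src, tgt) links from the source-to-target run
    backward: set of (tgt, src) links from the target-to-source run
    """
    return forward & {(s, t) for (t, s) in backward}


def aer(sentences):
    """Corpus-level alignment error rate.

    sentences: iterable of (predicted, sure, possible) link sets, one
    triple per sentence pair. Sure links are treated as a subset of the
    possible links, as in Och & Ney (2003).
    """
    hits = total = 0
    for predicted, sure, possible in sentences:
        possible = possible | sure
        hits += len(predicted & sure) + len(predicted & possible)
        total += len(predicted) + len(sure)
    return 1.0 - hits / total if total else 0.0


# Toy example (hypothetical links).
forward = {(0, 0), (1, 1), (2, 1)}
backward = {(0, 0), (1, 1), (2, 2)}   # stored as (tgt, src)
predicted = intersect(forward, backward)
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
print(sorted(predicted), aer([(predicted, sure, possible)]))
```

Counts are summed over all sentence pairs before dividing, so a difference in how sure and possible links are read from the gold standard shifts the final figure for every system equally.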

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 95, Issue 14
*********************************************
