Moses-support Digest, Vol 83, Issue 9

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Tuning and decoding of lattices in the new Moses. (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Fri, 6 Sep 2013 23:47:56 +0200
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Tuning and decoding of lattices in the
new Moses.
To: Yulia Tsvetkov <yulia.tsvetkov@gmail.com>
Cc: "Moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <3FE91AB9-4501-4790-A0E0-64559B8E76C5@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Good to know. I don't think it's obvious that you need that switch for lattice input. Maybe there should be a check of some sort in the mert script.
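
A check of that sort might look like the sketch below. It is written in Python purely for illustration (mert-moses.pl itself is Perl), and the function name lattice_filter_warning is hypothetical. It only looks at the two switches mentioned in this thread, --inputtype 2 and --no-filter-phrase-table, in the space-separated form used in the commands quoted below.

```python
from typing import List, Optional


def lattice_filter_warning(args: List[str]) -> Optional[str]:
    """Return a warning if lattice input is requested without disabling filtering.

    `args` is the argument list passed to the tuning script. Only the two
    switches discussed in this thread are inspected; this is a hypothetical
    helper, not part of mert-moses.pl.
    """
    lattice_input = False
    for i, arg in enumerate(args):
        # Lattice input is selected with "--inputtype 2".
        if arg == "--inputtype" and i + 1 < len(args) and args[i + 1] == "2":
            lattice_input = True

    if lattice_input and "--no-filter-phrase-table" not in args:
        return ("lattice input (--inputtype 2) without "
                "--no-filter-phrase-table: the phrase table will be filtered")
    return None
```

Called with the arguments of the tuning command quoted further down, this would return the warning text; once --no-filter-phrase-table is added it returns None.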

Sent while bumping into things

On 6 Sep 2013, at 15:42, Yulia Tsvetkov <yulia.tsvetkov@gmail.com> wrote:

> Hi Hieu,
>
> A quick update: I should have used the --no-filter-phrase-table flag, otherwise the phrase table gets filtered. Thanks a lot for your help!
>
> Yulia
>
>
> On Wed, Sep 4, 2013 at 12:34 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>> Ok. If you're still stuck, please send me your phrase table and I'll try to debug it.
>>
>> Sent while bumping into things
>>
>> On 4 Sep 2013, at 17:07, Yulia Tsvetkov <yulia.tsvetkov@gmail.com> wrote:
>>
>>> The phrase table is not empty and looks normal. Here is a snippet:
>>>
>>> no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 ||| 0-0 0-1 1-2 ||| 1 166 1
>>> no one ||| ce que personne ||| 0.5 3.7494e-05 0.0060241 5.89199e-06 ||| 0-0 1-2 ||| 2 166 1
>>> no one ||| il que personne ||| 1 9.31515e-05 0.0060241 1.11289e-05 ||| 0-0 1-2 ||| 1 166 1
>>> no one ||| n'est pas le seul ||| 0.0714286 0.0073779 0.0060241 4.54759e-07 ||| 0-0 0-1 1-3 ||| 14 166 1
>>> no one ||| on ne ||| 0.00444444 0.000152764 0.0060241 0.000497078 ||| 1-0 0-1 ||| 225 166 1
>>> no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 ||| 0-0 ||| 15369 166 1
>>>
>>> I don't filter the phrase table...
>>>
>>> I'll debug more, and Chris was going to look at it too, I will send you an update.
>>>
>>> Thanks!
>>>
>>> Yulia
>>>
>>>
>>>
>>> On Wed, Sep 4, 2013 at 10:41 AM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>> hmm, strange. the moses.ini file looks ok. There shouldn't be an issue with initialisation. Is the phrase-table empty?
>>>>
>>>> Make sure you're not filtering the phrase table. I don't think the filter script understands lattices.
>>>>
>>>>
>>>>
>>>>
>>>> On 4 September 2013 15:10, Yulia Tsvetkov <yulia.tsvetkov@gmail.com> wrote:
>>>>> Hi Hieu,
>>>>>
>>>>>> did you manage to get moses working with lattices again? it would be nice to get some feedback
>>>>>
>>>>> Sorry for not sending feedback earlier. I was trying to debug by myself before sending feedback or asking the next question.
>>>>>
>>>>> I was able to run a pipeline with the new settings, thanks a lot for the detailed answer!
>>>>>
>>>>> There is still a problem (with feature initialization?). Here is the first lattice translation: it looks like all input words are treated as OOVs (they are not), and then MERT gets killed:
>>>>>
>>>>> BEST TRANSLATION: no|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK intense|UNK|UNK|UNK closures|UNK|UNK|UNK of|UNK|UNK|UNK travel|UNK|UNK|UNK and|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK delights|UNK|UNK|UNK of|UNK|UNK|UNK ethnographic|UNK|UNK|UNK research|UNK|UNK|UNK is|UNK|UNK|UNK the|UNK|UNK|UNK opportunity|UNK|UNK|UNK to|UNK|UNK|UNK live|UNK|UNK|UNK amongst|UNK|UNK|UNK those|UNK|UNK|UNK who|UNK|UNK|UNK have|UNK|UNK|UNK not|UNK|UNK|UNK forgotten|UNK|UNK|UNK the|UNK|UNK|UNK old|UNK|UNK|UNK ways|UNK|UNK|UNK to|UNK|UNK|UNK still|UNK|UNK|UNK feel|UNK|UNK|UNK their|UNK|UNK|UNK pass|UNK|UNK|UNK in|UNK|UNK|UNK the|UNK|UNK|UNK when|UNK|UNK|UNK touch|UNK|UNK|UNK and|UNK|UNK|UNK stones|UNK|UNK|UNK caused|UNK|UNK|UNK by|UNK|UNK|UNK rain|UNK|UNK|UNK tasted|UNK|UNK|UNK leaves|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK bitter|UNK|UNK|UNK plants|UNK|UNK|UNK [1111111111111111111111111111111111111111111111111111111111111] [total=-6405.459] core=(-6100.000,-50.000,61.000,0.000,0.000,0.000,0.000,-8.000,-1952.355,0.000)
>>>>> Line 0: Translation took 0.000 seconds total
>>>>> Translating line 1 in thread id 47061808453376
>>>>> sh: line 1: 7333 Killed /home/ytsvetko/tools/mosesdecoder/bin/moses -config filtered/moses.ini -inputtype 2 -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667' -n-best-list run1.best100.out 100 -input-file /share/workhorse4/ytsvetko/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/tuning/corpus.en > run1.out
>>>>> Exit code: 137
>>>>> The decoder died. CONFIG WAS -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667'
>>>>>
>>>>> I attach my config file, and here is the exact command that I am executing:
>>>>>
>>>>> mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr /home/ytsvetko/tools/mosesdecoder/bin/moses ./moses.ini --working-dir ./tuning --mertdir /home/ytsvetko/tools/mosesdecoder/mert --inputtype 2
>>>>>
>>>>>
>>>>> Thanks a lot for your help!
>>>>> Yulia
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On 2 September 2013 17:03, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>>>>> Hi Yulia
>>>>>>>
>>>>>>>
>>>>>>> On 1 September 2013 22:46, Yulia Tsvetkov <yulia.tsvetkov@gmail.com> wrote:
>>>>>>>> Dear Moses developers,
>>>>>>>>
>>>>>>>> I am trying to use a new version of Moses. It seems things have changed quite a bit, and I am having a hard time finding up-to-date documentation. For debugging I used very small train/tune/test corpora (10 lines each).
>>>>>>>>
>>>>>>>> First, running the following command produces a phrase table with only 4 features:
>>>>>>>> train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name --f $src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM -external-bin-dir $external_bin_dir`;
>>>>>>>>
>>>>>>>> Here is a snippet from the produced moses.ini:
>>>>>>>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz input-factor=0 output-factor=0
>>>>>>>
>>>>>>> Yes, the phrase-table now has 4 scores instead of 5. The 5th score was a constant 2.718. This has now moved into its own feature function, PhrasePenalty.
>>>>>>>
>>>>>>> It saves 3% of disk space, and I think it is better for research, e.g. creating better, non-constant phrase penalty feature functions, or asking whether we need just 1 phrase penalty if we have 2 phrase tables.
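
To see the score count on a concrete entry, here is a minimal Python sketch. The helper name phrase_table_scores is hypothetical, and the only assumption about the format is what the snippet posted in this thread shows: fields separated by "|||", with the scores in the third field.

```python
from typing import List


def phrase_table_scores(line: str) -> List[float]:
    """Return the scores from one phrase-table line.

    Fields are separated by "|||"; the third field holds the scores.
    """
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least 3 '|||'-separated fields")
    return [float(score) for score in fields[2].split()]


# Line taken from the phrase-table snippet posted in this thread.
example = ("no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 "
           "||| 0-0 ||| 15369 166 1")
scores = phrase_table_scores(example)
print(len(scores))  # 4, matching num-features=4 in moses.ini
```

A count of 5 on every line would suggest a phrase table built by an older train-model.perl, with the constant score still included.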
>>>>>>>
>>>>>>>>
>>>>>>>> Second, I am trying to run tuning and decoding of lattices in plf format.
>>>>>>>> Can you point me to example commands and moses.ini for running mert and decoding lattices with the new Moses?
>>>>>>>
>>>>>>> an example ini file for lattices can be seen here
>>>>>>> https://github.com/moses-smt/moses-regression-tests/blob/master/tests/phrase.lattice-surface/moses.ini
>>>>>>>
>>>>>>> Mert should run as it always has. However, if you upgrade the decoder, you should use the upgraded mert script too.
>>>>>>>
>>>>>>> Decoding with a lattice is exactly the same as for a sentence, except for 2 things:
>>>>>>> 1. inputtype=2. This can be on the command line or in the ini file, e.g.
>>>>>>> ./moses -inputtype 2
>>>>>>>
>>>>>>> or
>>>>>>> [inputtype]
>>>>>>> 2
>>>>>>>
>>>>>>> 2. You should use the InputFeature feature function. This is the score of the path through the lattice. You can see the InputFeature in the ini file:
>>>>>>> [feature]
>>>>>>> ....
>>>>>>> InputFeature num-features=1 num-input-features=1 real-word-count=0
>>>>>>>
>>>>>>> [weight]
>>>>>>> ...
>>>>>>> InputFeature0 = 1
>>>>>>>
>>>>>>> Before the refactoring, this was hacked in as an extra feature in the phrase-table.
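
The two ini requirements above could be checked with something like the following Python sketch. The helper names read_sections and lattice_ready are hypothetical, and the parser only assumes the layout shown in the fragments above: [section] headers followed by one entry per line, with lines starting with "#" treated as comments. It does not check inputtype, since that can also be given on the command line.

```python
from typing import Dict, List


def read_sections(ini_text: str) -> Dict[str, List[str]]:
    """Group the non-empty, non-comment lines of a moses.ini by [section]."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def lattice_ready(ini_text: str) -> bool:
    """True if the ini declares InputFeature and gives InputFeature0 a weight."""
    sections = read_sections(ini_text)
    has_feature = any(entry.split()[0] == "InputFeature"
                      for entry in sections.get("feature", []))
    has_weight = any(entry.replace("=", " ").split()[0] == "InputFeature0"
                     for entry in sections.get("weight", []))
    return has_feature and has_weight


# Fragment assembled from the ini lines quoted above.
example_ini = """
[inputtype]
2

[feature]
InputFeature num-features=1 num-input-features=1 real-word-count=0

[weight]
InputFeature0 = 1
"""
print(lattice_ready(example_ini))  # True
```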
>>>>>>>>
>>>>>>>> So far I tried training and tuning on text files and decoding on lattices because I could not figure out the right settings for tuning.
>>>>>>>> According to some old documentation I am supposed to convert the phrase table to a binary format. Is it still needed?
>>>>>>>
>>>>>>> You no longer need to convert it to binary format. It's good to convert to binary format to save memory, but it is not required. Lattice decoding works with all phrase-table implementations now.
>>>>>>>>
>>>>>>>> When I ran it with the following command:
>>>>>>>> moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f $tune_dir/moses.ini < $eval_dir/69.plf > $eval_dir/69.plf.out
>>>>>>>> I got an error:
>>>>>>>> Don't mix old and new ini file format
>>>>>>>> What is the new equivalent of weight-i and weight-l?
>>>>>>>
>>>>>>> -weight-i 0.62
>>>>>>> now becomes
>>>>>>> -weight-overwrite 'InputFeature0= 0.62'
>>>>>>>
>>>>>>> -weight-l 12.5
>>>>>>> now becomes
>>>>>>> -weight-overwrite 'LM0= 12.5'
>>>>>>>
>>>>>>> The updated mert script should be doing this anyway.
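
For anyone converting old command lines by hand, the mapping above can be sketched in Python as follows. The function name to_weight_overwrite is hypothetical, only the two switches named above are handled, and each is assumed to take a single value. Putting several weights into one -weight-overwrite string follows the format of the decoder command that appears in the MERT log earlier in this thread.

```python
from typing import Dict, List

# Old switch -> new feature name, limited to the two cases given above.
OLD_TO_NEW: Dict[str, str] = {
    "-weight-i": "InputFeature0",
    "-weight-l": "LM0",
}


def to_weight_overwrite(old_args: List[str]) -> List[str]:
    """Rewrite old-style weight switches as a single -weight-overwrite argument.

    Other arguments are passed through unchanged, in order.
    """
    kept: List[str] = []
    overwrites: List[str] = []
    i = 0
    while i < len(old_args):
        arg = old_args[i]
        if arg in OLD_TO_NEW:
            if i + 1 >= len(old_args):
                raise ValueError(f"{arg} needs a value")
            overwrites.append(f"{OLD_TO_NEW[arg]}= {old_args[i + 1]}")
            i += 2
        else:
            kept.append(arg)
            i += 1
    if overwrites:
        kept += ["-weight-overwrite", " ".join(overwrites)]
    return kept


print(to_weight_overwrite(
    ["-inputtype", "2", "-weight-i", "0.62", "-weight-l", "12.5"]))
# ['-inputtype', '2', '-weight-overwrite', 'InputFeature0= 0.62 LM0= 12.5']
```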
>>>>>>>>
>>>>>>>> Without those parameters I get a Segmentation Fault with both a .gz and a binary phrase table.
>>>>>>>
>>>>>>> If you're still having problems, send me your ini file and the exact command you're executing and I'll try to debug it.
>>>>>>>>
>>>>>>>> Could you help me figure out the right settings?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> Moses-support@mit.edu
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Hieu Hoang
>>>>>>> Research Associate
>>>>>>> University of Edinburgh
>>>>>>> http://www.hoang.co.uk/hieu
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 83, Issue 9
********************************************
