Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: EMS help (Barry Haddow)
2. Re: EMS help (Vincent Nguyen)
3. Re: EMS help (Vincent Nguyen)
4. Re: EMS help (Barry Haddow)
5. Re: EMS help (Barry Haddow)
----------------------------------------------------------------------
Message: 1
Date: Tue, 28 Jul 2015 10:51:24 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] EMS help
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <55B7509C.6040709@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Vincent
On 28/07/15 10:18, Vincent Nguyen wrote:
> Thanks Barry. Answers and other questions below.
>
> On 28/07/2015 10:25, Barry Haddow wrote:
>> Hi Vincent
>>
>>> 2 bug reports:
>>> in the LM corpus definition for Europarl: the $pair-extension is
>>> missing before .$output-extension
>>> in step 5 (maybe in others too), the generated
>>> moses.tuned.ini.5
>>> file is missing ".gz" at the end of phrase-table.5
>>> in the PhraseDictionaryMemory definition.
>> These seem OK to me. For europarl, it points to the monolingual
>> corpus, and for the phrase table the .gz is implicitly added. Did
>> they not work for you?
>
> I am NOT talking about the [CORPUS:europarl] section but about
> the [LM:europarl] section. I think this section needs the $pair-extension,
> the same as [LM:nc], where it was fine.
> Anyway: yes, I had an error.
Europarl releases usually contain the parallel files (e.g.
europarl-v7.fr-en.fr) and monolingual files (e.g. europarl-v7.en).
>
> Also: when .gz is missing, yes, it stops and gives an error message.
OK, this used to work.
>
>
>>
>>> I tried to remove the "IGNORE" for the Interpolated-LM section.
>>> I am still using KenLM,
>>> BUT I get a message saying I need to define srilm-dir.
>>> Is SRILM mandatory to turn on the interpolated-lm with KenLM only?
>> That's right, the interpolated LM uses some code from SRILM. You can
>> still use KenLM to create the individual language models, and use
>> KenLM during decoding,
>
> OK. But a related question:
> If I do not interpolate, and I keep the 2 (or more) LMs in the
> moses.ini file,
> does the decoder work as if I had interpolated the 2 LMs?
If you do not interpolate using EMS then both LMs will be features in
the model - i.e. you get log-linear interpolation. See here for an early
comparison of linear and log-linear interpolation -
https://aclweb.org/anthology/W/W07/W07-0717.pdf - there has been other
work since then. Note that SRILM does not do linear interpolation correctly,
cheers - Barry
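
A minimal sketch of the distinction above, assuming two hypothetical per-word probabilities from a News Commentary LM and a Europarl LM. Linear interpolation mixes the probabilities themselves; keeping both LMs as separate features means the decoder adds weighted log-probabilities, which corresponds to an unnormalised weighted geometric mean. The function names, probabilities and weights are illustrative only; in a real system the feature weights come from tuning.

```python
import math


def linear_interpolation(p1, p2, lam):
    """Linear mixture of two probabilities: lam * p1 + (1 - lam) * p2."""
    return lam * p1 + (1.0 - lam) * p2


def log_linear_score(p1, p2, w1, w2):
    """Weighted sum of log-probabilities, as when each LM is its own feature."""
    return w1 * math.log(p1) + w2 * math.log(p2)


# Hypothetical probabilities of the same word under two LMs.
p_nc, p_europarl = 0.02, 0.0005

lin = linear_interpolation(p_nc, p_europarl, lam=0.5)
loglin = math.exp(log_linear_score(p_nc, p_europarl, w1=0.5, w2=0.5))

print(f"linear mixture:      {lin:.6f}")
print(f"log-linear (exp'd):  {loglin:.6f}")
```

With equal weights the log-linear value is the geometric mean of the two probabilities, so a word that one LM considers very unlikely is penalised more heavily than under the linear mixture.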
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 2
Date: Tue, 28 Jul 2015 12:14:12 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] EMS help
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>, moses-support
<moses-support@mit.edu>
Message-ID: <55B755F4.4000204@neuf.fr>
Content-Type: text/plain; charset="windows-1252"
>> I am NOT talking about the [CORPUS:europarl] section but about
>> the [LM:europarl] section. I think this section needs the $pair-extension,
>> the same as [LM:nc], where it was fine.
>> Anyway: yes, I had an error.
>
> Europarl releases usually contain the parallel files (e.g.
> europarl-v7.fr-en.fr) and monolingual files (e.g. europarl-v7.en).
If I am not mistaken, http://www.statmt.org/wmt12/training-parallel.tgz
does not include the monolingual files, which could be the reason.
>> Also: when .gz is missing, yes, it stops and gives an error message.
>
> OK, this used to work.
Sorry, I was not specific enough. The error pops up when I use
./daemon.pl for the web translation. It could be just there.
>>
>>> That's right, the interpolated LM uses some code from SRILM. You can
>>> still use KenLM to create the individual language models, and use
>>> KenLM during decoding,
>>
>> OK. But a related question:
>> If I do not interpolate, and I keep the 2 (or more) LMs in the
>> moses.ini file,
>> does the decoder work as if I had interpolated the 2 LMs?
>
> If you do not interpolate using EMS then both LMs will be features in
> the model - i.e. you get log-linear interpolation. See here for an
> early comparison of linear and log-linear interpolation -
> https://aclweb.org/anthology/W/W07/W07-0717.pdf - there has been other
> work since then. Note that SRILM does not do linear interpolation
> correctly,
>
Many thanks.
Just a general question.
The baseline tutorial mentions this at the end:
This gives me a BLEU score of 23.5 (in comparison, the best result at
WMT11 was 30.5 <http://matrix.statmt.org/matrix/systems_list/1669>,
although it should be cautioned that this uses NIST BLEU, which does its
own tokenisation, so there will be 1-2 points difference in the score
anyway)
The baseline tutorial is done with NewsCommentary_V8. I ran it and got a
BLEU score of 22-23.
My EMS run with the config.basic file gives me around 26 (EuroparlV7+NCv10),
with the test set taken from NC2011.
Is my score "low" compared with the 30s because I am using KenLM
only? Am I missing something else?
>
> cheers - Barry
>
>
------------------------------
Message: 3
Date: Tue, 28 Jul 2015 13:37:38 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] EMS help
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>, moses-support
<moses-support@mit.edu>
Message-ID: <55B76982.9080803@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed
I don't know why, but the binarize step crashes; see below.
>
>> In my working directory I have 2 subdirectories:
>> "tuning", containing moses.filtered.ini.5, moses.ini.5 and moses.tuned.ini.5,
>> and
>> "model", containing moses.ini.5 (apparently this one does not have the
>> tuned weights).
>>
>> Of those in the tuning subdir, the "tuned" one, moses.tuned.ini.5, generated
>> after moses.ini.5, seems to point to phrase-table.5.gz, which is not binarized,
>> and moses.5.ini seems to point to the binarized table within
>> tuning/filtered.5/...
>> It is unclear to me which one I should use.
> If you run EMS, there will be a filtered ini file inside the
> evaluation directory which can be used to translate the test set using
> the tuned weights. However this model is filtered for the test set, so
> you cannot use it on other sentences.
>
> If you want the full model binarised, then you should add:
>
> binarize-all = $moses-script-dir/training/binarize-model.perl
>
> to the [GENERAL] section of the EMS config and rerun EMS. In this case
> the moses.tuned.ini in tuning can be used to translate any sentences.
>
Executing: /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl /home/moses/working/model/moses.bin.ini.6.tables /home/moses/working/model/moses.ini.5 /dev/null -nofilter -Binarizer /home/moses/mosesdecoder/bin/CreateOnDiskPt
Executing: mkdir -p /home/moses/working/model/moses.bin.ini.6.tables
Stripping XML...
Executing: /home/moses/mosesdecoder/scripts/training/../generic/strip-xml.perl < /dev/null > /home/moses/working/model/moses.bin.ini.6.tables/input.34384
pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/moses/working/model/phrase-table.5 input-factor=0 output-factor=0
Considering factor 0
ro:LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/moses/working/model/reordering-table.5.wbe-msd-bidirectional-fe.gz
Considering factor 0
Filtering files...
filtering /home/moses/working/model/phrase-table.5 -> /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1...
Executing: ln -s /home/moses/working/model/phrase-table.5.gz /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz
binarizing...
Executing: /home/moses/mosesdecoder/bin/CreateOnDiskPt /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.bin
Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors numTargetFactors numScores tableLimit sortScoreIndex inputPath outputPath
Exit code: 1
Can't binarize at /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl line 417.
Exit code: 1
binarising failed at /home/moses/mosesdecoder/scripts/training/binarize-model.perl line 43.
------------------------------
Message: 4
Date: Tue, 28 Jul 2015 12:49:17 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] EMS help
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <55B76C3D.4060307@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Vincent
If you look at the error log, you will see:
> Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors
> numTargetFactors numScores tableLimit sortScoreIndex inputPath outputPath
You are missing the first 5 arguments to CreateOnDiskPt, as given in
config.basic.
cheers - Barry
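
To make the Usage line concrete, here is a small Python sketch that assembles the full CreateOnDiskPt argument list and checks it against the seven fields the binary reports. The five leading values (1 1 4 100 2) are taken from the ttable-binarizer line of config.basic quoted later in this digest; the helper name build_command and the short file names are hypothetical.

```python
import shlex

# Field names copied from the Usage line printed by CreateOnDiskPt in the log.
USAGE_FIELDS = [
    "numSourceFactors", "numTargetFactors", "numScores",
    "tableLimit", "sortScoreIndex", "inputPath", "outputPath",
]


def build_command(binarizer, input_path, output_path):
    """Return the argv list for the binarizer, or raise if the count is wrong.

    `binarizer` is the ttable-binarizer string: the program path followed by
    its leading arguments.
    """
    argv = shlex.split(binarizer) + [input_path, output_path]
    n_args = len(argv) - 1  # exclude the program path itself
    if n_args != len(USAGE_FIELDS):
        raise ValueError(f"expected {len(USAGE_FIELDS)} arguments, got {n_args}")
    return argv


# What config.basic specifies: the five leading arguments are present.
full = build_command(
    "/home/moses/mosesdecoder/bin/CreateOnDiskPt 1 1 4 100 2",
    "phrase-table.gz", "phrase-table.bin",
)
print(dict(zip(USAGE_FIELDS, full[1:])))

# What the failing log shows: only the program path reached the script.
try:
    build_command("/home/moses/mosesdecoder/bin/CreateOnDiskPt",
                  "phrase-table.gz", "phrase-table.bin")
except ValueError as err:
    print("would fail:", err)
```

The second call mirrors the failing log, where CreateOnDiskPt received only the input and output paths and therefore printed its usage message.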
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 5
Date: Tue, 28 Jul 2015 13:47:40 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] EMS help
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <55B779EC.6040606@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Vincent
It could be a bug. Could you edit
mosesdecoder/scripts/ems/experiment.meta and change the line:
template: $binarize-all IN OUT -Binarizer $ttable-binarizer
to
template: $binarize-all IN OUT -Binarizer "$ttable-binarizer"
Note that I have added quotes. Then you'll have to delete the most
recent run, and re-run experiment.perl. If it works, fine. If it
doesn't, could you post the steps/6/TRAINING_binarize-config.6 script
(hopefully I got the name right - you may need to change the number)
cheers - Barry
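
The effect of the added quotes can be illustrated outside EMS. The Python sketch below uses shlex.split, which follows POSIX shell word-splitting rules, to show which value ends up following -Binarizer with and without quotes. This only mimics shell tokenisation; it is an assumption that EMS passes the template through a shell in this way, and the helper name binarizer_argument is hypothetical.

```python
import shlex

# ttable-binarizer value as given in config.basic (paths as in the log).
binarizer = "/home/moses/mosesdecoder/bin/CreateOnDiskPt 1 1 4 100 2"

# The template before and after the proposed change, with the variable expanded.
unquoted = f"binarize-model.perl IN OUT -Binarizer {binarizer}"
quoted = f'binarize-model.perl IN OUT -Binarizer "{binarizer}"'


def binarizer_argument(command_line):
    """Return the single token that follows -Binarizer after shell-style splitting."""
    argv = shlex.split(command_line)
    return argv[argv.index("-Binarizer") + 1]


print("unquoted:", binarizer_argument(unquoted))
print("quoted:  ", binarizer_argument(quoted))
```

Without quotes, the option value is only the program path and the five numbers become separate words, which matches the "-Binarizer /home/moses/mosesdecoder/bin/CreateOnDiskPt" seen in the failing log. With quotes, the whole string travels as one argument.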
On 28/07/15 13:11, Vincent Nguyen wrote:
> I know but this is what I have in my config.basic now:
> # conversion of rule table into binary on-disk format
> ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 4 100 2"
> binarize-all = $moses-script-dir/training/binarize-model.perl
>
> I don't know where else I can add the 5 arguments, or whether I need to
> reference ttable-binarizer somewhere.
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
End of Moses-support Digest, Vol 105, Issue 60
**********************************************