Moses-support Digest, Vol 88, Issue 69

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Binarising the phrase table (Hieu Hoang)
2. Re: EMS error: you need to define GENERAL:input-sgm
(Peter Kleiweg)
3. Very slow tuning with binarised kenlm language model
(Felipe Sánchez Martínez)
4. Re: EMS error: you need to define GENERAL:input-sgm
(Philipp Koehn)
5. Re: Very slow tuning with binarised kenlm language model
(Massinissa Ahmim)


----------------------------------------------------------------------

Message: 1
Date: Fri, 28 Feb 2014 10:16:07 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Binarising the phrase table
To: Per Tunedal <per.tunedal@operamail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhubfXnYsodCkAN+GipF_gsYL2g5F5d9XdsG5rwBQoRUA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On 28 February 2014 06:47, Per Tunedal <per.tunedal@operamail.com> wrote:

>
> Hi,
> I tried to binarise the phrase table and got into trouble.
>
> 1. Error messages, as below. What's that?
> distinct source phrases: 439511
> distinct first words of source phrases: 67600
> number of phrase pairs (line count): 940438
> Count of lines with missing alignments: 0/940438
> WARNING: there are src voc entries with no phrase translation: count 2168
> There exists phrase translations for 65432 entries
>
> 2. Modify the moses.ini file. I've found this on the Moses/Baseline
> page:
> 1. Change PhraseDictionaryMemory to PhraseDictionaryBinary
> 2. Set the path of the PhraseDictionary feature to point to
> $HOME/working/train/binarised-model/Kryptering1.sv-fr.phrase-table
> 3. Set the path of the LexicalReordering feature to point to
> $HOME/working/train/binarised-model/Kryptering1.sv-fr.reordering-table
>
> But I cannot find any such entries in my moses.ini - maybe because I'm
> running a somewhat older version of Moses. I've found e.g. the following
> lines in my ini-file:
>
> [ttable-file]
> 0 0 0 5 /home/per/working/train/model/phrase-table.gz
>
> # distortion (reordering) files
> [distortion-file]
> 0-0 wbe-msd-bidirectional-fe-allff 6 /home/per/working/train/model/reordering-table.wbe-msd-bidirectional-fe.gz
>
> How should I change the entries to use my binarised model?
>
The moses.ini file format has changed in the past year. The previous format
specified a phrase-table like so:
[ttable-file]
0 0 0 5 /home/per/working/train/model/phrase-table.gz
The equivalent in the new format is:
[feature]
PhraseDictionaryMemory input-factor=0 output-factor=0 num-features=5 path=/home/per/working/train/model/phrase-table.gz

I would urge you to update your Moses installation to use the new format.

However, if you want to stay on the old version, you can switch to the binary
phrase table by changing the first field of the entry from 0 to 1, that is, change
[ttable-file]
0 0 0 5 /home/per/working/train/model/phrase-table.gz
to
[ttable-file]
1 0 0 5 /home/per/working/train/model/phrase-table.gz
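
The correspondence between the two formats can be illustrated with a minimal Python sketch. It assumes the old entry has exactly the five fields shown above (implementation code, input factor, output factor, number of features, path) and covers only the two implementation codes mentioned in this thread. The helper name ttable_to_feature is hypothetical and is not part of Moses.

```python
# Old numeric implementation codes mapped to new-style feature names.
# Only the two codes discussed in this thread are covered (assumption).
IMPLEMENTATIONS = {
    "0": "PhraseDictionaryMemory",
    "1": "PhraseDictionaryBinary",
}


def ttable_to_feature(line):
    """Convert one old-style [ttable-file] entry to a new-style [feature] line."""
    impl, in_factor, out_factor, num_features, path = line.split(maxsplit=4)
    name = IMPLEMENTATIONS[impl]
    return (
        f"{name} input-factor={in_factor} output-factor={out_factor} "
        f"num-features={num_features} path={path}"
    )


if __name__ == "__main__":
    print(ttable_to_feature("0 0 0 5 /home/per/working/train/model/phrase-table.gz"))
```

The path is copied through unchanged; pointing it at the binarised files is a separate step.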


> Yours,
> Per Tunedal



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Fri, 28 Feb 2014 11:30:39 +0100 (CET)
From: Peter Kleiweg <p.c.j.kleiweg@rug.nl>
Subject: Re: [Moses-support] EMS error: you need to define
GENERAL:input-sgm
To: moses-support@mit.edu
Message-ID: <alpine.DEB.2.00.1402281125560.3348@pebbe>
Content-Type: TEXT/PLAIN; charset=US-ASCII

Philipp Koehn wrote on 27 February 2014:

> Hi,
>
> since you are using the NIST BLEU scoring tool, you need to provide
> the following two files
>
> [EVALUATION:test]
> input-sgm = $data/test.$input-extension.sgm
> reference-sgm = $data/test.$output-extension.sgm

When I do that, it complains that it can't find the file. How
would I create it?

So the error message is wrong? It says I need input-sgm in
GENERAL, not in EVALUATION.

> If this is too much hassle for you, you can use multi-bleu instead of
> nist-bleu, just note that
> it relies on the tokenization used by your system, so changes in
> tokenization will lead
> to BLEU scores that are not comparable.


That is OK. I work with data that is already tokenized.

I am now using multi-bleu, and the experiment is running.

Thanks for the help.


>
> -phi
>
> On Thu, Feb 27, 2014 at 9:51 AM, Peter Kleiweg <p.c.j.kleiweg@rug.nl> wrote:
> >
> > Hi,
> >
> > I installed Moses, ran through the steps in part 2 of the
> > manual, and got a working translator.
> >
> > Now I am trying to do the same using the Experiment Management System.
> > I copied the configuration file config.toy, fixed some settings,
> > and ran script/ems/experiment.perl -config config.data
> >
> > I get:
> >
> > STEP SUMMARY:
> > 64 CORPUS:toy:clean -> run
> > 60 CORPUS:toy:truecase -> run
> > 55 TRUECASER:consolidate -> run
> > 54 TRUECASER:train -> run
> > 52 LM:toy:truecase -> run
> > 50 LM:toy:train -> run
> > 47 LM:toy:binarize -> run
> > 46 TRAINING:consolidate -> run
> > 45 TRAINING:prepare-data -> run
> > 44 TRAINING:run-giza -> run
> > 43 TRAINING:run-giza-inverse -> run
> > 42 TRAINING:symmetrize-giza -> run
> > 41 TRAINING:build-lex-trans -> run
> > 38 TRAINING:extract-phrases -> run
> > 37 TRAINING:build-reordering -> run
> > 36 TRAINING:build-ttable -> run
> > 33 TRAINING:create-config -> run
> > 27 TUNING:truecase-input -> run
> > 25 TUNING:truecase-reference -> run
> > 23 TUNING:filter -> run
> > 22 TUNING:apply-filter -> run
> > 21 TUNING:tune -> run
> > 20 TUNING:apply-weights -> run
> > 15 EVALUATION:test:truecase-input -> run
> > 13 EVALUATION:test:filter -> run
> > 12 EVALUATION:test:apply-filter -> run
> > 11 EVALUATION:test:decode -> run
> > 10 EVALUATION:test:remove-markup -> run
> > 8 EVALUATION:test:detruecase-output -> run
> > 6 EVALUATION:test:wrap -> run
> > 4 EVALUATION:test:nist-bleu -> run
> > 3 EVALUATION:test:nist-bleu-c -> run
> > 2 EVALUATION:test:analysis -> run
> > 1 EVALUATION:test:analysis-coverage -> run
> > 0 REPORTING:report -> run
> >
> > DEFINE STEPS (run with -exec if everything ok)
> > ERROR: you need to define GENERAL:input-sgm
> >
> >
> > I can't find any documentation about a GENERAL:input-sgm.
> > What do I do now?
> >
> >
> > Here is my configuration file:
> >
> > http://www.let.rug.nl/~kleiweg/tmp/moses.config
> >
> >
> >
> > --
> > Peter



--
Peter Kleiweg
http://pkleiweg.home.xs4all.nl/


------------------------------

Message: 3
Date: Fri, 28 Feb 2014 12:56:56 +0100
From: Felipe Sánchez Martínez <fsanchez@dlsi.ua.es>
Subject: [Moses-support] Very slow tuning with binarised kenlm
language model
To: moses-support <moses-support@mit.edu>
Message-ID: <53107988.9040809@dlsi.ua.es>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hello all,

I am tuning a system that uses a binarised kenlm language model. This
model was binarised with default parameters and is 22 GB in size after
binarisation.

The language model and the (filtered) phrase table fit into memory (32 GB, so
no swapping), but Moses is translating very slowly and is using only around
15% of the CPU.

Is there anything I can do to make it faster?

Thank you very much for your help
Regards
--
Felipe



------------------------------

Message: 4
Date: Fri, 28 Feb 2014 09:02:53 -0500
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] EMS error: you need to define
GENERAL:input-sgm
To: Peter Kleiweg <p.c.j.kleiweg@rug.nl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDopTDJH5-AbSkd83MV-RRgQc=Kzcmefq6BmC+WRS+JuQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

>> since you are using the NIST BLEU scoring tool, you need to provide
>> the following two files
>>
>> [EVALUATION:test]
>> input-sgm = $data/test.$input-extension.sgm
>> reference-sgm = $data/test.$output-extension.sgm
>
> When I do that, it complains that it can't find the file. How
> would I create it?

If you do not have these files, you have to build them.
This is basically the input and reference with some
fancy <xml> tags around it.

You can take a look at the WMT test sets (for instance
at http://www.statmt.org/wmt14/ ) to see the format
of these files.
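
As a rough illustration of what "building" such a file involves, here is a minimal Python sketch that wraps one-sentence-per-line text in srcset/refset markup. The element and attribute names (setid, srclang, trglang, docid, sysid, seg id) are modelled on the WMT test sets and are an assumption here; the exact set of attributes the scoring tool requires should be checked against a real WMT file. The function name to_sgm and the default document ID are hypothetical.

```python
from xml.sax.saxutils import escape


def to_sgm(sentences, root, set_id, src_lang, trg_lang=None, doc_id="doc1"):
    """Wrap sentences in <srcset> or <refset> markup (assumed layout)."""
    attrs = f'setid="{set_id}" srclang="{src_lang}"'
    if root == "refset":
        # Reference sets additionally name the target language (assumption).
        attrs += f' trglang="{trg_lang}"'
    lines = [f"<{root} {attrs}>", f'<doc docid="{doc_id}" sysid="ref">']
    for number, sentence in enumerate(sentences, start=1):
        # Escape &, < and > so the text cannot be mistaken for markup.
        lines.append(f'<seg id="{number}">{escape(sentence.strip())}</seg>')
    lines += ["</doc>", f"</{root}>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_sgm(["Dit is een test .", "Nog een zin ."], "srcset", "toy", "nl"))
```

The input and the reference must contain the same number of segments in the same order, since they are matched up by segment.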

> So the error message is wrong? It says I need input-sgm in
> GENERAL, not in EVALUATION.

All parameters are looked up from local to global scope.
So, input-sgm is checked for existence in
(1) EVALUATION:test
(2) EVALUATION
(3) GENERAL
I can see that this does not make for a very informative
error message...
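
The local-to-global lookup described above can be sketched in Python as follows. This is an illustration of the behaviour only, not the actual experiment.perl code; the function name lookup and the dictionary layout of the configuration are assumptions. Reporting only the outermost scope is what produces a message that names GENERAL even when the parameter belongs in EVALUATION:test.

```python
def lookup(config, section, key):
    """Return (scope, value) from the first scope defining key, local to global."""
    parts = section.split(":")
    # "EVALUATION:test" yields ["EVALUATION:test", "EVALUATION"], then GENERAL.
    scopes = [":".join(parts[:i]) for i in range(len(parts), 0, -1)]
    scopes.append("GENERAL")
    for scope in scopes:
        if key in config.get(scope, {}):
            return scope, config[scope][key]
    # Only the last (most global) scope is named in the error.
    raise KeyError(f"you need to define {scopes[-1]}:{key}")


if __name__ == "__main__":
    config = {
        "GENERAL": {"input-extension": "nl"},
        "EVALUATION:test": {"input-sgm": "test.nl.sgm"},
    }
    print(lookup(config, "EVALUATION:test", "input-sgm"))
    print(lookup(config, "EVALUATION:test", "input-extension"))
```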

>> If this is too much hassle for you, you can use multi-bleu instead of
>> nist-bleu, just note that
>> it relies on the tokenization used by your system, so changes in
>> tokenization will lead
>> to BLEU scores that are not comparable.
>
>
> That is OK. I work with data that is already tokenized.
>
> I am now using multi-bleu, and the experiment is running.

That's probably the easiest thing to do.

-phi


------------------------------

Message: 5
Date: Fri, 28 Feb 2014 15:36:14 +0100
From: Massinissa Ahmim <massinissa.ahmim@linguacustodia.com>
Subject: Re: [Moses-support] Very slow tuning with binarised kenlm
language model
To: fsanchez@dlsi.ua.es, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID:
<CANN0mWYNvEe_t9SB+7v4zPXfToSUBxkW-8iu6uYDFoC0eutaqw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Felipe,

Tuning is usually very slow. If you have several CPUs at your disposal, it is
recommended to run the tuning multi-threaded.

You can do this by adding the option --decoder-flags "-threads all" to your
tuning command.
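
A minimal Python sketch of assembling such a command is shown below. It assumes the tuning script is mert-moses.pl taking the input, reference, decoder and moses.ini as positional arguments; all file names are placeholders, and the helper tuning_command is hypothetical. The point is that "-threads all" is passed as a single quoted argument to --decoder-flags.

```python
import shlex


def tuning_command(script, input_file, reference, decoder, config, threads="all"):
    """Build the argument list for a tuning run with a multi-threaded decoder."""
    return [
        script, input_file, reference, decoder, config,
        # The decoder flags are one argument, so they stay quoted in a shell.
        "--decoder-flags", f"-threads {threads}",
    ]


if __name__ == "__main__":
    cmd = tuning_command(
        "mert-moses.pl", "tune.fr", "tune.en", "bin/moses", "model/moses.ini"
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The list form can be handed directly to subprocess.run without going through a shell.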

Regards

Massinissa


2014-02-28 12:56 GMT+01:00 Felipe Sánchez Martínez <fsanchez@dlsi.ua.es>:

> Hello all,
>
> I am tuning a system that uses a binarised kenlm language model. This
> model was binarised with default parameters and is 22 GB in size after
> binarisation.
>
> The language model and the (filtered) phrase table fit into memory (32 GB, so
> no swapping), but Moses is translating very slowly and is using only around
> 15% of the CPU.
>
> Is there anything I can do to make it faster?
>
> Thank you very much for your help
> Regards
> --
> Felipe



--

The Translation Trustee
1, Place Charles de Gaulle
78180 Montigny-le-Bretonneux
Tel: +33 1 30 44 04 23  Mobile: +33 7 61 44 40 84
Email: massinissa.ahmim@linguacustodia.com
Website: www.linguacustodia.com - www.thetranslationtrustee.com


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 88, Issue 69
*********************************************
