Moses-support Digest, Vol 89, Issue 49

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Loading a binary rule table (Massinissa Ahmim)
2. Re: Recaser - LM model loading (Tomas Fulajtar)


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Mar 2014 12:20:10 +0100
From: Massinissa Ahmim <massinissa.ahmim@linguacustodia.com>
Subject: [Moses-support] Loading a binary rule table
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CANN0mWYCahxdUokojh6+1r4h5K0++LvMwZHFtUVx8RyGv-v_wQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Dear all,

I've managed to train a hierarchical model using the following command:

nohup /mosesdecoder/scripts/training/train-model.perl --hierarchical \
    --glue-grammar --score-options="--GoodTuring" \
    -root-dir /disque2/Preparation/syntactic/hierarchical_PSCT \
    -corpus /disque2/Preparation/backoff/PSCT.tok.uni.low -f fr -e en \
    -lm 0:5:/disque2/Preparation/syntactic/hierarchical/LM.sur.en.blm \
    -external-bin-dir /root/external-bin-dir/ -mgiza -mgiza-cpus 30 \
    >& training.out &

The resulting model works great but it is very slow.

So I used CreateOnDiskPt to binarise the rule table as follows:

/home/Moses/mosesdecoder/bin/CreateOnDiskPt 1 1 5 20 2 model/rule-table.gz rules-table

This produces the following files:

-rw-r--r-- 1 root root 85 20 mars 11:35 Misc.dat
-rw-r--r-- 1 root root 781509322 20 mars 11:35 Source.dat
-rw-r--r-- 1 root root 1758588137 20 mars 11:35 TargetColl.dat
-rw-r--r-- 1 root root 1555353459 20 mars 11:35 TargetInd.dat
-rw-r--r-- 1 root root 400714 20 mars 11:35 Vocab.dat

and I've updated my moses.ini as follows:

6 0 0 1 /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table
6 0 0 1 /disque2/Preparation/syntactic/hierarchical_PSCT/model/glue-grammar

But when I try to use it, I get the following:

Defined parameters (per moses.ini or switch):
config: moses.ini
cube-pruning-pop-limit: 1000
input-factors: 0
inputtype: 3
lmodel-file: 0 0 5 /disque2/Preparation/syntactic/hierarchical/LM.sur.en.blm
mapping: 0 T 0 1 T 1
max-chart-span: 20 1000
non-terminals: X
search-algorithm: 3
ttable-file: 6 0 0 1 /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table 6 0 0 1 /disque2/Preparation/syntactic/hierarchical_PSCT/model/glue-grammar
ttable-limit: 20
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20 1.0
weight-w: -1
/mosesdecoder/bin
ScoreProducer: WordPenalty start: 0 end: 1
ScoreProducer: !UnknownWordPenalty start: 1 end: 2
Loading lexical distortion models...have 0 models
Start loading LanguageModel /disque2/Preparation/syntactic/hierarchical/LM.sur.en.blm : [0.000] seconds
/disque2/Preparation/syntactic/hierarchical/LM.sur.en.blm: line 80492: reached EOF before \end\
ScoreProducer: LM start: 2 end: 3
Finished loading LanguageModels : [0.056] seconds
Using uniform ttable-limit of 20 for all translation tables.
Start loading PhraseTable /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table : [0.056] seconds
filePath: /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table
ScoreProducer: PhraseModel start: 3 end: 4
Start loading PhraseTable /disque2/Preparation/syntactic/hierarchical_PSCT/model/glue-grammar : [0.056] seconds
filePath: /disque2/Preparation/syntactic/hierarchical_PSCT/model/glue-grammar
ScoreProducer: PhraseModel:2 start: 4 end: 5
Finished loading phrase tables : [0.056] seconds
max-chart-span: 20
max-chart-span: 1000
Start loading phrase table from /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table : [0.056] seconds
Can't read /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table

So I am wondering whether I did something wrong with my training command or the binarisation, or with the parameters in moses.ini.
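A quick way to rule out a path problem before digging into the moses.ini parameters is to check that the directory named in the `ttable-file` line exists and holds the five files CreateOnDiskPt produced. Note that the CreateOnDiskPt command above wrote its output to `rules-table` (relative to the directory it was run from), while moses.ini points at `/disque2/Preparation/syntactic/hierarchical_PSCT/rule-table`; unless that is just a typo in the mail, these may not be the same directory. The helper below is a hypothetical sketch (the name `check_ondisk_table` is made up, it is not part of Moses), and the expected file names come only from the listing in this message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): check that a directory produced by
# CreateOnDiskPt exists and holds the non-empty .dat files listed in this message.
check_ondisk_table() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  status=0
  for f in Misc.dat Source.dat TargetColl.dat TargetInd.dat Vocab.dat; do
    # -s: file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_ondisk_table /disque2/Preparation/syntactic/hierarchical_PSCT/rule-table` prints nothing and returns 0 if the directory and all five files are in place; otherwise it reports what is missing on stderr and returns 1.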


Many thanks

Regards

MA




--


*The Translation Trustee*

*1, Place Charles de Gaulle*

*78180 Montigny-le-Bretonneux*

*Tel : +33 1 30 44 04 23 Mobile : +33 7 61 44 40 84*

*Email :* *massinissa.ahmim@linguacustodia.com
<massinissa.ahmim@linguacustodia.com>*

*Website :* *www.linguacustodia.com <http://www.linguacustodia.com/> -
www.thetranslationtrustee.com <http://www.thetranslationtrustee.com>*


Please do not print this email unless it is absolutely necessary. Spread
environmental awareness.

------------------------------

Message: 2
Date: Wed, 19 Mar 2014 08:58:05 +0000
From: Tomas Fulajtar <TomasFu@moravia.com>
Subject: Re: [Moses-support] Recaser - LM model loading
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<0F546C639409F6479DF2CE6DD5267B16A040FB3A@dag-cz-1.CZ.moravia-it.com>
Content-Type: text/plain; charset="us-ascii"

Hi Hieu,

Please find the Moses.ini attached.

The LM is the default 3-gram IRSTLM model, trained with the command:
/opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file --train-script=/opt/moses/scripts/training/train-model.perl

I do not expect the problem to be in the LM preparation steps, as we have been using the same scripts for a long time without issues.
Parameters of the trained LM:
iARPA

\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865
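Since the same model data is used on both machines, one cheap sanity check is to compare the n-gram counts in the `\data\` header (and a checksum such as `md5sum`) on Fedora and on SUSE. The function below is a sketch with the hypothetical name `arpa_counts`; it assumes the header layout shown above (`ngram N= count`, with or without a space after `=`) and stops at the `\1-grams:` line so the body of the file is not scanned.

```shell
#!/bin/sh
# Hypothetical helper: print "order=count" for each "ngram" line in the
# \data\ header of an ARPA/iARPA file. gzip -cdf decompresses .gz input and
# copies uncompressed input through unchanged, so both forms work.
arpa_counts() {
  gzip -cdf "$1" | awk '
    /^\\1-grams:/      { exit }                 # header is over
    /^ngram [0-9]+ *=/ { sub(/^ngram /, "")     # drop the keyword
                         gsub(/ /, "")          # drop spaces around "="
                         print }
  '
}
```

Given the header shown above, running `arpa_counts` on `cased.irstlm.gz` should print `1=219165`, `2=2616463` and `3=7215865`, one per line; differing output between the two machines would point at the model files rather than the build.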


The command issued for the recasing experiment:
echo 'some text to recase ' | moses -f recase/moses.ini

Response on Fedora (showing only the LM loading part):

Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/var/www/moses/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /tmp/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level1
2-grams: reading 2616463 entries
done level2
3-grams: reading 7215865 entries
.done level3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [34.666] seconds
...

Response on SUSE:
Defined parameters (per moses.ini or switch):
config: recase/moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1

ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level 1
2-grams: reading 2616463 entries
done level 2
3-grams: reading 7215865 entries
.done level 3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [1045.969] seconds
...

As you can see, loading takes an enormous 1045 seconds.
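To compare load times across runs or machines without reading through the whole log, the timing line can be extracted from a saved copy of the Moses diagnostics. A minimal sketch, assuming the line has exactly the form shown in the logs above (`Finished loading LanguageModels : [N] seconds`) and that the diagnostics were redirected to a file, for example with `echo 'some text to recase' | moses -f recase/moses.ini 2> moses.err` (Moses normally prints these messages on stderr). The function name `lm_load_seconds` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the LM loading time (in seconds) recorded in a
# saved Moses log, e.g. "Finished loading LanguageModels : [34.666] seconds".
lm_load_seconds() {
  sed -n 's/^Finished loading LanguageModels : \[\([0-9.]*\)] seconds.*/\1/p' "$1"
}
```

Applied to the two logs above, this would print `34.666` for Fedora and `1045.969` for SUSE.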

---
Meanwhile, I found that gcc 4.7 is also available in the SUSE 11 SP3 SDK, so I recompiled boost, irstlm and moses with it, but the results are almost the same (about 200 seconds faster, due to compiler optimizations).

So the current configuration on SUSE is as follows:

irstlm 5.80.03 - recompiled with gcc 4.7
mgiza 0.6.3 - updated to 0.7.3 and recompiled with gcc 4.7
boost 1.55 - recompiled with gcc 4.7

I have also attached the build.log in case it would be useful.

Today I am going to run the regression tests to see whether they reveal any particular issues.


Tomas


From: hieuhoang@gmail.com [mailto:hieuhoang@gmail.com] On Behalf Of Hieu Hoang
Sent: Wednesday, March 19, 2014 1:36 AM
To: Tomas Fulajtar
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Recaser - LM model loading

What is a recaser LM? What command is taking 20 minutes? Can you send me the moses.ini file you're using?


On 17 March 2014 12:58, Tomas Fulajtar <TomasFu@moravia.com> wrote:
Hello,

I am experiencing strange behavior with the recaser LM after migrating to Moses 1.0 compiled on a different machine.
The problem is that loading the LM takes 20 minutes on my new machine (SUSE), whereas on the previous one it took about 20 seconds.

Machine 1: Fedora 18

* gcc: 4.7.2
* perl: 5.16
* moses: 1.0
* irstlm: 5.80.01
* mgiza: 0.7.0
* boost: 1.52

Machine 2: SUSE SLES 11 SP3

* perl: 5.10.0
* gcc: 4.3
* moses: 1.0
* irstlm: 5.80.03
* mgiza: 0.6.3
* boost: 1.55

Moses compilation command:

sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4 -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local --with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1
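One detail in the command above: in `... | tee ~/build.log 2>&1` the `2>&1` applies to `tee`, not to `bjam`, so anything bjam writes to stderr reaches the terminal but not build.log. Merging stderr before the pipe (`sudo ./bjam ... 2>&1 | tee ~/build.log`) captures both streams. Separately, `--with-cmph=usr/local` has no leading slash, so it is a relative path; whether that was intended is not clear from the message. The sketch below uses a hypothetical stand-in function, `fake_build`, in place of bjam to show the ordering.

```shell
#!/bin/sh
# Hypothetical stand-in for a build tool that writes to both streams.
fake_build() {
  echo "compiling moses..."
  echo "warning: something went wrong" >&2
}

# Wrong order: 2>&1 here only affects tee, so the warning bypasses the log.
fake_build | tee wrong.log 2>&1

# Right order: merge stderr into stdout first, then pipe both into tee.
fake_build 2>&1 | tee right.log
```

After running it, `right.log` contains both lines, while `wrong.log` contains only `compiling moses...`.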

I have tested the speed using the same recaser IRSTLM model data in ARPA format. No error is displayed, so I am not sure where to continue debugging. I also tried retraining the model on SUSE and testing it on Fedora, but the result is the same (no error, but too slow on SUSE). Does anybody have an idea where to look for a resolution? Could the problem be in the IRSTLM version used?


Thank you,

Tomas Fulajtar



_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moses.ini
Type: application/octet-stream
Size: 1094 bytes
Desc: moses.ini
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20140319/740be1f6/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.log
Type: application/octet-stream
Size: 544069 bytes
Desc: build.log
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20140319/740be1f6/attachment-0001.obj

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 89, Issue 49
*********************************************
