Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Recaser - LM model loading (Tomas Fulajtar)
----------------------------------------------------------------------
Message: 1
Date: Wed, 19 Mar 2014 12:31:09 +0000
From: Tomas Fulajtar <TomasFu@moravia.com>
Subject: Re: [Moses-support] Recaser - LM model loading
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<0F546C639409F6479DF2CE6DD5267B16A040FB85@dag-cz-1.CZ.moravia-it.com>
Content-Type: text/plain; charset="us-ascii"
Hi Hieu,
Looking at the log, the problem seems to be related to the IRSTLM library and its code in src/lmtable.cpp (the function loadtxt_ram).
I reverted to IRSTLM 5.80.01, and that resolved the slow LM loading. However, since other people may run into the same issue, I wonder whether we should report it to the IRSTLM team and perhaps add a note to the Moses wiki as well (I see there is a comment there about issues with the IRSTLM source code in the official repositories, recommending 5.80.03, which unfortunately doesn't work in my environment).
Kind regards,
Tomas
From: Tomas Fulajtar
Sent: Wednesday, March 19, 2014 9:58 AM
To: 'Hieu Hoang'
Cc: moses-support@mit.edu
Subject: RE: [Moses-support] Recaser - LM model loading
Hi Hieu,
Please find the moses.ini attached.
The LM is the default 3-gram IRSTLM model, trained with the command:
/opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file --train-script=/opt/moses/scripts/training/train-model.perl
I do not expect the problem lies in the LM preparation steps, as we have been using the same scripts for a long time without issues.
Parameters of the trained LM:
iARPA
\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865
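To check the size of a model without loading it, it is enough to read the `\data\` header shown above. The sketch below assumes exactly that layout (an optional `iARPA` marker line, then `\data\`, then `ngram N= count` lines, with optional spaces around `=`); the function names `read_ngram_counts` and `read_ngram_counts_from_file` are hypothetical helpers, not part of Moses or IRSTLM. The parser stops at the end of the header, so even a multi-gigabyte `cased.irstlm.gz` is only read for a few lines.

```python
import gzip
import re
from typing import Dict, Iterable

# Matches header lines such as "ngram 3= 7215865" (spaces around "=" are optional).
NGRAM_LINE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def read_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: count} from the \\data\\ section of an ARPA/iARPA header.

    Hypothetical helper: stops at the first line after the counts that is not an
    "ngram N=count" entry, so the n-gram bodies themselves are never read.
    """
    counts: Dict[int, int] = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue  # e.g. the "iARPA" marker line before \data\
        match = NGRAM_LINE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line or counts:
            break  # a non-count line (or blank line after the counts) ends the header
    return counts


def read_ngram_counts_from_file(path: str) -> Dict[int, int]:
    """Open a plain or gzip-compressed LM file (assumes a .gz suffix means gzip)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_ngram_counts(handle)


if __name__ == "__main__":
    header = ["iARPA", "\\data\\", "ngram 1= 219165", "ngram 2= 2616463", "ngram 3= 7215865", ""]
    counts = read_ngram_counts(header)
    print(counts, sum(counts.values()))
    # {1: 219165, 2: 2616463, 3: 7215865} 10051493
```

For this recaser model that is roughly 10 million n-grams in total, dominated by the 7.2 million trigrams, which is the level where both machines spend their loading time.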
The command issued for the recasing experiment:
echo 'some text to recase ' | moses -f recase/moses.ini
Response on Fedora (showing only the LM loading part):
Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/var/www/moses/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /tmp/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level1
2-grams: reading 2616463 entries
done level2
3-grams: reading 7215865 entries
.done level3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [34.666] seconds
...
Response on SUSE:
Defined parameters (per moses.ini or switch):
config: recase/moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level 1
2-grams: reading 2616463 entries
done level 2
3-grams: reading 7215865 entries
.done level 3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [1045.969] seconds
...
As you can see, loading takes an enormous 1045 seconds, compared with about 35 seconds on Fedora.
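To compare runs without reading the logs by eye, the two timestamps Moses prints around LM loading can be extracted from a captured log (for example by appending `2> load.log` to the recasing command, assuming these messages go to stderr). This is a minimal sketch that assumes the exact line formats in the two logs above and treats the bracketed values as elapsed times from a common starting point, so their difference is the load duration; `lm_load_seconds` is a hypothetical name, and other Moses versions may word these lines differently.

```python
import re
from typing import Iterable, Optional

# Line formats copied from the Fedora and SUSE logs above.
START = re.compile(r"Start loading LanguageModel .* : \[([\d.]+)\] seconds")
FINISH = re.compile(r"Finished loading LanguageModels : \[([\d.]+)\] seconds")


def lm_load_seconds(log_lines: Iterable[str]) -> Optional[float]:
    """Return finish minus start time for LM loading, or None if a line is missing."""
    start: Optional[float] = None
    finish: Optional[float] = None
    for line in log_lines:
        if start is None:
            m = START.search(line)
            if m:
                start = float(m.group(1))
        m = FINISH.search(line)
        if m:
            finish = float(m.group(1))
    if start is None or finish is None:
        return None
    return finish - start


if __name__ == "__main__":
    fedora_log = [
        "Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds",
        "Finished loading LanguageModels : [34.666] seconds",
    ]
    suse_log = [
        "Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds",
        "Finished loading LanguageModels : [1045.969] seconds",
    ]
    fedora = lm_load_seconds(fedora_log)
    suse = lm_load_seconds(suse_log)
    print(f"{fedora:.1f} s vs {suse:.1f} s ({suse / fedora:.0f}x slower)")
    # 34.7 s vs 1046.0 s (30x slower)
```

A file object can be passed directly, e.g. `with open("load.log") as f: lm_load_seconds(f)`, which makes it easy to compare IRSTLM versions or compiler builds on the same model.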
---
Meanwhile, I found that gcc 4.7 is also available in the SUSE 11 SP3 SDK, so I recompiled boost, IRSTLM, and Moses with it, but the results are almost the same (loading is about 200 seconds faster, due to the compiler's optimizations).
So the current configuration on SUSE is:
* irstlm 5.80.03, recompiled with gcc 4.7
* mgiza 0.6.3 updated to 0.7.3 and recompiled with gcc 4.7
* boost 1.55, recompiled with gcc 4.7
I have also attached the build.log in case it would be useful.
Today I am going to run the regression tests to see whether they reveal any particular issues.
Tomas
From: hieuhoang@gmail.com [mailto:hieuhoang@gmail.com] On Behalf Of Hieu Hoang
Sent: Wednesday, March 19, 2014 1:36 AM
To: Tomas Fulajtar
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Recaser - LM model loading
What is a recaser LM? What command is taking 20 minutes? Can you send me the moses.ini file you're using?
On 17 March 2014 12:58, Tomas Fulajtar <TomasFu@moravia.com> wrote:
Hello,
I am experiencing strange behavior with the recaser LM after migrating to Moses 1.0 compiled on a different machine.
The problem is that loading the LM takes 20 minutes on my new machine (SUSE), while on the previous one it took 20 seconds or so.
Machine 1: Fedora 18:
* gcc 4.7.2
* perl 5.16
* moses 1.0
* irstlm 5.80.01
* mgiza 0.7.0
* boost 1.52
Machine 2: SUSE SLES 11 SP3
* perl 5.10.0
* gcc 4.3
* moses 1.0
* irstlm 5.80.03
* mgiza 0.6.3
* boost 1.55
Moses compilation command:
sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4 -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local --with-cmph=/usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc -d2 --debug-configuration --max-kenlm-order=7 2>&1 | tee ~/build.log
I have tested the speed using the same recaser IRSTLM model data in ARPA format. No error is displayed, so I am not sure where to continue debugging. I also tried retraining the model on SUSE and then testing it on Fedora, but the result is the same (no error, but too slow on SUSE). Does anybody have an idea where to look for a solution? Maybe the problem is in the IRSTLM version I am using?
Thank you,
Tomas Fulajtar
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
End of Moses-support Digest, Vol 89, Issue 44
*********************************************