Send Moses-support mailing list submissions to
	moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
	moses-support-request@mit.edu
You can reach the person managing the list at
	moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
   1. Re: Recaser - LM model loading (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Wed, 19 Mar 2014 23:09:52 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Recaser - LM model loading
To: Tomas Fulajtar <TomasFu@moravia.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
	<CAEKMkbiezAGoojcJaEpkSCrUX8o9_ybD3vu3h4j=SRHX1tXcPQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
You seem to be using the text LM. This will take a long time to load,
especially if it's over a network. It shouldn't matter what Linux distribution
you're using.
You should:
  1. Make sure your files are on local disks
  2. Binarize the LM with KenLM or IRSTLM. Also, binarize the phrase tables
  3. If it's a recaser, the distortion limit [distortion-limit] should be
0. Otherwise the recaser can reorder the output.
Also, you should consider updating your version of Moses. This will allow
you to use IRSTLM 5.80.03. There are various changes that make it more
extensible, faster and more reliable.
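
Point 3 above can be scripted. The following is a minimal sketch, assuming the Moses 1.0 moses.ini layout in which a `[distortion-limit]` header is followed by its value on the next non-blank line. The function name `set_distortion_limit` is hypothetical and not part of Moses.

```python
def set_distortion_limit(ini_text, limit=0):
    """Return ini_text with the value under [distortion-limit] set to `limit`.

    Assumes the old-style moses.ini layout: a [section] header line
    followed by the value on the next non-blank, non-comment line.
    If the section is missing, it is appended at the end.
    """
    out = []
    pending = False  # True after the header until its value has been written
    found = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if pending:
            if stripped.startswith("["):
                # The section had no value; write one before the next header.
                out.append(str(limit))
                pending = False
            elif stripped and not stripped.startswith("#"):
                # This is the old value line; replace it.
                out.append(str(limit))
                pending = False
                continue
        if stripped == "[distortion-limit]":
            found = True
            pending = True
        out.append(line)
    if pending:
        out.append(str(limit))
    if not found:
        out.extend(["", "[distortion-limit]", str(limit)])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[input-factors]\n0\n\n[distortion-limit]\n6\n\n[weight-d]\n0.6\n"
    print(set_distortion_limit(sample))
```

The function works on text rather than a file path, so the result can be inspected before the recaser's moses.ini is overwritten.
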
On 19 March 2014 12:31, Tomas Fulajtar <TomasFu@moravia.com> wrote:
>  Hi Hieu,
>
>
>
> Looking at the log, the problem seems to be related to the IRSTLM library and
> its code inside src/lmtable.cpp (function named loadtxt_ram).
>
>
>
> I tried going back to IRSTLM 5.80.01 and it resolved the issue
> with long LM loading. However, as the issue might be reproducible by other
> people, I am wondering if we should report it to the IRSTLM team and maybe
> add a comment to the Moses wiki as well (I see there is a comment about
> issues with the IRSTLM source code in the official repos and a recommendation
> to prefer 5.80.03, which unfortunately won't work in my environment).
>
>
>
> Kind regards,
>
>
>
>
>
> Tomas
>
>
>
> *From:* Tomas Fulajtar
> *Sent:* Wednesday, March 19, 2014 9:58 AM
> *To:* 'Hieu Hoang'
> *Cc:* moses-support@mit.edu
> *Subject:* RE: [Moses-support] Recaser - LM model loading
>
>
>
> Hi Hieu,
>
>
>
> Please find the Moses.ini attached.
>
>
>
> The LM is the default 3-gram IRSTLM model, trained with the command:
>
>  /opt/moses/scripts/recaser/train-recaser.perl --dir=$dir  --lm=IRSTLM
> --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file
> --train-script=/opt/moses/scripts/training/train-model.perl.
>
>
>
> I do not expect the problem to be in the LM preparation steps, as we have been
> using the same scripts for a long time without issues.
>
> Parameters of trained LM:
>
> iARPA
>
>
>
> \data\
>
> ngram 1= 219165
>
> ngram 2= 2616463
>
> ngram 3= 7215865
>
>
>
>
>
> The command issued for the recasing experiment:
>
> echo 'some text to recase ' | moses -f recase/moses.ini
>
>
>
> Response on Fedora (showing only the part with the LM data loading):
>
>
>
> Defined parameters (per moses.ini or switch):
>
>         config: moses.ini
>
>         distortion-limit: 6
>
>         input-factors: 0
>
>         lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
>
>         mapping: 0 T 0
>
>         ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
>
>         ttable-limit: 20
>
>         weight-d: 0.6
>
>         weight-l: 0.5000
>
>         weight-t: 0.20 0.20 0.20 0.20 0.20
>
>         weight-w: -1
>
> /var/www/moses/bin
>
> ScoreProducer: Distortion start: 0 end: 1
>
> ScoreProducer: WordPenalty start: 1 end: 2
>
> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
>
> In LanguageModelIRST::Load: nGramOrder = 3
>
> Language Model Type of /tmp/recase/cased.irstlm.gz is 1
>
> Language Model Type is 1
>
> iARPA
>
> loadtxt_ram()
>
> 1-grams: reading 219165 entries
>
> done level1
>
> 2-grams: reading 2616463 entries
>
> done level2
>
> 3-grams: reading 7215865 entries
>
> .done level3
>
> done
>
> OOV code is 219164
>
> OOV code is 219164
>
> IRST: m_unknownId=219164
>
> ScoreProducer: LM start: 3 end: 4
>
> Finished loading LanguageModels : [34.666] seconds
>
> ...
>
>
>
> Response on SUSE:
>
> Defined parameters (per moses.ini or switch):
>
>         config: recase/moses.ini
>
>         distortion-limit: 6
>
>         input-factors: 0
>
>         lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
>
>         mapping: 0 T 0
>
>         ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
>
>         ttable-limit: 20
>
>         weight-d: 0.6
>
>         weight-l: 0.5000
>
>         weight-t: 0.20 0.20 0.20 0.20 0.20
>
>         weight-w: -1
>
>
>
> ScoreProducer: Distortion start: 0 end: 1
>
> ScoreProducer: WordPenalty start: 1 end: 2
>
> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>
> Loading lexical distortion models...have 0 models
>
> Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz :
> [0.001] seconds
>
> In LanguageModelIRST::Load: nGramOrder = 3
>
> Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
>
> Language Model Type is 1
>
> iARPA
>
> loadtxt_ram()
>
> 1-grams: reading 219165 entries
>
> done level 1
>
> 2-grams: reading 2616463 entries
>
> done level 2
>
> 3-grams: reading 7215865 entries
>
> .done level 3
>
> done
>
> OOV code is 219164
>
> OOV code is 219164
>
> IRST: m_unknownId=219164
>
> ScoreProducer: LM start: 3 end: 4
>
> Finished loading LanguageModels : [1045.969] seconds
>
> ...
>
>
>
> As you can see, the loading takes an enormous 1045 seconds.
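
The two load times can be pulled out of the logs and compared with a short script. This is a minimal sketch, assuming the "Finished loading LanguageModels" line format shown in the two logs above. The name `lm_load_seconds` is hypothetical. The bracketed value is the elapsed time stamp on that line, which is close to the load time here because loading started at about 0 seconds in both runs.

```python
import re

# Matches the timing line printed by Moses in the logs quoted above.
LOAD_DONE = re.compile(r"Finished loading LanguageModels : \[([0-9.]+)\] seconds")


def lm_load_seconds(log_text):
    """Return the seconds stamp on the 'Finished loading LanguageModels' line, or None."""
    match = LOAD_DONE.search(log_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    fedora = lm_load_seconds("> Finished loading LanguageModels : [34.666] seconds")
    suse = lm_load_seconds("> Finished loading LanguageModels : [1045.969] seconds")
    # Ratio of SUSE to Fedora load time for the same model files.
    print(f"Fedora: {fedora:.1f}s  SUSE: {suse:.1f}s  slowdown: {suse / fedora:.1f}x")
```
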
>
>
>
> ---
>
> Meanwhile, I found that gcc 4.7 is also available in the SUSE 11 SP3 SDK,
> so I tried recompiling boost/irstlm/moses, but the results are almost the
> same (it is faster by 200 sec due to the compiler optimization).
>
>
>
> Thus the latest config on SUSE is the following:
>
>
>
> irstlm 5.80.03  - recompiled under gcc 4.7
>
> mgiza  0.6.3     updated to 0.7.3 and recompiled under 4.7
>
> boost  1.55     - recompiled under gcc 4.7
>
>
>
> I have also attached the build.log in case it would be useful.
>
>
>
> Today I am going to run regression tests to see if there are any
> particular issues found.
>
>
>
>
>
> Tomas
>
>
>
>
>
> *From:* hieuhoang@gmail.com [mailto:hieuhoang@gmail.com]
> *On Behalf Of *Hieu Hoang
> *Sent:* Wednesday, March 19, 2014 1:36 AM
> *To:* Tomas Fulajtar
> *Cc:* moses-support@mit.edu
> *Subject:* Re: [Moses-support] Recaser - LM model loading
>
>
>
> What is a recaser LM? What command is taking 20 minutes? Can you send me
> the moses.ini file you're using.
>
>
>
>
>
> On 17 March 2014 12:58, Tomas Fulajtar <TomasFu@moravia.com> wrote:
>
> Hello,
>
>
>
> I am experiencing strange behavior when using the recaser LM after
> migrating to Moses (1.0) compiled on a different machine.
>
> The problem is that loading the LM takes 20 minutes on my new machine
> (SUSE), while on the previous one it took 20 secs or so.
>
>
>
> Machine 1: Fedora 18:
>
> - gcc: 4.7.2
> - perl: 5.16
> - moses: 1.0
> - irstlm: 5.80.01
> - mgiza: 0.7.0
> - boost: 1.52
>
>
>
> Machine 2: SUSE  SLES  11 SP3
>
>
>
> - perl: 5.10.0
> - gcc: 4.3
> - moses: 1.0
> - irstlm: 5.80.03
> - mgiza: 0.6.3
> - boost: 1.55
>
>
>
> Moses compilation command:
>
>
>
> sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4
> -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local
> --with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin
> --enable-boost-pool --enable-optimization  --debug-symbols=off toolset=gcc
> -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1
>
>
>
> I have tested the speed using the same recaser IRSTLM model data in ARPA
> format. No error is actually displayed, so I wonder where to
> continue debugging. I also tried retraining the model on SUSE and then testing
> on Fedora, but the result is the same (no error, but too slow on SUSE). Does
> anybody have an idea where to look for a resolution? Maybe the problem is in
> the IRSTLM version used?
>
>
>
>
>
> Thank you,
>
>
>
> Tomas Fulajtar
>
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 89, Issue 46
*********************************************