Moses-support Digest, Vol 89, Issue 47

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Recaser - LM model loading (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Mar 2014 00:53:44 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Recaser - LM model loading
To: Tomas Fulajtar <TomasFu@moravia.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEKMkbg4Rh0WKCwqL+dn_cZb2W=5DmYQNcs49r7==uE7=M_SDw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Correction to my previous message: "It should matter" should read "It should not matter".


On 19 March 2014 23:09, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> You seem to be using the text LM. This will take a long time to load,
> especially if it's over a network. It should matter what linux distribution
> you're using.
>
> You should:
> 1. Make sure your files are on local disks
> 2. Binarize the LM with KenLM or IRSTLM. Also, binarize the phrase tables
> 3. If it's a recaser, the distortion limit [distortion-limit] should be
> 0. Otherwise the recaser can reorder the output.
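The checks in the list above could be automated roughly as follows. This is only a sketch: it assumes the moses.ini uses bracketed section headers such as [distortion-limit] followed by value lines, and it treats a .gz suffix as a hint that a model is still in text form, which is a heuristic rather than a reliable test. The function names `parse_moses_ini` and `recaser_warnings` are made up for illustration and are not part of Moses.

```python
def parse_moses_ini(text):
    """Parse '[section]' headers followed by value lines into a dict of lists."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def recaser_warnings(sections):
    """Return warnings for the settings discussed in the thread."""
    warnings = []
    values = sections.get("distortion-limit") or [None]
    if values[0] != "0":
        warnings.append(f"distortion-limit is {values[0]}, expected 0 for a recaser")
    for key in ("lmodel-file", "ttable-file"):
        for entry in sections.get(key, []):
            path = entry.split()[-1]  # the file path is the last field
            # Assumption: a .gz suffix suggests a compressed text model.
            if path.endswith(".gz"):
                warnings.append(f"{key} {path} looks like a text model; consider binarizing")
    return warnings


# Values taken from the log quoted later in this thread.
SAMPLE_INI = """\
[distortion-limit]
6

[lmodel-file]
1 0 3 /tmp/recase/cased.irstlm.gz

[ttable-file]
0 0 0 5 /tmp/recase/phrase-table.gz
"""

if __name__ == "__main__":
    for warning in recaser_warnings(parse_moses_ini(SAMPLE_INI)):
        print("WARNING:", warning)
```

The sketch does not check whether the files are on a local disk, since that depends on the mount layout of the machine.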
>
> Also, you should consider updating your version of Moses. This will allow
> you to use IRSTLM 5.80.03. There are various changes to make it more
> extensible, faster and more reliable.
>
>
>
> On 19 March 2014 12:31, Tomas Fulajtar <TomasFu@moravia.com> wrote:
>
>> Hi Hieu,
>>
>>
>>
>> Looking at the log, the problem seems to be related to the IRSTLM library
>> and its code inside src/lmtable.cpp (the function named loadtxt_ram).
>>
>>
>>
>> I have tried going back to IRSTLM 5.80.01, and that resolved the issue
>> with slow LM loading. However, as other people may be able to reproduce
>> the issue, I am wondering whether we should report it to the IRSTLM team
>> and maybe add a comment to the Moses wiki as well (I see there is a comment
>> about issues with the IRSTLM source code in the official repos and a
>> recommendation to prefer 5.80.03, which unfortunately won't work in my
>> environment).
>>
>>
>>
>> Kind regards,
>>
>>
>>
>>
>>
>> Tomas
>>
>>
>>
>> *From:* Tomas Fulajtar
>> *Sent:* Wednesday, March 19, 2014 9:58 AM
>> *To:* 'Hieu Hoang'
>> *Cc:* moses-support@mit.edu
>> *Subject:* RE: [Moses-support] Recaser - LM model loading
>>
>>
>>
>> Hi Hieu,
>>
>>
>>
>> Please find the Moses.ini attached.
>>
>>
>>
>> The LM is the default 3-gram IRSTLM model, trained with the command:
>>
>> /opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM
>> --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file
>> --train-script=/opt/moses/scripts/training/train-model.perl.
>>
>>
>>
>> I do not think the problem is in the LM preparation steps, as we have been
>> using the same scripts for a long time without issues.
>>
>> Parameters of trained LM:
>>
>> iARPA
>>
>>
>>
>> \data\
>>
>> ngram 1= 219165
>>
>> ngram 2= 2616463
>>
>> ngram 3= 7215865
>>
>>
>>
>>
>>
>> The command issued for the recasing experiment:
>>
>> echo 'some text to recase ' | moses -f recase/moses.ini
>>
>>
>>
>> Response on Fedora (showing only the part with the LM data loading):
>>
>>
>>
>> Defined parameters (per moses.ini or switch):
>>
>> config: moses.ini
>>
>> distortion-limit: 6
>>
>> input-factors: 0
>>
>> lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
>>
>> mapping: 0 T 0
>>
>> ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
>>
>> ttable-limit: 20
>>
>> weight-d: 0.6
>>
>> weight-l: 0.5000
>>
>> weight-t: 0.20 0.20 0.20 0.20 0.20
>>
>> weight-w: -1
>>
>> /var/www/moses/bin
>>
>> ScoreProducer: Distortion start: 0 end: 1
>>
>> ScoreProducer: WordPenalty start: 1 end: 2
>>
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>>
>> Loading lexical distortion models...have 0 models
>>
>> Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
>>
>> In LanguageModelIRST::Load: nGramOrder = 3
>>
>> Language Model Type of /tmp/recase/cased.irstlm.gz is 1
>>
>> Language Model Type is 1
>>
>> iARPA
>>
>> loadtxt_ram()
>>
>> 1-grams: reading 219165 entries
>>
>> done level1
>>
>> 2-grams: reading 2616463 entries
>>
>> done level2
>>
>> 3-grams: reading 7215865 entries
>>
>> .done level3
>>
>> done
>>
>> OOV code is 219164
>>
>> OOV code is 219164
>>
>> IRST: m_unknownId=219164
>>
>> ScoreProducer: LM start: 3 end: 4
>>
>> Finished loading LanguageModels : [34.666] seconds
>>
>> ...
>>
>>
>>
>> Response on SUSE:
>>
>> Defined parameters (per moses.ini or switch):
>>
>> config: recase/moses.ini
>>
>> distortion-limit: 6
>>
>> input-factors: 0
>>
>> lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
>>
>> mapping: 0 T 0
>>
>> ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
>>
>> ttable-limit: 20
>>
>> weight-d: 0.6
>>
>> weight-l: 0.5000
>>
>> weight-t: 0.20 0.20 0.20 0.20 0.20
>>
>> weight-w: -1
>>
>>
>>
>> ScoreProducer: Distortion start: 0 end: 1
>>
>> ScoreProducer: WordPenalty start: 1 end: 2
>>
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>>
>> Loading lexical distortion models...have 0 models
>>
>> Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz :
>> [0.001] seconds
>>
>> In LanguageModelIRST::Load: nGramOrder = 3
>>
>> Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
>>
>> Language Model Type is 1
>>
>> iARPA
>>
>> loadtxt_ram()
>>
>> 1-grams: reading 219165 entries
>>
>> done level 1
>>
>> 2-grams: reading 2616463 entries
>>
>> done level 2
>>
>> 3-grams: reading 7215865 entries
>>
>> .done level 3
>>
>> done
>>
>> OOV code is 219164
>>
>> OOV code is 219164
>>
>> IRST: m_unknownId=219164
>>
>> ScoreProducer: LM start: 3 end: 4
>>
>> Finished loading LanguageModels : [1045.969] seconds
>>
>> ...
>>
>>
>>
>> As you can see, loading takes an enormous 1045 seconds.
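To put a number on the difference, the load times can be pulled out of the two logs. The sketch below assumes the bracketed values are seconds elapsed since Moses started, so the load duration is the "Finished" stamp minus the "Start" stamp. The helper name `lm_load_duration` is hypothetical, and the log excerpts are copied from the output above.

```python
import re

# Patterns for the two timing lines shown in the logs; \s* also tolerates
# the line wrap that appears in the SUSE excerpt.
START = re.compile(r"Start loading LanguageModel\s+\S+\s*:\s*\[([0-9.]+)\]\s*seconds")
FINISH = re.compile(r"Finished loading LanguageModels\s*:\s*\[([0-9.]+)\]\s*seconds")


def lm_load_duration(log_text):
    """Return finish minus start in seconds, or None if either stamp is missing."""
    start = START.search(log_text)
    finish = FINISH.search(log_text)
    if not (start and finish):
        return None
    return float(finish.group(1)) - float(start.group(1))


FEDORA_LOG = """\
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
Finished loading LanguageModels : [34.666] seconds
"""

SUSE_LOG = """\
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz :
[0.001] seconds
Finished loading LanguageModels : [1045.969] seconds
"""

if __name__ == "__main__":
    fedora = lm_load_duration(FEDORA_LOG)
    suse = lm_load_duration(SUSE_LOG)
    print(f"Fedora: {fedora:.1f} s, SUSE: {suse:.1f} s, ratio: {suse / fedora:.1f}x")
```

On these two excerpts the SUSE load comes out roughly thirty times slower than the Fedora one for the same model.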
>>
>>
>>
>> ---
>>
>> Meanwhile I have found that gcc 4.7 is also available in the SUSE 11 SP3
>> SDK, so I tried recompiling boost/irstlm/moses, but the results are almost
>> the same (it is faster by 200 sec due to the compiler's optimization).
>>
>>
>>
>> So the latest configuration on SUSE is the following:
>>
>>
>>
>> irstlm 5.80.03 - recompiled under gcc 4.7
>>
>> mgiza 0.6.3 updated to 0.7.3 and recompiled under 4.7
>>
>> boost 1.55 - recompiled under gcc 4.7
>>
>>
>>
>> I have also attached the build.log in case it would be useful.
>>
>>
>>
>> Today I am going to run regression tests to see if there are any
>> particular issues found.
>>
>>
>>
>>
>>
>> Tomas
>>
>>
>>
>>
>>
>> *From:* hieuhoang@gmail.com [mailto:hieuhoang@gmail.com]
>> *On Behalf Of *Hieu Hoang
>> *Sent:* Wednesday, March 19, 2014 1:36 AM
>> *To:* Tomas Fulajtar
>> *Cc:* moses-support@mit.edu
>> *Subject:* Re: [Moses-support] Recaser - LM model loading
>>
>>
>>
>> What is a recaser LM? What command is taking 20 minutes? Can you send me
>> the moses.ini file you're using?
>>
>>
>>
>>
>>
>> On 17 March 2014 12:58, Tomas Fulajtar <TomasFu@moravia.com> wrote:
>>
>> Hello,
>>
>>
>>
>> I am experiencing strange behavior when using the recaser LM after
>> migrating to Moses (1.0) compiled on a different machine.
>>
>> The problem is that loading the LM takes 20 minutes on my new machine
>> (SUSE), while on the previous one it took 20 secs or so.
>>
>>
>>
>> Machine 1: Fedora 18:
>>
>> - gcc: 4.7.2
>>
>> - perl 5.16
>>
>> - moses 1.0
>>
>> - irstlm 5.80.01
>>
>> - mgiza 0.7.0
>>
>> - boost 1.52
>>
>>
>>
>> Machine 2: SUSE SLES 11 SP3
>>
>>
>>
>> - perl: 5.10.0
>>
>> - gcc: 4.3
>>
>> - moses 1.0
>>
>> - irstlm 5.80.03
>>
>> - mgiza 0.6.3
>>
>> - boost 1.55
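One way to narrow down the search is to list which components differ between the two machines. The following is a minimal sketch using the versions given above; the helper name `differing_components` is hypothetical.

```python
# Versions as listed for the two machines in this message.
FEDORA = {"gcc": "4.7.2", "perl": "5.16", "moses": "1.0",
          "irstlm": "5.80.01", "mgiza": "0.7.0", "boost": "1.52"}
SUSE = {"gcc": "4.3", "perl": "5.10.0", "moses": "1.0",
        "irstlm": "5.80.03", "mgiza": "0.6.3", "boost": "1.55"}


def differing_components(a, b):
    """Map each component whose version differs to its (a, b) version pair."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    for name, (fedora, suse) in differing_components(FEDORA, SUSE).items():
        print(f"{name}: Fedora {fedora} vs SUSE {suse}")
```

Everything except Moses itself differs, so each of the remaining components is a candidate to swap one at a time.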
>>
>>
>>
>> Moses compilation command:
>>
>>
>>
>> sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4
>> -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local
>> --with-cmph=/usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin
>> --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc
>> -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1
>>
>>
>>
>> I have tested the speed using the same recaser IRSTLM model data in ARPA
>> format. No error is displayed, so I wonder how to continue debugging.
>> I also tried retraining the model on SUSE and then testing it on Fedora,
>> but the result is the same (no error, but too slow on SUSE). Does
>> anybody have an idea where to look for a resolution? Maybe the problem is
>> in the IRSTLM version used?
>>
>>
>>
>>
>>
>> Thank you,
>>
>>
>>
>> Tomas Fulajtar
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 89, Issue 47
*********************************************
