Moses-support Digest, Vol 85, Issue 47

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Estimating probabilities with KenLM (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Tue, 26 Nov 2013 15:39:09 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Estimating probabilities with KenLM
To: Prasanth K <prasanthk.ms09@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbiFv+sMkJJ2hqR1yxRERBFFpJraDKhT2Deo8cEs=+AToA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Runs OK for me. Try a git pull on Moses if your code is a few months old.
There might be an incompatibility between the lmplz wrapper script
and lmplz itself.

My command:
# trainlm-lmplz.perl -order 5 -lmplz
~/workspace/github/mosesdecoder/bin/lmplz -T . -S 1G -text
lm/europarl.lowercased.1 -lm lm/europarl.lmplz
...
Chain sizes: 1:171084 2:2345088 3:6649660 4:10482672 5:13132616
=== 5/5 Writing ARPA model ===
RSSMax:219410432 kB user:2.62476 sys:2.46472 CPU:5.08947 real:0
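
The -S value in the command above (1G here, 10G and 80% elsewhere in the thread) has to be turned into a byte count, and a percentage can only be resolved once the physical memory size is known. The following is a minimal sketch of that dependency, under stated assumptions: ParseMemorySizeSketch and its unit table are hypothetical and are not KenLM's util::ParseNum. In particular, treating a bare number as bytes is this sketch's own choice and may differ from how lmplz interprets -S.

```cpp
#include <cmath>
#include <cstdlib>
#include <stdexcept>
#include <stdint.h>
#include <string>

// Hypothetical helper (not part of KenLM): converts strings such as "1G" or
// "80%" into a byte count. physical_memory is the machine's RAM in bytes,
// or 0 if it could not be determined.
uint64_t ParseMemorySizeSketch(const std::string &text, uint64_t physical_memory) {
  char *end = nullptr;
  const double value = std::strtod(text.c_str(), &end);
  if (end == text.c_str() || !std::isfinite(value) || value < 0.0)
    throw std::invalid_argument("Cannot read a non-negative number from '" + text + "'");

  const std::string unit(end);  // whatever follows the number
  double bytes = 0.0;
  if (unit == "%") {
    // A percentage is meaningless without the physical memory size.
    if (physical_memory == 0)
      throw std::runtime_error(
          "% was specified but the physical memory size could not be determined");
    bytes = value * static_cast<double>(physical_memory) / 100.0;
  } else {
    double multiplier = 1.0;  // assumption of this sketch: bare number = bytes
    if (unit == "K") multiplier = 1024.0;
    else if (unit == "M") multiplier = 1024.0 * 1024.0;
    else if (unit == "G") multiplier = 1024.0 * 1024.0 * 1024.0;
    else if (unit == "T") multiplier = 1024.0 * 1024.0 * 1024.0 * 1024.0;
    else if (!unit.empty())
      throw std::invalid_argument("Unknown unit '" + unit + "'");
    bytes = value * multiplier;
  }

  // Refuse values that do not fit into 64 bits (2^64 as a double).
  if (bytes >= 18446744073709551616.0)
    throw std::out_of_range("Size '" + text + "' is too large");
  return static_cast<uint64_t>(bytes);
}
```

With this sketch, ParseMemorySizeSketch("1G", 0) yields 1073741824, while ParseMemorySizeSketch("80%", 0) throws, which mirrors the kind of failure reported for -S 80% on a machine whose memory size is unknown.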



On 26 November 2013 14:57, Prasanth K <prasanthk.ms09@gmail.com> wrote:

> Ok. I have managed to re-create this error (no reason why it shouldn't
> come back, I knew exactly what I told moses to do). So, the exact command
> run to create the language model from the logs is as follows:
>
> scripts/generic/trainlm-lmplz.perl -lmplz bin/lmplz -order 5 -T europarl.en-sv/phrase-based-dup/tmp
> -S 10G -text europarl.en-sv/phrase-based-dup/lm/europarl.lowercased.1 -lm
> europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
>
> Of course, all paths in the command above were absolute paths (I
> just removed them for readability). When this is run, my EMS log file
> LM_europarl_train.id.STDERR contains the following:
>
> EXECUTING bin/lmplz --order 5 -T europarl.en-sv/phrase-based-dup/tmp -S
> 10G < europarl.en-sv/phrase-based-dup/lm/europarl.lowercased.1 >
> europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
>
> === 1/5 Counting and sorting n-grams ===
>
> Reading stdin
>
>
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>
>
> ****************************************************************************************************
>
> Function not implemented
>
> This does not make the language model step crash; instead, it creates an
> empty language model (0 lines). Below is the log file
> LM_europarl_binarize.id.STDERR
>
> Reading europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
>
>
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>
> End of file Byte: 0 File: europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
>
> ERROR
>
> Clearly, something is wrong with my installation of kenlm (decoding
> with kenlm works just fine; I have confirmed that now), which makes the
> estimation go funny. The question is: where do I start to fix this?
>
> Thanks.
>
> - Regards,
>
> Prasanth
>
>
> On Tue, Nov 26, 2013 at 1:56 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> OK, I can't reproduce your error
>> Function not implemented
>> You should find out exactly how lmplz is being run; it may be that you
>> have a slightly older version that doesn't know all the arguments you've
>> given it.
>>
>>
>> On 26/11/2013 06:47, Prasanth K wrote:
>>
>> Hello Hieu,
>>
>> My first attempt was to specify the absolute amount of memory (10G) but
>> that gave an error saying function not implemented. Later, when I tried
>> specifying the relative size (80%), I got a similar parse error to what you
>> have given above. Strange that it should
>>
>> @Kenneth, thanks for the code to estimate physical memory. I am going
>> to give it a shot and let you know how it goes.
>>
>> - Regards,
>> Prasanth
>>
>>
>> On Mon, Nov 25, 2013 at 9:20 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>
>>> Prasanth - what is the exact lmplz command that was run by the EMS?
>>>
>>>
>>> This works:
>>> .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa
>>> lm/europarl.lmplz -T /tmp -S 1G
>>> This doesn't:
>>> .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa
>>> lm/europarl.lmplz -T /tmp -S 80%
>>> It gives the error:
>>> util/usage.cc:220 in uint64_t util::<anonymous
>>> namespace>::ParseNum(const std::string &) [Num = double] threw
>>> SizeParseError because `!mem'.
>>> Failed to parse 80% into a memory size because % was specified but the
>>> physical memory size could not be determined.
>>>
>>> However, it worked even with the source code from 4 days ago.
>>>
>>>
>>> On 25/11/2013 19:07, Kenneth Heafield wrote:
>>> > Hi,
>>> >
>>> > I've taken a shot in the dark based on physmem.c to support physical
>>> > memory estimation on BSD and OS X. Please clone
>>> >
>>> > github.com/kpu/kenlm
>>> >
>>> > and compile with
>>> >
>>> > ./bjam
>>> >
>>> > If that fails, please let Hieu and me know (maybe Hieu can help since he
>>> > has OS X). If it doesn't fail, run
>>> >
>>> > bin/lmplz
>>> >
>>> > with no argument. The help message will include a line e.g.
>>> >
>>> > "This machine has 135224176640 bytes of memory."
>>> >
>>> > or
>>> >
>>> > "Unable to determine the amount of memory on this machine."
>>> >
>>> > If it works, then I'll push to Moses. Trying to not break Moses master
>>> > for OS X.
>>> >
>>> > Kenneth
>>> >
>>> > On 11/24/13 22:40, Prasanth K wrote:
>>> >> Hi Kenneth,
>>> >>
>>> >> Thanks for the clarification w.r.t. calculating the memory size. But I
>>> >> am running these on a Mac (10.9 Mavericks). Do you think I should still
>>> >> port the lmplz code to Mac for the estimation of probabilities?
>>> >>
>>> >> One thing though, I did change the default clang compiler that comes
>>> >> with this new Mac to a gcc-4.8 (not sure that changes anything in this
>>> >> context).
>>> >>
>>> >> - Prasanth
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Nov 22, 2013 at 6:50 PM, Kenneth Heafield <moses@kheafield.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> What OS are you on? Cygwin? Apparently every OS reports memory size
>>> >> in a different way:
>>> >>
>>> >>
>>> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/physmem.c;h=2629936146e3042f927523322f18aca76996cd7f;hb=HEAD
>>> >>
>>> >> The good news is that the above code is LGPLv2:
>>> >>
>>> >>
>>> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=modules/physmem;h=9644522e0493a85a9fb4ae7c4449741c2c1500ea;hb=HEAD
>>> >>
>>> >> But currently I'm just using this short function that will fail on some
>>> >> platforms:
>>> >>
>>> >> uint64_t GuessPhysicalMemory() {
>>> >> #if defined(_WIN32) || defined(_WIN64)
>>> >>   return 0;
>>> >> #elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
>>> >>   long pages = sysconf(_SC_PHYS_PAGES);
>>> >>   if (pages == -1) return 0;
>>> >>   long page_size = sysconf(_SC_PAGESIZE);
>>> >>   if (page_size == -1) return 0;
>>> >>   return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
>>> >> #else
>>> >>   return 0;
>>> >> #endif
>>> >> }
>>> >>
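
As a rough illustration of the BSD/OS X support Kenneth describes, the short function quoted above could be extended with a sysctl fallback. This is a minimal sketch, not the code that went into KenLM: the name GuessPhysicalMemorySketch is hypothetical, it targets POSIX systems only (no Windows branch), and it assumes that on OS X sysctl with CTL_HW/HW_MEMSIZE reports the installed RAM in bytes as a 64-bit value.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical variant of the quoted function: returns the physical memory
// size in bytes, or 0 if it cannot be determined on this platform.
uint64_t GuessPhysicalMemorySketch() {
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  {
    // Same approach as the quoted function: page count times page size.
    const long pages = sysconf(_SC_PHYS_PAGES);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (pages > 0 && page_size > 0)
      return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
  }
#endif
#if defined(__APPLE__)
  {
    // Fallback for OS X: ask the kernel for hw.memsize.
    int mib[2] = {CTL_HW, HW_MEMSIZE};
    uint64_t bytes = 0;
    size_t len = sizeof(bytes);
    if (sysctl(mib, 2, &bytes, &len, nullptr, 0) == 0 && len == sizeof(bytes))
      return bytes;
  }
#endif
  return 0;  // unknown: callers must not resolve percentages against this
}
```

A return value of 0 keeps the contract of the quoted function, so a caller can still report that the amount of memory could not be determined instead of silently computing a percentage of zero.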
