Moses-support Digest, Vol 102, Issue 52

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Working with big models (liling tan)
2. Re: Working with big models (liling tan)
3. Re: Working with big models (Kenneth Heafield)
4. Re: Working with big models (Marcin Junczys-Dowmunt)
5. Re: Working with big models (liling tan)

----------------------------------------------------------------------

Message: 1
Date: Sat, 25 Apr 2015 21:31:19 +0200
From: liling tan <alvations@gmail.com>
Subject: Re: [Moses-support] Working with big models
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJ+Ka6gqDjEDZDHJh6D3YSwKZAk_fnvvEJrUzhyJC+6vHA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses devs/users,

@Marcin, thanks for the tip on the trie, I'll try out the trie.

About the 100 MERT iterations: when I tried to run mert-moses.pl on that
target language with a 71GB binarized language model on a 3000-line dev
set, it took more than one day to tune using 10 threads. Is that normal?

For a different experiment with a 38GB binarized language model, it took
at most 4-5 hours to tune with 10 threads on a 3000-line dev set. (All the
phrase-tables and reordering-tables are binarized.)

I ran mert-moses.pl with only the model directory and the path to
moses.ini.
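
For reference, here is a sketch of how a tuning run with an explicit
iteration cap and decoder thread count might be spelled out. Every path and
file name is a hypothetical placeholder, and the cap of 20 is only an
illustrative value. The option names (--maximum-iterations, --decoder-flags,
--mertdir, --working-dir) should be checked against the help text of your
own mert-moses.pl. The snippet only prints the command; it does not run it.

```shell
#!/bin/sh
# Dry run: assemble and print a mert-moses.pl command line.
# All paths and file names below are hypothetical placeholders.
MOSES="$HOME/moses"
DEV_SRC="dev.src"        # source side of the dev set
DEV_REF="dev.ref"        # reference translations for the dev set
INI="model/moses.ini"    # decoder configuration
MAX_ITER=20              # illustrative cap on tuning iterations
THREADS=10               # decoder threads, passed through to moses

MERT_CMD="$MOSES/scripts/training/mert-moses.pl $DEV_SRC $DEV_REF"
MERT_CMD="$MERT_CMD $MOSES/bin/moses $INI"
MERT_CMD="$MERT_CMD --mertdir $MOSES/bin --working-dir mert-work"
MERT_CMD="$MERT_CMD --maximum-iterations=$MAX_ITER"
MERT_CMD="$MERT_CMD --decoder-flags=\"-threads $THREADS\""

printf '%s\n' "$MERT_CMD"
```

Printing the command first makes it easy to confirm which dev set, ini
file, and thread count a long tuning run will actually use.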

Regards,
Liling

> binarizing like this gives you a lot smaller file:
>
> build_binary trie -a 22 -b 8 -q 8 lm.arpa.gz lm.kenlm
>
> This uses quantization, in theory that could cause quality loss, but I
> never saw that happen. Remove "-b 8 -q 8" if you are afraid of that, the
> file will be larger, but still a lot smaller than what you have. That's
> about all I do. You said "100 MERT iterations" ... what do you mean by
> that? Also the LM uses memory mapping in shared memory, so running
> several moses instances in parallel does not use additional memory due
> to the LM, similar for the phrase table.
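
The two variants described above are shown side by side here, as a dry run
that prints the commands without executing them. The output names
lm.quant.kenlm and lm.plain.kenlm are made up for illustration; the flag
values are the ones quoted above.

```shell
#!/bin/sh
# Dry run: print two build_binary command lines for comparison.
# ARPA is the input model named in the message; output names are placeholders.
ARPA="lm.arpa.gz"

# Trie with pointer compression (-a) plus 8-bit quantization of
# probabilities (-q) and backoffs (-b): the smallest file.
QUANT_CMD="build_binary trie -a 22 -b 8 -q 8 $ARPA lm.quant.kenlm"

# Same trie without quantization: a larger file, but no quantization loss.
PLAIN_CMD="build_binary trie -a 22 $ARPA lm.plain.kenlm"

printf '%s\n' "$QUANT_CMD" "$PLAIN_CMD"
```

Building both and comparing file sizes and tuning scores on the same dev
set is one way to check whether quantization costs anything in practice.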

------------------------------

Message: 2
Date: Sat, 25 Apr 2015 21:34:53 +0200
From: liling tan <alvations@gmail.com>
Subject: Re: [Moses-support] Working with big models
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJKtf0aRNah8S=xeVxhhFJZOAP_iwGHed+Y9FydjSE58LQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses devs/users,

@Marcin, thanks for the tip on the trie, I'll try out the trie.

About the 100 MERT iterations: when I tried to run mert-moses.pl on that
target language with a 71GB binarized language model on a 3000-line dev
set, it took more than one day to tune using 10 threads.

*Is that normal?*

For a different experiment with a 38GB binarized language model, it took
at most 4-5 hours to tune with 10 threads on a 3000-line dev set. (All the
phrase-tables and reordering-tables are binarized.)

I ran mert-moses.pl with only the model directory and the path to
moses.ini.

Regards,
Liling

------------------------------

Message: 3
Date: Sat, 25 Apr 2015 15:37:10 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Working with big models
To: moses-support@mit.edu
Message-ID: <553BECE6.2090800@kheafield.com>
Content-Type: text/plain; charset=windows-1252

Hi,

Why are you running 100 MERT iterations as opposed to, say, 20? And
whether that amount of time is normal depends on how much RAM you have.

Kenneth

On 04/25/2015 03:31 PM, liling tan wrote:
> Dear Moses devs/users,
>
> @Marcin, thanks for the tip on the trie, I'll try out the trie.
>
> About the 100 MERT iterations: when I tried to run mert-moses.pl on that
> target language with a 71GB binarized language model on a 3000-line dev
> set, it took more than one day to tune using 10 threads. Is that normal?
>
> For a different experiment with a 38GB binarized language model, it took
> at most 4-5 hours to tune with 10 threads on a 3000-line dev set. (All the
> phrase-tables and reordering-tables are binarized.)
>
> I ran mert-moses.pl with only the model directory and the path to
> moses.ini.
>
> Regards,
> Liling
>
>
> binarizing like this gives you a lot smaller file:
>
> build_binary trie -a 22 -b 8 -q 8 lm.arpa.gz lm.kenlm
>
> This uses quantization, in theory that could cause quality loss, but I
> never saw that happen. Remove "-b 8 -q 8" if you are afraid of that, the
> file will be larger, but still a lot smaller than what you have. That's
> about all I do. You said "100 MERT iterations" ... what do you mean by
> that? Also the LM uses memory mapping in shared memory, so running
> several moses instances in parallel does not use additional memory due
> to the LM, similar for the phrase table.

------------------------------

Message: 4
Date: Sat, 25 Apr 2015 21:41:54 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Working with big models
To: moses-support@mit.edu
Message-ID: <553BEE02.2050808@amu.edu.pl>
Content-Type: text/plain; charset=windows-1252; format=flowed

One day is plausible. It happens sometimes, depending on your
configuration and other weird stuff.

On 25.04.2015 at 21:37, Kenneth Heafield wrote:
> Hi,
>
> Why are you running 100 MERT iterations as opposed to, say, 20? And
> whether that amount of time is normal depends on how much RAM you have.
>
> Kenneth
>
> On 04/25/2015 03:31 PM, liling tan wrote:
>> Dear Moses devs/users,
>>
>> @Marcin, thanks for the tip on the trie, I'll try out the trie.
>>
>> About the 100 MERT iterations: when I tried to run mert-moses.pl on that
>> target language with a 71GB binarized language model on a 3000-line dev
>> set, it took more than one day to tune using 10 threads. Is that normal?
>>
>> For a different experiment with a 38GB binarized language model, it took
>> at most 4-5 hours to tune with 10 threads on a 3000-line dev set. (All the
>> phrase-tables and reordering-tables are binarized.)
>>
>> I ran mert-moses.pl with only the model directory and the path to
>> moses.ini.
>>
>> Regards,
>> Liling
>>
>>
>> binarizing like this gives you a lot smaller file:
>>
>> build_binary trie -a 22 -b 8 -q 8 lm.arpa.gz lm.kenlm
>>
>> This uses quantization, in theory that could cause quality loss, but I
>> never saw that happen. Remove "-b 8 -q 8" if you are afraid of that, the
>> file will be larger, but still a lot smaller than what you have. That's
>> about all I do. You said "100 MERT iterations" ... what do you mean by
>> that? Also the LM uses memory mapping in shared memory, so running
>> several moses instances in parallel does not use additional memory due
>> to the LM, similar for the phrase table.

------------------------------

Message: 5
Date: Sat, 25 Apr 2015 22:08:43 +0200
From: liling tan <alvations@gmail.com>
Subject: Re: [Moses-support] Working with big models
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJK==vkPXE1_x9zNhaRcRWj62TUEfmZg3DWGo+EuWNZH3g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses devs/users,

@Ken, MERT with 100 iterations might be overkill, but MERT with 20 also
wouldn't finish in a day on the 71GB binarized LM. It was at the 5th
iteration when I killed it.

@Marcin, I'll try to retune with the trie LM and see how far it goes.

*Does build_binary come with thread options?*

*Also, would the KenLM filter help? How does the filter work?*

I've tried on the 71GB LM:

~/moses/bin/filter union lm.en.binary lm.filter.binary

After waiting for 15 mins, it looks like there is nothing going on and it
seems stuck.
More details on:
http://stackoverflow.com/questions/29869607/how-to-tune-a-machine-translation-model-with-huge-language-model
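
For comparison, here is a sketch of the argument shape the filter's usage
text describes, as far as I recall it: a mode, then one input marked with a
vocab: or model: prefix, then the output file, with the other input read
from stdin. If so, a run with nothing piped in could simply be waiting for
input. The filter is also documented as working on ARPA (or raw count)
files, so the binarized model is probably not a valid input. The names
lm.en.arpa, dev.vocab and lm.filtered.arpa are hypothetical, and dev.vocab
is assumed to hold one line of allowed target-side words per dev sentence.
The snippet only prints the command.

```shell
#!/bin/sh
# Dry run: print a KenLM filter command line instead of executing it.
# Assumptions (not from the thread): lm.en.arpa is the text ARPA model,
# dev.vocab lists the allowed target words, one dev sentence per line.
FILTER="$HOME/moses/bin/filter"

# union mode: one output model covering the vocabulary of all lines.
# The model is named on the command line; the vocabulary comes from stdin.
FILTER_CMD="$FILTER union model:lm.en.arpa lm.filtered.arpa < dev.vocab"

printf '%s\n' "$FILTER_CMD"
```

The filtered ARPA file would then still need to go through build_binary
before being referenced from moses.ini.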

Regards,
Liling

------------------------------


End of Moses-support Digest, Vol 102, Issue 52
**********************************************
