Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Too large language models - how to handle that?
(Marcin Junczys-Dowmunt)
2. Alignment output from mosesserver for hiero models (Guchun Zhang)
3. Re: Alignment output from mosesserver for hiero models
(Barry Haddow)
----------------------------------------------------------------------
Message: 1
Date: Mon, 24 Nov 2014 15:19:46 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Too large language models - how to handle
that?
To: moses-support@mit.edu
Message-ID: <f523bdda88d5a233bfbe9d3548647ca3@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"
The command
moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
will build a compressed binarized model with quantization. You can run
moses/bin/build_binary lm.arpa
without any parameters to get size estimates for different parameter
settings. I would guess you will get a binarized LM of roughly 20 to 30
GB, which is manageable (provided the size you gave us is that of an
uncompressed text file). You can also use lmplz to build pruned models
in the first place; these will be much smaller.
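The steps above can be sketched as one small script. This is only an illustration under stated assumptions: the run helper, the DRY_RUN switch, the corpus.txt and lm.pruned.arpa file names, and the pruning thresholds "0 0 1" are hypothetical and not taken from this thread. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: estimate sizes, build a quantized trie model, and (optionally)
# train a pruned model with lmplz. Paths and thresholds are assumptions.
set -eu

MOSES_BIN="${MOSES_BIN:-moses/bin}"   # where build_binary and lmplz live
ARPA="${ARPA:-lm.arpa}"               # existing ARPA file
OUT="${OUT:-lm.kenlm}"                # binarized output
CORPUS="${CORPUS:-corpus.txt}"        # hypothetical training text
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands, 0 = run them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# 1. With only the ARPA file as argument, build_binary prints size
#    estimates for the different parameter settings.
run "$MOSES_BIN/build_binary" "$ARPA"

# 2. Build the compressed, quantized trie model (command from the message).
run "$MOSES_BIN/build_binary" trie -a 22 -b 8 -q 8 "$ARPA" "$OUT"

# 3. Alternative: train a pruned 5-gram model directly with lmplz.
#    "--prune 0 0 1" (drop singleton n-grams of order 3 and above) is an
#    assumed example setting, not a recommendation from the thread.
run "$MOSES_BIN/lmplz" -o 5 --prune 0 0 1 --text "$CORPUS" --arpa lm.pruned.arpa
```

Checking the printed commands in dry-run mode first is cheap; binarizing a 300GB ARPA file is not.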
On 2014-11-24 15:11, Tom Hoar wrote:
> After binarizing such a large ARPA file with KenLM, you'll need to configure your moses.ini file to "lazily load the model using mmap." This involves using lmodel-file code "9" vs code "8." More details here: https://kheafield.com/code/kenlm/moses/ [2]
>
> Performance improves significantly if you store the binarized file on an SSD.
>
> On 11/24/2014 07:00 PM, Raj Dabre wrote:
>
> Hey Hoang, You should binarize the arpa file. The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how. Regards.
>
> On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <hoangcuong2011@gmail.com> wrote:
>
> Hi all,
> I have trained an (unpruned) 5-gram language model on a large corpus of 5 billion words, resulting in an ARPA-format file of roughly 300GB (is this a normal LM size for such a large monolingual corpus?). This is obviously too big for running an SMT system.
> I have read several papers whose systems use language models trained on monolingual corpora of similar size. Could you give me some advice on how to handle this and make it feasible to run SMT systems?
> I appreciate your help a lot,
> Best,
>
> --
>
> Best Regards,
>
> Hoang Cuong
>
> SMTNerd
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]
>
> --
>
> Raj Dabre.
> Research Student, Graduate School of Informatics,
> Kyoto University.
>
> CSE MTech, IITB., 2011-2014
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]
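For the lazy-loading suggestion quoted above (lmodel-file code "9" instead of "8"), the entry in an old-style moses.ini would look roughly like the fragment this sketch prints. The model path is a placeholder, and the field layout (type, factor, order, file) is stated here as an assumption about the older [lmodel-file] section; newer feature-function configs express the same setting differently, so check the KenLM page linked in the thread before relying on it.

```shell
#!/bin/sh
# Sketch: print an old-style [lmodel-file] entry that asks Moses to load a
# KenLM binary lazily via mmap. Path and order below are placeholders.
set -eu

LM_PATH="${LM_PATH:-/path/to/lm.kenlm}"  # hypothetical location of the binarized LM
LM_ORDER="${LM_ORDER:-5}"                # n-gram order of the model
LM_TYPE=9                                # 9 = lazy KenLM, 8 = KenLM loaded up front

# Assumed field layout: <type> <factor> <order> <file>
FRAGMENT="$(printf '[lmodel-file]\n%s 0 %s %s\n' "$LM_TYPE" "$LM_ORDER" "$LM_PATH")"
echo "$FRAGMENT"
```

Lazy loading only pays off if the file sits on fast storage, which is why the SSD remark in the quoted message matters.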
Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support
[2] https://kheafield.com/code/kenlm/moses/
------------------------------
Message: 2
Date: Mon, 24 Nov 2014 14:27:43 +0000
From: Guchun Zhang <gzhang@alphacrc.com>
Subject: [Moses-support] Alignment output from mosesserver for hiero
models
To: "moses-support@MIT.EDU" <moses-support@mit.edu>
Message-ID:
<CA+cfSV+B5waArfXYocfXPJVX6QuT5rXA3TAjGu=Aict3d+7kVg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi there,
When mosesserver is called with phrase based models, I can get both the
translation and alignment using the sample client scripts in *contrib*.
However, when it's used with hiero models, I can't get the alignment with
the same scripts. Is this intentional? Where should I look to get the
alignment?
Many thanks,
Guchun
------------------------------
Message: 3
Date: Mon, 24 Nov 2014 16:58:45 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Alignment output from mosesserver for
hiero models
To: Guchun Zhang <gzhang@alphacrc.com>, "moses-support@MIT.EDU"
<moses-support@mit.edu>
Message-ID: <547363C5.8000806@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Guchun
I'm afraid word alignment is not implemented for hiero models in
mosesserver. You're welcome to add it.
cheers - Barry
On 24/11/14 14:27, Guchun Zhang wrote:
> Hi there,
>
> When mosesserver is called with phrase based models, I can get both
> the translation and alignment using the sample client scripts in
> /contrib/. However, when it's used with hiero models, I can't get the
> alignment with the same scripts. Is this intentional? Where should I
> look to get the alignment?
>
> Many thanks,
> Guchun
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
End of Moses-support Digest, Vol 97, Issue 74
*********************************************