Moses-support Digest, Vol 116, Issue 4

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Extract list of n-grams from Trie Language Model that
contains a certain word (Graeme Kidd)
2. Re: Extract list of n-grams from Trie Language Model that
contains a certain word (Kenneth Heafield)
3. About gpu-moses (Sen Lam)
4. Re: About gpu-moses (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Sat, 4 Jun 2016 01:42:40 +0100
From: "Graeme Kidd" <graemekidd@gmail.com>
Subject: [Moses-support] Extract list of n-grams from Trie Language
Model that contains a certain word
To: <moses-support@mit.edu>
Message-ID: <001201d1bdfa$007d4b60$0177e220$@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Hi,



This is still all very new to me, so apologies if this is not the correct
place to ask this question.



I want to take the English Trie Language Model (5.5TB) created from
the Common Crawl data set:

http://data.statmt.org/ngrams/lm/en.trie



Then extract all n-grams that contain a certain word. This needs to be done
for a list of 100 words. For example, if I were looking for all n-grams that
contain the word "discombobulated", I would want an output file listing
each n-gram that contains that word and the number of times that n-gram
occurs:

word1 discombobulated 25

word1 discombobulated word3 40



Because of the size of the file, I am keen to get this right the first
time. Could someone give me an example of how this can be done? And would
this kind of query be possible with 64GB of RAM?



Thanks,

Graeme


------------------------------

Message: 2
Date: Sat, 04 Jun 2016 07:00:02 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Extract list of n-grams from Trie
Language Model that contains a certain word
To: Graeme Kidd <graemekidd@gmail.com>, moses-support@mit.edu
Message-ID: <5456F8B4-FC95-4EE5-8EC2-9DF6CFAD93C8@kheafield.com>
Content-Type: text/plain; charset="utf-8"

The trie file you have contains conditional probabilities and backoffs but not counts. If you're OK with that, check out (and modify as needed) the dump_trie program in the bounded-noquant branch of github.com/kpu/kenlm . It can stream, but you will need to run ulimit -v with a value above 6 TB, even though physical memory usage will be fine.
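
A small sketch of the arithmetic behind the ulimit -v value. It assumes the limit matters because the whole file is mapped into the process's address space (an assumption; the reply does not state the reason). Bash's ulimit -v takes its argument in 1024-byte units, while Python's resource module reports limits in bytes. The 6 TiB figure is only a stand-in for "something above 6 TB", and the helper names are hypothetical.

```python
import resource

TIB = 1024 ** 4  # bytes in one tebibyte


def ulimit_v_argument(required_bytes: int) -> int:
    """Value to pass to bash's `ulimit -v` (1024-byte units), rounded up."""
    return -(-required_bytes // 1024)


def address_space_allows(required_bytes: int, soft_limit: int) -> bool:
    """True if a soft RLIMIT_AS value (in bytes) permits mapping required_bytes."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required_bytes


def current_soft_limit() -> int:
    """Soft address-space limit of this process in bytes (RLIM_INFINITY if unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return soft
```

For example, address_space_allows(6 * TIB, current_soft_limit()) reports whether the current shell limit would already be high enough, and ulimit_v_argument(6 * TIB) gives the number to pass to ulimit -v otherwise.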

For counts, contact me off list.
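
Whatever ends up printing the n-grams, the filtering for a list of words can be done on the stream, one line at a time, so it needs very little RAM. A minimal sketch, assuming (the thread does not confirm this) that the dump is text with one entry per line in an ARPA-like layout: log probability, a tab, the space-separated n-gram, and optionally a tab and a backoff weight. The function name and the ngram_field parameter are hypothetical; adjust them to whatever dump_trie actually prints.

```python
from typing import Iterable, Iterator, Set


def matching_ngrams(
    lines: Iterable[str],
    targets: Set[str],
    ngram_field: int = 1,  # assumed column of the n-gram; not taken from KenLM
) -> Iterator[str]:
    """Yield each input line whose n-gram contains at least one target word.

    Lines are processed one at a time, so memory use does not grow with
    the size of the dump.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= ngram_field:
            continue  # section header, blank line, or unexpected layout
        tokens = fields[ngram_field].split(" ")
        # Whole-token comparison: "discombobulated" does not match
        # "discombobulatedly".
        if any(token in targets for token in tokens):
            yield "\t".join(fields)
```

In use, lines would be sys.stdin (with the dump piped in) and targets the set of 100 words read from a file; each yielded line can be written straight to the output file.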


------------------------------

Message: 3
Date: Sat, 4 Jun 2016 16:04:35 +0700
From: Sen Lam <lamsencntt@gmail.com>
Subject: [Moses-support] About gpu-moses
To: moses-support@mit.edu
Message-ID:
<CAOQw-C+Kc6rNCivC=eWFdNKEZfoxwL0ot-dM8U2arnBwihN+0A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I've successfully built the Moses baseline system, and now I want to use a
GPU to improve Moses. I found fast-moses on GitHub but am not sure how it is
supposed to work. Also, in the initialization step I cannot find the config
file. Is it the moses.ini file in mert-work after tuning?

Can anyone give me more information about this?

Thanks in advance

L?m Th? Sen - CNPMK9B - 0977347109

------------------------------

Message: 4
Date: Sat, 4 Jun 2016 10:55:16 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] About gpu-moses
To: Sen Lam <lamsencntt@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbiijGck9cP8MMZ4aAGo5ijnsQ01ijUXhdxiftvLNMym+w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

That's just me messing around. It doesn't work.

Hieu Hoang
http://www.hoang.co.uk/hieu


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 116, Issue 4
*********************************************
