Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. How to use XML-Input in Moses? (liling tan)
2. Two postdoc positions in machine translation at ADAPT Centre,
Dublin City University (deadline extended) (Qun Liu)
3. Re: Duplicated source files (Jeroen Vermeulen)
4. Re: Duplicated source files (Kenneth Heafield)
5. 12-gram language model ARPA file for 16GB (liling tan)
----------------------------------------------------------------------
Message: 1
Date: Sun, 3 May 2015 00:09:22 +0200
From: liling tan <alvations@gmail.com>
Subject: [Moses-support] How to use XML-Input in Moses?
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJ+xKnVBCh4zhFS99hM2b2Uu9G-LtoiLiWAqHsk+v8ARgg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear Moses devs/users,
I want to use the XML-Input to add constraints when decoding (
http://www.statmt.org/moses/?n=Advanced.Hybrid#ntoc7)
The example on the Moses page shows only a single XML input. I
have 700,000 terms in a dictionary, which I search and replace with a
Python script to rewrite the decoder's input file. This is rather slow when I'm
decoding a huge file: for each sentence, I have to search through all 700,000
terms in the dictionary and do a regex replace.
Is there a canonical way to add a dictionary for XML-input in Moses?
Is there a page that someone can point me to for that?
Regards,
Liling
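One way to avoid scanning all 700,000 terms for every sentence is to index the dictionary by token sequence and look up each n-gram of the sentence instead. The following is only a minimal sketch of that idea, not a built-in Moses feature: the function names, the `np` tag name, and the sample dictionary are illustrative assumptions, and it assumes the input is already tokenized the way the decoder expects and that translations contain no double quotes.

```python
def build_index(term_pairs):
    """Map each source term, as a tuple of tokens, to its translation."""
    index = {tuple(src.split()): tgt for src, tgt in term_pairs}
    max_len = max((len(key) for key in index), default=0)
    return index, max_len


def annotate(sentence, index, max_len, tag="np"):
    """Wrap dictionary terms in XML-input markup, preferring the longest match.

    The tag name is an assumption; the text is not re-escaped because it is
    assumed to be tokenized and escaped already.
    """
    tokens = sentence.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                out.append('<%s translation="%s">%s</%s>'
                           % (tag, index[key], " ".join(key), tag))
                i += n
                break
        else:
            # No dictionary term starts here; copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical two-entry dictionary standing in for the real 700,000 terms.
index, max_len = build_index([
    ("ein kleines haus", "a cute place"),
    ("haus", "house"),
])
marked = annotate("das ist ein kleines haus", index, max_len)
```

Each sentence then costs at most (sentence length x longest term length) hash lookups, independent of the dictionary size. The marked-up file would still be decoded with the XML-input option described on the Moses page linked above.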
------------------------------
Message: 2
Date: Sun, 03 May 2015 01:00:16 +0100
From: Qun Liu <liuquncn@gmail.com>
Subject: [Moses-support] Two postdoc positions in machine translation
at ADAPT Centre, Dublin City University (deadline extended)
To: mt-list@eamt.org, moses-support@mit.edu, corpora@uib.no,
IRList@lists.shef.ac.uk, cl-list@lists.ifi.uzh.ch,
complit@linguistlist.org, cngl_allmembers@mailhost.computing.dcu.ie,
KDEG Mailing List <kdeg@cs.tcd.ie>, liresearch@computing.dcu.ie,
LRC@ul.ie, maillist@afnlp.org, publ@isca-speech.org,
nlpcall@watarts.uwaterloo.ca, news@multilingual.com,
researchers@pascal-network.org, www-rdf-logic@w3.org,
SIGHIT-MEMBERS@LISTSERV.ACM.ORG, trec-blog@nist.gov,
elsnet-list@elsnet.org, humanist@lists.digitalhumanities.org
Message-ID: <55456510.7070602@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
The advertisement can be downloaded from:
http://dcu.ie/sites/default/files/hr/Post%20Doctoral%20Researcher%20in%20Machine%20Translation%20Models%20and%20Evaluations%20-%20Adapt%20Ref%2070.pdf
http://dcu.ie/sites/default/files/hr/Post%20Doctoral%20Researcher%20in%20Machine%20Translation%20Quality%20Estimation%20and%20Evaluation%20-%20Adapt%20Ref%2069.pdf
Thank you for your attention.
------------------------------
Message: 3
Date: Sun, 03 May 2015 15:08:43 +0700
From: Jeroen Vermeulen <jtv@precisiontranslationtools.com>
Subject: Re: [Moses-support] Duplicated source files
To: Kenneth Heafield <moses@kheafield.com>, moses-support@mit.edu
Message-ID:
<DE0A5BFA-011F-4457-99E7-9C3473B4CF20@precisiontranslationtools.com>
Content-Type: text/plain; charset=UTF-8
On May 2, 2015 2:58:08 AM GMT+07:00, Kenneth Heafield <moses@kheafield.com> wrote:
>Is this comment accurate that gzfilebuf is only used for writing?
>
>/** wrapper around gzip input stream. Unknown parentage
> * @todo replace with boost version - output stream already uses it
> */
>
>If so I'll just extend util/fake_ofstream.hh to have gzip support.
>
>Time to print a bunch of integers:
>
>FakeOFStream:
>
>real 0m3.460s
>user 0m3.459s
>sys 0m0.004s
>
>std::cout
>
>real 0m23.010s
>user 0m22.895s
>sys 0m0.134s
>
>Time to print a bunch of floats:
>
>FakeOFStream:
>
>real 0m34.871s
>user 0m34.894s
>sys 0m0.006s
>
>std::cout
>
>real 1m56.628s
>user 1m56.690s
>sys 0m0.037s
>
>The conversion is done by https://github.com/miloyip/itoa-benchmark/ and
>Google double conversion.
>
>Kenneth
>
>On 05/01/15 14:37, Barry Haddow wrote:
>> What about the util directory?
>>
>> On 1 May 2015 19:13:26 BST, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>
>> I suppose everything should reference the moses lib.
>>
>> That's getting a bit bloated; one day we should look at splitting it up.
>>
>> On 30/04/2015 10:24, Jeroen Vermeulen wrote:
>>
>> Any chance we could re-unify the gzfilebuf and InputFileStream modules?
>> Looks like we're carrying around 4 copies of each, and they're starting
>> to diverge.
>>
>> I'd be happy to make the change, if we know a good reusable
>> place to put it.
>>
>>
>> Jeroen
>>
>------------------------------------------------------------------------
>>
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
I don't have the code at hand but I'm fairly sure I saw it being used for reading.
Maybe Boost has an equivalent that we can drop in?
Jeroen
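For reference, the speedups implied by the wall-clock ("real") timings quoted above can be worked out with a few lines of Python. This is just arithmetic on the figures in Kenneth's message; the helper name `parse_time` is an illustrative assumption.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '1m56.628s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


# (FakeOFStream, std::cout) "real" timings copied from the quoted message.
timings = {
    "integers": ("0m3.460s", "0m23.010s"),
    "floats": ("0m34.871s", "1m56.628s"),
}

# Ratio of std::cout time to FakeOFStream time for each benchmark.
speedups = {
    name: parse_time(cout) / parse_time(fake)
    for name, (fake, cout) in timings.items()
}
```

On these numbers FakeOFStream is roughly 6.7 times faster than std::cout for integers and roughly 3.3 times faster for floats.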
------------------------------
Message: 4
Date: Sun, 03 May 2015 07:08:34 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Duplicated source files
To: Jeroen Vermeulen <jtv@precisiontranslationtools.com>,
moses-support@mit.edu
Message-ID: <554601B2.1040700@kheafield.com>
Content-Type: text/plain; charset=utf-8
http://www.boost.org/doc/libs/1_57_0/libs/iostreams/doc/classes/gzip.html (best
to scroll to the bottom first).
------------------------------
Message: 5
Date: Sun, 3 May 2015 17:51:01 +0200
From: liling tan <alvations@gmail.com>
Subject: [Moses-support] 12-gram language model ARPA file for 16GB
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKzPaJLMLgWQda+e=Kgo6X3spYNt-d1FeDrhwbp=NAiLK+xPAQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear Moses devs/users,
Does anyone have an idea how big a 12-gram language model ARPA file
trained on 16GB of text would become?
Any hints on the resulting size of the ARPA file?
Is there a way to estimate how much space a language model takes, given the
training corpus size and the n-gram order?
Regards,
Liling
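One rough way to get a feel for the size is to count the distinct n-grams in a sample of the corpus, since an unpruned ARPA file holds roughly one line per distinct n-gram (log probability, the words, and a backoff weight for all but the highest order). The sketch below only measures a sample held in memory. The function names and the 20-byte per-line overhead for the numeric fields are assumptions, and the distinct n-gram count does not grow linearly with corpus size, so the result is an indication rather than a prediction for the full 16GB.

```python
from collections import defaultdict


def distinct_ngrams(lines, order):
    """Collect the distinct n-grams of length 1..order from tokenized lines."""
    seen = defaultdict(set)
    for line in lines:
        # Sentence boundary markers, as they appear in ARPA files.
        tokens = ["<s>"] + line.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                seen[n].add(tuple(tokens[i:i + n]))
    return seen


def estimate_arpa_bytes(seen, overhead_per_line=20):
    """Sum the UTF-8 length of every n-gram plus an assumed fixed overhead
    for the probability, backoff weight, tabs, and newline on its line."""
    total = 0
    for ngrams in seen.values():
        for gram in ngrams:
            total += len(" ".join(gram).encode("utf-8")) + overhead_per_line
    return total


# Tiny stand-in sample; in practice this would be a slice of the real corpus.
sample = ["a b", "a b"]
seen = distinct_ngrams(sample, order=2)
size = estimate_arpa_bytes(seen)
```

Running this on samples of increasing size and watching how the per-order counts grow gives a better basis for extrapolation than any single measurement.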
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 103, Issue 4
*********************************************