Moses-support Digest, Vol 85, Issue 7

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: 10 years of OPUS (Philipp Koehn)
2. Re: gappy phrases (Nadir Durrani) (Nadir Durrani)


----------------------------------------------------------------------

Message: 1
Date: Mon, 4 Nov 2013 10:57:31 -0500
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] 10 years of OPUS
To: Jorg Tiedemann <tiedeman@gmail.com>
Cc: "moses-support@MIT.EDU" <moses-support@mit.edu>
Message-ID:
<CAAFADDC_WrEaOR4dCESPV3WQP-mmAhj0-sAGt0Mf+vgth=ipEQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Joerg,

congratulations on the 10 years - this is really a great service to
the community!

Regards,
Philipp

On Sat, Nov 2, 2013 at 2:54 PM, Jorg Tiedemann <tiedeman@gmail.com> wrote:
>
> After attending the 20-years-of-bitext workshop at EMNLP I suddenly realized that OPUS (http://opus.lingfil.uu.se) also has its 10-year anniversary this year (send me some champagne if you like). I will celebrate this anniversary by sending out this e-mail with some recent news and highlights.
>
> OPUS is a growing collection of parallel corpora for many languages and various domains. The collection has become pretty big and includes a variety of data sets and tools that are useful not only for statistical machine translation. OPUS has been extended a lot since its first appearance in 2003. Actually, the best birthday present would be if someone decided to start a mirror of OPUS. Let me know if you are interested.
>
>
> Here are some of the highlights:
>
> - over 150 languages and language variants
> - over 5 billion aligned translation units
> - downloads in XML/XCES, plain text (Moses/SMT) and TMX
> - raw, tokenized and machine-annotated data
> - monolingual data sets (for language modeling)
> - search interfaces
>
>
> Some recent news and data sets:
>
> - EUbookshop: a large but noisy corpus (converted from PDF)
> - Tatoeba: a small but clean corpus with many languages
> - OpenSubtitles2012: an improved version of the 2011 version
> - coming soon: OpenSubtitles2013 - an extension of OpenSubtitles2012
> - UN, MultiUN, Europarl v7: aligned for all language combinations
> - word alignments and phrase tables for the majority of bitexts
>
>
> The Web Site: http://opus.lingfil.uu.se
> More information: http://opus.lingfil.uu.se/trac/wiki
>
> Feedback is very welcome!
> And, be nice to our server!
>
>
> Jörg Tiedemann
> tiedeman@gmail.com
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



------------------------------

Message: 2
Date: Mon, 4 Nov 2013 16:02:39 +0000
From: Nadir Durrani <nadir.durrani@nu.edu.pk>
Subject: Re: [Moses-support] gappy phrases (Nadir Durrani)
To: moses-support@mit.edu
Message-ID:
<CAFDj2Q18Aw4KWh0yPsDFTEC9dgYs-4Mj-80JmYMMazzVpUJ6hQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

The recent version of the OSM decoder from LMU Munich uses discontinuous
source-side phrases. We used it in this year's WMT campaign. Details
on phrase extraction can be found in

http://www.statmt.org/wmt13/pdf/WMT13.pdf

It gives improvements, although not consistently, which I suppose is
also true for Phrasal with discontinuous phrases.
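
For readers new to the term: a discontinuous (gappy) source-side phrase covers source words that are not adjacent. The sketch below applies the standard alignment-consistency test from phrase extraction to an arbitrary set of source positions, so the source side may contain a gap. It is an illustration only and is not taken from the OSM decoder, Jane, or Phrasal; the function name, the example sentence pair, and its alignment are made up for this note.

```python
from typing import Iterable, Set, Tuple

# A word alignment as a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def is_consistent(src_positions: Iterable[int],
                  tgt_span: Tuple[int, int],
                  alignment: Alignment) -> bool:
    """Return True if the chosen source words and the target span
    [start, end] are aligned only to each other and share at least
    one alignment link. The source positions need not be contiguous."""
    src = set(src_positions)
    start, end = tgt_span
    links_inside = 0
    for s, t in alignment:
        s_in = s in src
        t_in = start <= t <= end
        if s_in != t_in:
            # A link leaves the phrase pair on one side only.
            return False
        if s_in:
            links_inside += 1
    return links_inside > 0


# Hypothetical example: "je ne veux pas partir" / "I do not want to leave"
# with links je-I, ne-not, veux-want, pas-not, partir-leave.
EXAMPLE_ALIGNMENT: Alignment = {(0, 0), (1, 2), (2, 3), (3, 2), (4, 5)}

# Gappy source phrase "ne ... pas" (positions 1 and 3) against "not".
gappy_ok = is_consistent({1, 3}, (2, 2), EXAMPLE_ALIGNMENT)

# Contiguous "ne veux pas" against "not" alone fails, because "veux"
# is linked to "want", which lies outside the target span.
contiguous_ok = is_consistent({1, 2, 3}, (2, 2), EXAMPLE_ALIGNMENT)
```

A contiguous-only extractor cannot pair "ne ... pas" with "not" without also pulling in "veux"; allowing a gap on the source side is what makes that pair extractable.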





On Mon, Nov 4, 2013 at 2:50 PM, <moses-support-request@mit.edu> wrote:
>
> Today's Topics:
>
> 1. Release 1.0 details (Tom Hoar)
> 2. Re: gappy phrases (Matthias Huck)
> 3. Re: -lm training parameter (John D. Burger)
> 4. Re: Release 1.0 details (Hieu Hoang)
> 5. Re: Syntax model in source side (burak aydın)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 04 Nov 2013 21:12:37 +0700
> From: Tom Hoar <tahoar@precisiontranslationtools.com>
> Subject: [Moses-support] Release 1.0 details
> To: Moses-Support <moses-support@mit.edu>
> Message-ID: <5277AB55.60607@precisiontranslationtools.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Where can I find the options that were used to compile the release 1.0
> binaries and training tools? A complete list would be nice, but
> specifically, I'm looking into whether the distributed Moses binary
> includes --with-xmlrpc-c. I suspect not, because the mosesserver binary
> is missing from the bin folder.
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 04 Nov 2013 14:39:35 +0000
> From: Matthias Huck <mhuck@inf.ed.ac.uk>
> Subject: Re: [Moses-support] gappy phrases
> To: moses-support@mit.edu
> Message-ID: <1383575975.20373.84.camel@portedgar>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi,
>
> RWTH Aachen University implemented extraction of discontinuous phrases
> and decoding with source-side gaps in the Jane toolkit
> [www.hltpr.rwth-aachen.de/jane/].
> We did not see any clear improvements over standard phrase-based setups
> in our experiments, though.
>
> Some results were published in PBML:
>
> M. Huck, E. Scharwächter, and H. Ney. Source-Side Discontinuous Phrases
> for Machine Translation: A Comparative Study on Phrase Extraction and
> Search. The Prague Bulletin of Mathematical Linguistics, number 99,
> pages 17-38, Prague, Czech Republic, April 2013.
> http://www.hltpr.rwth-aachen.de/publications/download/848/Huck-PBML-2013.pdf
>
> The Jane Hiero implementation yields better translation quality on
> Chinese-English. But note that RWTH did not modify Jane's phrase-based
> decoder to support target-side gaps.
>
> I would be very much interested in seeing whether other groups than
> Stanford achieve encouraging results with discontinuous phrases in their
> toolkits.
>
> Erik Scharwächter wrote most of the code related to discontinuous
> phrases in the Jane toolkit as part of his Bachelor's thesis. I don't
> know how you define a "massive undertaking", but an excellent
> undergraduate student can obviously implement it, run some experiments
> and write a thesis about it within a limited amount of time.
>
> Cheers,
> Matthias
>
>
>
> On Sun, 2013-11-03 at 20:34 -0800, Kenneth Heafield wrote:
>> Hi,
>>
>> I'll throw in the anecdote that gappy phrases are currently not in use
>> at Stanford. My predecessor told me that it took a lot longer and only
>> improved BLEU slightly on Chinese-English. But it's also possible that
>> something didn't get passed down correctly from Michel to my predecessor
>> to me. . .
>>
>> Kenneth
>>
>> On 11/03/13 14:18, Read, James C wrote:
>> > My understanding is that they used a similar approach as the grammar extraction to extract the gappy phrases. Would it be a massive undertaking to get Moses to support this?
>> >
>> > James
>> > ________________________________________
>> > From: Barry Haddow [bhaddow@staffmail.ed.ac.uk]
>> > Sent: 30 October 2013 09:26
>> > To: Read, James C
>> > Cc: moses-support@mit.edu
>> > Subject: Re: [Moses-support] gappy phrases
>> >
>> > No, but it does support hiero and syntax models.
>> >
>> > On 29/10/13 22:23, Read, James C wrote:
>> >> Hi,
>> >>
>> >> does anybody know if Moses supports gappy phrases http://www-nlp.stanford.edu/pubs/naacl10-discontinuous_phrases.pdf
>> >>
>> >> James
>> >>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 4 Nov 2013 09:41:46 -0500
> From: "John D. Burger" <john@mitre.org>
> Subject: Re: [Moses-support] -lm training parameter
> To: Moses-support <moses-support@mit.edu>
> Message-ID: <C450EAA6-8AFC-4F0A-986E-D033DD02DFEE@mitre.org>
> Content-Type: text/plain; charset=us-ascii
>
> We've done something like this in the past. The fact that the check for a non-empty LM happens at the very beginning is somewhat annoying if you have a setup that builds the phrase models and language models in parallel, for instance on a cluster.
>
> - JB
>
> On Nov 4, 2013, at 07:48 , Tom Hoar wrote:
>
>> Yes, on both counts. You can edit the moses.ini file to change to a
>> different LM. Editing the train-model.perl script should work. We take a
>> different approach. We create a temporary /tmp/placeholder.lm before
>> running the script and then remove it afterwards. We then regex the
>> pattern and change the moses.ini file to any LM we want.
>>
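A minimal sketch of the placeholder approach described above, in Python rather than the tooling Tom actually uses. It makes no assumption about the layout of moses.ini beyond the placeholder path appearing in it literally; the helper names placeholder_lm and swap_lm_path are invented for this note.

```python
import os
import re
import tempfile
from contextlib import contextmanager


@contextmanager
def placeholder_lm():
    """Create a small non-empty dummy LM file and delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=".lm")
    with os.fdopen(fd, "w") as handle:
        handle.write("placeholder\n")  # non-empty, so the -lm check passes
    try:
        yield path
    finally:
        os.remove(path)


def swap_lm_path(ini_text: str, old_path: str, new_path: str) -> str:
    """Replace the placeholder path in the moses.ini text with new_path.

    The lookahead stops the match from firing on a longer path that
    merely starts with old_path."""
    pattern = re.escape(old_path) + r"(?!\S)"
    # A function replacement keeps backslashes in new_path literal.
    return re.sub(pattern, lambda _match: new_path, ini_text)


# Intended use (the training call itself is left out on purpose):
#
# with placeholder_lm() as lm:
#     ...run the training script, passing lm as the -lm file...
# then read moses.ini, call swap_lm_path(), and write the result back.
```

Doing the substitution on the literal placeholder path avoids having to parse moses.ini, whose format differs between Moses versions.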
>>
>> On 11/04/2013 04:57 AM, Read, James C wrote:
>>> Thanks.
>>>
>>> So if you wanted to train and at a later date use a different LM with the already trained TM would it just be a simple case of manually editing moses.ini?
>>>
>>> If I were to edit the training script to skip the check that LM file exists (it doesn't) it wouldn't break anything would it?
>>>
>>> James
>>>
>>> ________________________________________
>>> From: moses-support-bounces@mit.edu [moses-support-bounces@mit.edu] on behalf of Tom Hoar [tahoar@precisiontranslationtools.com]
>>> Sent: 03 November 2013 13:03
>>> To: moses-support@mit.edu
>>> Subject: Re: [Moses-support] -lm training parameter
>>>
>>> You are correct that the train-model.perl script does not use the -lm
>>> parameter in any of the word alignment or phrase scoring steps. The
>>> script's step 9 builds a template moses.ini configuration file and
>>> includes the values from the -lm parameter. At the beginning, the script
>>> checks that the -lm value points to a non-zero length file. If the file
>>> is missing or is zero length, the script halts.
>>>
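The start-up check described above can be pictured roughly as follows. This is a Python paraphrase of the behaviour Tom describes, not the Perl code from train-model.perl; the function name and the error message are invented.

```python
import os
import sys


def check_lm_file(lm_path: str) -> None:
    """Halt unless lm_path names an existing file of non-zero length."""
    if not os.path.isfile(lm_path) or os.path.getsize(lm_path) == 0:
        sys.exit(f"ERROR: language model file '{lm_path}' is missing or empty")
```

Because a check of this kind only looks at existence and size, any non-empty file satisfies it, which is why a dummy file is enough to get past it.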
>>>
>>>
>>> On 11/03/2013 06:03 PM, Read, James C wrote:
>>>> Hi,
>>>>
>>>> does anybody know what the effect of the -lm training parameter in the training script is? Surely the LM used has no effect on typical training tasks like word alignment and phrase scoring?
>>>>
>>>> thanks,
>>>> James
>>>>
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 4 Nov 2013 14:46:02 +0000
> From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
> Subject: Re: [Moses-support] Release 1.0 details
> To: Tom Hoar <tahoar@precisiontranslationtools.com>
> Cc: Moses-Support <moses-support@mit.edu>
> Message-ID:
> <CAEKMkbhSQWBOhfDRS_B3zOTY5PofDM5Z=eHsCO8HKkYJYwOYag@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Sorry, I didn't write it down. They were compiled with IRSTLM (and KenLM),
> but not SRILM. I don't usually compile mosesserver, so the command would be
> something like:
> nohup ./bjam --with-irstlm=/home/hieu/workspace/irstlm/trunk/
>
> I'll try & remember to document it more thoroughly in the next round.
>
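Tom's inference (no mosesserver in the bin folder, so probably no --with-xmlrpc-c build) can be checked mechanically. A small sketch, assuming only that the release unpacks to a directory of executables; the function name and the example path in the comment are hypothetical.

```python
import os
from pathlib import Path


def has_executable(bin_dir: str, name: str) -> bool:
    """Return True if bin_dir contains an executable file called name."""
    candidate = Path(bin_dir) / name
    return candidate.is_file() and os.access(candidate, os.X_OK)


# Example call (hypothetical install location):
# has_executable("/opt/moses/bin", "mosesserver")
```

A missing binary is only indirect evidence about the build flags, so the result is a hint rather than a definitive answer.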
>
> On 4 November 2013 14:12, Tom Hoar <tahoar@precisiontranslationtools.com> wrote:
>
>> Where can I find the options that were used to compile the release 1.0
>> binaries and training tools? A complete list would be nice, but
>> specifically, I'm looking into whether the distributed Moses binary
>> includes --with-xmlrpc-c. I suspect not, because the mosesserver binary
>> is missing from the bin folder.
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
> ------------------------------
>
> Message: 5
> Date: Mon, 4 Nov 2013 16:50:24 +0200
> From: burak aydın <baydinx@gmail.com>
> Subject: Re: [Moses-support] Syntax model in source side
> To: moses-support@mit.edu
> Message-ID:
> <CAH+r-SLRhr5TyOG3p3qAAiuW4b4+WHPDMKqpF1zSyunZBBhBAQ@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-9"
>
> Hi everyone,
>
> I want to use the Collins parser when translating from English. I checked the
> sample EMS configs and applied them. The experiment did not crash or report any
> error, but BLEU scores were dramatically low, implying that there must be
> something wrong. Here are the additional parameters for syntax with the Collins parser:
>
> #syntactic parsers
> input-parser = "$moses-script-dir/training/wrappers/parse-en-collins.perl
> -collins /usr/local/smt/COLLINS-PARSER -mxpost /usr/local/smt/MXPOST/ "
>
> #training options
> training-options = "-mgiza -mgiza-cpus 4 -sort-buffer-size 8G
> -sort-compress gzip -sort-parallel 4 -cores 4 -source-syntax"
>
> Do I need additional parameters besides the ones above? I would appreciate
> any help.
>
> Thanks
>
>
>
> ------------------------------
>
>
> End of Moses-support Digest, Vol 85, Issue 6
> ********************************************


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 85, Issue 7
********************************************
