Moses-support Digest, Vol 85, Issue 3

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. 10 years of OPUS (Jorg Tiedemann)
2. kind request (Arththika Paramanathan)
3. -lm training parameter (Read, James C)
4. Re: -lm training parameter (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Sat, 2 Nov 2013 19:54:38 +0100
From: Jorg Tiedemann <tiedeman@gmail.com>
Subject: [Moses-support] 10 years of OPUS
To: "moses-support@MIT.EDU" <moses-support@mit.edu>
Message-ID: <56B5E5C5-0FDD-4444-A530-0A216C1D086F@gmail.com>
Content-Type: text/plain; charset=iso-8859-1


After attending the 20-years-of-bitext workshop at EMNLP, I suddenly realized that OPUS (http://opus.lingfil.uu.se) also has its 10-year anniversary this year (send me some champagne if you like). I will celebrate this anniversary by sending out this e-mail with some recent news and highlights.

OPUS is a growing collection of parallel corpora for many languages and various domains. The collection has become quite large and includes a variety of data sets and tools that are useful beyond statistical machine translation. OPUS has been extended a lot since its first appearance in 2003. The best birthday present would be for someone to start a mirror of OPUS. Let me know if you are interested.


Here are some of the highlights:

- over 150 languages and language variants
- over 5 billion aligned translation units
- downloads in XML/XCES, plain text (Moses/SMT) and TMX
- raw, tokenized and machine-annotated data
- monolingual data sets (for language modeling)
- search interfaces


Some recent news and data sets:

- EUbookshop: a large but noisy corpus (converted from PDF)
- Tatoeba: a small but clean corpus with many languages
- OpenSubtitles2012: an improved version of the 2011 version
- coming soon: OpenSubtitles2013 - an extension of OpenSubtitles2012
- UN, MultiUN, Europarl v7: aligned for all language combinations
- word alignments and phrase tables for the majority of bitexts


The Web Site: http://opus.lingfil.uu.se
More information: http://opus.lingfil.uu.se/trac/wiki

Feedback is very welcome!
And, be nice to our server!


Jörg Tiedemann
tiedeman@gmail.com







------------------------------

Message: 2
Date: Sun, 3 Nov 2013 10:07:13 +0530
From: Arththika Paramanathan <arthiparamanathan@gmail.com>
Subject: [Moses-support] kind request
To: moses-support@mit.edu
Message-ID:
<CAJSfqEze66FqrxE1hcoX5uLikeX4SzQ_Cg9T5WEsCzv7u2fEEA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,
Thanks a lot, sir, for your kind response. I installed Moses on Ubuntu. I
tried the Experiment Management System (EMS) and faced an issue in the
decoding stage. Could you please help me fix this? I attached the file.

Thank you

--
regards,
P.Arththika
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error
Type: application/octet-stream
Size: 2370 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20131103/7411bd04/attachment-0001.obj

------------------------------

Message: 3
Date: Sun, 3 Nov 2013 11:03:36 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: [Moses-support] -lm training parameter
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<F00840E41983C645928E21E3C35F4EB1012CF34FB2@mbx1-node2.essex.ac.uk>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Does anybody know what effect the -lm training parameter has in the training script? Surely the LM used has no effect on typical training tasks like word alignment and phrase scoring?

thanks,
James



------------------------------

Message: 4
Date: Sun, 03 Nov 2013 20:03:33 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] -lm training parameter
To: moses-support@mit.edu
Message-ID: <527649A5.4010701@precisiontranslationtools.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

You are correct that the train-model.perl script does not use the -lm
parameter in any of the word alignment or phrase scoring steps. The
script's step 9 builds a template moses.ini configuration file and
includes the values from the -lm parameter. At the beginning, the script
checks that the -lm value points to a non-zero-length file. If the file
is missing or has zero length, the script halts.
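
For illustration only, the up-front check described above can be sketched in Python. This is not the code in train-model.perl (which is Perl). The function name check_lm_spec is hypothetical, and the "factor:order:filename" layout of the -lm value is an assumption based on common usage; consult the script's own help text for the authoritative format.

```python
import os


def check_lm_spec(spec):
    """Parse an assumed 'factor:order:filename[:type]' string and verify
    that the language model file exists and is not empty.

    Hypothetical helper: it mirrors the described behaviour (halt when the
    file is missing or zero length), not the script's actual code.
    """
    parts = spec.split(":")
    if len(parts) < 3:
        raise ValueError(f"expected factor:order:filename, got {spec!r}")
    factor, order, path = int(parts[0]), int(parts[1]), parts[2]
    # Fail early if the file is missing or has zero length.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise FileNotFoundError(f"language model file missing or empty: {path}")
    return factor, order, path
```

A call such as check_lm_spec("0:3:/path/to/lm.arpa") returns the parsed fields when the file is present and non-empty, and raises otherwise, so a bad path is caught before any alignment work starts.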



On 11/03/2013 06:03 PM, Read, James C wrote:
> Hi,
>
> Does anybody know what effect the -lm training parameter has in the training script? Surely the LM used has no effect on typical training tasks like word alignment and phrase scoring?
>
> thanks,
> James
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



------------------------------



End of Moses-support Digest, Vol 85, Issue 3
********************************************
