Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Windows newline support (Tom Hoar)
2. Re: Windows newline support (Kenneth Heafield)
3. Re: Windows newline support (Hieu Hoang)
4. Cochrane job opportunity: Translation System Developer,
location flexible, application deadline 12 February (Barry Haddow)
----------------------------------------------------------------------
Message: 1
Date: Thu, 29 Jan 2015 03:00:19 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] Windows newline support
To: moses-support@mit.edu
Message-ID: <54C93FD3.4080603@precisiontranslationtools.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Native Moses components (MGIZA++, lmplz, train-model.perl, mert-moses.pl
and other scripts/binaries) currently limit the training corpora
(parallel and LM) to Posix newline (\n) only. Is this a legacy of Posix
origins and/or a matter of limited resources to update the system to
support both?
Is there some reason why they should NOT be updated to allow Windows
newline (\r\n)? Would anyone object if we do the work and contribute
transparent support that allows either Linux or Windows newlines?
------------------------------
Message: 2
Date: Wed, 28 Jan 2015 15:42:46 -0500
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Windows newline support
To: moses-support@mit.edu
Message-ID: <54C949C6.7080606@kheafield.com>
Content-Type: text/plain; charset=windows-1252
lmplz works with Windows newlines, as documented at:
http://kheafield.com/code/kenlm/estimation/
Words are delimited by any number of '\0', '\t', '\r', and ' '. UNIX
newline ('\n') delimits lines (but note that DOS files will work because
'\r' will be treated as a word delimiter and ignored at the end of a line).
Can't we just treat this (and Windows' love for the BOM) as a preprocessing
issue in the tokenizer?
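
A minimal sketch of what such a preprocessing step could look like, assuming UTF-8 input. The function names are hypothetical and are not part of Moses, KenLM, or any existing tokenizer. The delimiter set is taken from the documentation quoted above.

```python
import codecs
import re

# Word delimiters as quoted from the KenLM estimation page:
# NUL, tab, carriage return and space.
WORD_DELIMITERS = re.compile(r"[\x00\t\r ]+")


def normalize_corpus_bytes(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF.

    Hypothetical helper: a stand-in for a tokenizer preprocessing step.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\n", b"\n")


def split_words(line: str) -> list:
    """Split one '\\n'-delimited line on the quoted delimiter set.

    A trailing '\\r' left over from a DOS file is treated as a delimiter,
    so it never becomes part of the last word.
    """
    return [word for word in WORD_DELIMITERS.split(line) if word]


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + b"hello world\r\nsecond line\r\n"
    print(normalize_corpus_bytes(raw))
    print(split_words("hello world\r"))
```

Running the normalization once over the corpus before training would leave MGIZA++, train-model.perl and the other components untouched, since they would only ever see '\n'.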
On 01/28/15 15:00, Tom Hoar wrote:
> Native Moses components (MGIZA++, lmplz, train-model.perl, mert-moses.pl
> and other scripts/binaries) currently limit the training corpora
> (parallel and LM) to Posix newline (\n) only. Is this a legacy of Posix
> origins and/or a matter of limited resources to update the system to
> support both?
>
> Is there some reason why they should NOT be updated to allow Windows
> newline (\r\n)? Would anyone object if we do the work and contribute
> transparent support that allows either Linux or Windows newlines?
------------------------------
Message: 3
Date: Wed, 28 Jan 2015 21:55:43 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Windows newline support
To: Tom Hoar <tahoar@precisiontranslationtools.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgqWPOsshgx1cvBVoevQS-kA1qJ6p9+xH+4JYLqg7yTpw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I would definitely go with Ken's idea and do it as a preprocessing step
inside the tokenizer, the script that escapes special characters, etc. By
happy coincidence, there's a new C++ tokenizer, which I hope we will migrate
to in the future:
contrib/c++tokenizer
I would be wary of changing anything else, like MGIZA or train-model.perl.
On 28 January 2015 at 20:00, Tom Hoar <tahoar@precisiontranslationtools.com>
wrote:
> Native Moses components (MGIZA++, lmplz, train-model.perl, mert-moses.pl
> and other scripts/binaries) currently limit the training corpora
> (parallel and LM) to Posix newline (\n) only. Is this a legacy of Posix
> origins and/or a matter of limited resources to update the system to
> support both?
>
> Is there some reason why they should NOT be updated to allow Windows
> newline (\r\n)? Would anyone object if we do the work and contribute
> transparent support that allows either Linux or Windows newlines?
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
Message: 4
Date: Wed, 28 Jan 2015 22:08:49 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: [Moses-support] Cochrane job opportunity: Translation System
Developer, location flexible, application deadline 12 February
To: moses-support@MIT.EDU
Message-ID: <54C95DF1.2070107@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="utf-8"
Hi All
Below is a job advertisement from the Cochrane Collaboration that may
be of interest,
cheers - Barry
----
Cochrane is currently recruiting for a Translation System Developer to
join our Informatics and Knowledge Management Department for 1 year, and
mainly work on developing and maintaining our translation technology
infrastructure. A detailed job description can be found here:
http://www.cochrane.org/news/tags/central-executive-team/translation-system-developer-cochrane-ikmd-flexible-location
Many thanks and kind regards,
Juliane
*Juliane Ried | Translations Co-ordinator*
CEO's Office | Cochrane Central Executive
Cochrane: /Trusted evidence. Informed decisions. Better health./
www.cochrane.org <http://www.cochrane.org/> | www.thecochranelibrary.com
<http://www.thecochranelibrary.com/>
C/O German Cochrane Centre
Medical Center | University of Freiburg | Berliner Allee 29 | 79110 |
Freiburg | Germany
T: +49 761 203 97644 | E: juliane.ried@cochrane.org
<mailto:juliane.ried@cochrane.org> | Skype: juliane.ried
Twitter: @CochraneLingual | LinkedIn: Juliane Ried
Subscribe to translation updates here:
http://lists.cochrane.org/mailman/listinfo/translations
This e-mail contains information intended for the addressee only. If you
receive this e-mail in error, please contact the sender and delete the
original from your system. Neither the Cochrane Central Executive nor the
German Cochrane Centre can guarantee that any attachments to this
e-mail are free of software viruses, and we recommend that you check for
viruses before opening any attachments. *Disclaimer:*
http://www.cochrane.org/docs/email_disclaimer.htm
_______________________________________________
Himl-project mailing list
Himl-project@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/himl-project
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 99, Issue 65
*********************************************