Moses-support Digest, Vol 99, Issue 66

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Windows newline support (Tom Hoar)
2. GIZA++ default options (Read, James C)
3. Binarised Model (Benyamin Bosari)
4. Re: Binarised Model (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Thu, 29 Jan 2015 08:08:29 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Windows newline support
To: moses-support@mit.edu
Message-ID: <54C9880D.70808@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"

I vote against embedding it in the Perl tokenizer (which is legacy anyway).
Having the new C++ tokenizer do the work might be an option.

For years, our tools have prepared corpus files on Windows. The final
preparation steps, however, were done on the Linux machine that runs the
Moses training/tuning steps: we simply strip the \r from every \r\n and let
the Linux OS create the final files. Newlines are also not normally a problem
on Cygwin. If you use tools through Cygwin to prepare your corpus, Cygwin
creates Posix newlines.

In developing the native Windows versions, this approach won't work
anymore. For now, we will simply add a final processing step that
forces Posix newlines on corpora created on Windows. We'll also test
every step for native OS newline support, because Moses' various tools
running on Windows might convert the Posix newlines back to native
Windows newlines. We'll see and let you know.
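
A minimal sketch of what such a final processing step could look like, assuming the corpus fits in memory and only CRLF pairs need converting. The function names are hypothetical and not part of Moses or of our tools:

```python
def force_posix_newlines(data: bytes) -> bytes:
    """Return data with every Windows newline (CRLF) replaced by LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    # Binary mode keeps Python from translating newlines on its own,
    # which matters when the script itself runs on Windows.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(force_posix_newlines(data))
```

Working on bytes avoids any assumption about the corpus encoding. For very large corpora, a line-by-line variant would be needed instead of reading the whole file at once.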


-------- Forwarded Message --------
Subject: Re: [Moses-support] Windows newline support
Date: Wed, 28 Jan 2015 15:42:46 -0500
From: Kenneth Heafield <moses@kheafield.com>
To: moses-support@mit.edu



lmplz works with Windows newlines, as documented at:
http://kheafield.com/code/kenlm/estimation/

Words are delimited by any number of '\0', '\t', '\r', and ' '. UNIX
newline ('\n') delimits lines (but note that DOS files will work because
'\r' will be treated as a word delimiter and ignored at the end of a line).

Can't we just treat this (and Windows' love for the BOM) as a preprocessing
issue in the tokenizer?
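
The delimiter rule quoted above can be illustrated with a short sketch. This is not lmplz's actual code, only a Python rendering of the documented behaviour; the function name and the BOM-stripping step are assumptions added for illustration:

```python
import re

# Word delimiters as listed in the quoted documentation: NUL, tab, CR, space.
_WORD_DELIMS = re.compile(r"[\x00\t\r ]+")
BOM = "\ufeff"


def split_lines_into_words(text):
    """Split text into lines on '\n' and each line into words."""
    # Assumed preprocessing step: drop a leading byte order mark if present.
    if text.startswith(BOM):
        text = text[len(BOM):]
    lines = text.split("\n")
    # A trailing newline produces one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # A '\r' left at the end of a DOS line acts as a delimiter and yields
    # an empty string, which is filtered out here.
    return [[w for w in _WORD_DELIMS.split(line) if w] for line in lines]
```

With this rule, a DOS-formatted input and its Posix equivalent produce the same word sequences, which is why no special handling of '\r\n' is needed at this stage.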


On 01/29/2015 04:55 AM, Hieu Hoang wrote:
> I would definitely go with Ken's idea and do it as a preprocessing
> step inside the tokenizer, the special-character escaping script, etc.
> By happy coincidence, there's a new C++ tokenizer which I hope we will
> migrate to in future:
> contrib/c++tokenizer
> I would be wary of changing anything else like mgiza or train-model.perl.
>
>
> On 28 January 2015 at 20:00, Tom Hoar
> <tahoar@precisiontranslationtools.com> wrote:
>
> Native Moses components (MGIZA++, lmplz, train-model.perl,
> mert-moses.pl and other scripts/binaries) currently limit the training
> corpora (parallel and LM) to Posix newlines (\n) only. Is this a legacy
> of Posix origins and/or a matter of limited resources to update the
> system to support both?
>
> Is there some reason why they should NOT be updated to allow Windows
> newlines (\r\n)? Would anyone object if we do the work and contribute
> transparent support for both Linux and Windows newlines?
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Thu, 29 Jan 2015 07:55:42 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: [Moses-support] GIZA++ default options
To: "Moses-support@mit.edu" <Moses-support@mit.edu>
Message-ID: <1422518145201.42433@essex.ac.uk>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Does anybody know what the default options for GIZA++ are when train-model.perl is run without specifying any options?

Reading the code in the training script, some default options are listed as:

m1 => 5
m2 => 0
m3 => 3
m4 => 3

but further down it says that if $_HMM_ALIGN is set, then m3 and m4 get set to zero.

Can anybody clarify exactly which models are run when invoking train-model.perl without passing any parameters explicitly to GIZA++?
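
To make the question concrete, here is a small sketch of the logic as described above. It is a Python paraphrase of the two facts quoted from the script, not a copy of train-model.perl; the function name and the hmm_align parameter are hypothetical stand-ins for $_HMM_ALIGN:

```python
def giza_model_iterations(hmm_align=False):
    """Return the m1..m4 defaults as described in this message."""
    # Defaults as listed in the training script.
    options = {"m1": 5, "m2": 0, "m3": 3, "m4": 3}
    # Further down, the script zeroes m3 and m4 when HMM alignment is on.
    if hmm_align:
        options["m3"] = 0
        options["m4"] = 0
    return options
```

Read this way, and assuming m1 to m4 are the GIZA++ iteration counts for IBM Models 1 to 4, a plain run would use Model 1, Model 3 and Model 4 and skip Model 2. The sketch does not cover any other options the script may set.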


thanks,

James

------------------------------

Message: 3
Date: Thu, 29 Jan 2015 11:09:08 +0000 (UTC)
From: Benyamin Bosari <b.bosari2010@yahoo.com>
Subject: [Moses-support] Binarised Model
To: Moses-support Support <moses-support@mit.edu>
Message-ID:
<485516381.545877.1422529748986.JavaMail.yahoo@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Hi guys,
I have a problem binarising the phrase-table and lexicalised reordering models.
When I run the command below, this error message appears: "bash: ~/mosesdecoder/bin/processPhraseTableMin: No such file or directory"

~/mosesdecoder/bin/processPhraseTableMin \
-in train/model/phrase-table.gz -nscores 4 \
-out binarised-model/phrase-table

In fact, processPhraseTableMin does not exist in my mosesdecoder folder, and neither does processLexicalTableMin. I only have the processPhraseTable and processLexicalTable files.

Could you please tell me what is causing this problem?
Regards,
Benyamin

------------------------------

Message: 4
Date: Thu, 29 Jan 2015 12:32:50 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Binarised Model
To: Benyamin Bosari <b.bosari2010@yahoo.com>
Cc: Moses-support Support <moses-support@mit.edu>
Message-ID: <c4e85ac878953199e8f3fa01dffca8d8@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi Benyamin,

you need to follow the guidelines here:

http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc8

For the compact phrase table to work, Moses needs to be compiled and
linked with the CMPH library, which is an external dependency.
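
A quick way to check whether a given build includes the compact-table tools is to look for the two binaries named in your message. The sketch below assumes they would sit in the bin directory of the Moses checkout; the helper name is hypothetical:

```python
from pathlib import Path

# The two compact-model tools mentioned in this thread.
COMPACT_TOOLS = ("processPhraseTableMin", "processLexicalTableMin")


def missing_compact_tools(bin_dir):
    """Return the names of the compact-model tools not found in bin_dir."""
    bin_path = Path(bin_dir).expanduser()
    return [name for name in COMPACT_TOOLS if not (bin_path / name).is_file()]
```

For example, missing_compact_tools("~/mosesdecoder/bin") returning both names would be consistent with a build made without CMPH.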

BTW: I am currently working on my own implementation of the CHD
algorithm, so we will be able to remove that dependency soon.

Best,

Marcin

On 2015-01-29 12:09, Benyamin Bosari wrote:

> Hi guys,
>
> I have a problem binarising the phrase-table and lexicalised reordering models.
>
> When I run the command below, this error message appears: "bash: ~/mosesdecoder/bin/processPhraseTableMin: No such file or directory"
>
> ~/mosesdecoder/bin/processPhraseTableMin \
> -in train/model/phrase-table.gz -nscores 4 \
> -out binarised-model/phrase-table
>
> In fact, processPhraseTableMin does not exist in my mosesdecoder folder, and neither does processLexicalTableMin. I only have the processPhraseTable and processLexicalTable files.
>
> Could you please tell me what is causing this problem?
>
> Regards,
>
> Benyamin

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 99, Issue 66
*********************************************
