Moses-support Digest, Vol 110, Issue 2

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Exit code: 127 ERROR: Can't generate symmetrized alignment
file (Read, James C)
2. Training script documentation (Read, James C)
3. Best way to mark unknowns in nbest-list (Jeremy Gwinnup)

----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Dec 2015 15:21:01 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: [Moses-support] Exit code: 127 ERROR: Can't generate
symmetrized alignment file
To: Moses Support <moses-support@mit.edu>
Message-ID:
<AM4PR06MB14744EF494C871F1F606D8C9850E0@AM4PR06MB1474.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

nohup nice /media/bigdata/jcread/3rd_party_software/mosesdecoder/scripts/training/train-model.perl -root-dir phrase_table -corpus /media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/europarl-v7.it-en.1-0010.00001000 -f it -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/media/bigdata/jcread/llv/lm:8 -external-bin-dir /media/bigdata/jcread/3rd_party_software/bin >& training.out &

The command above runs well for a while, then bombs out with the following output and exit code 127:

(3) generate word alignment @ Wed Dec 2 01:56:06 GMT 2015
Combining forward and inverted alignment from files:
/media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/giza.it-en/it-en.A3.final.{bz2,gz}
/media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/giza.en-it/en-it.A3.final.{bz2,gz}
Executing: mkdir -p /media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/model
Executing: /media/bigdata/jcread/3rd_party_software/mosesdecoder/scripts/training/giza2bal.pl -d "gzip -cd /media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/giza.en-it/en-it.A3.final.gz" -i "gzip -cd /media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/giza.it-en/it-en.A3.final.gz" |/media/bigdata/jcread/3rd_party_software/mosesdecoder/scripts/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /media/bigdata/jcread/llv/data/europarlv7/prealigned/tokenized_truecased_cleaned/1-0010/00001000/phrase_table/model/aligned.grow-diag-final-and
sh: 1: /media/bigdata/jcread/3rd_party_software/mosesdecoder/scripts/../bin/symal: not found
Exit code: 127
ERROR: Can't generate symmetrized alignment file

It seems this problem with the script has been encountered before:

http://comments.gmane.org/gmane.comp.nlp.moses.user/10489

I'm not sure I understand the accepted solution.

"Use absolute paths to all the scripts, and make sure your parallel files have the same names but the extension"

The command I issued uses only absolute paths. Is this referring to modifications in the training script itself?
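
For what it is worth, exit code 127 is the shell's status for a command it could not find, and the log above shows sh failing to find symal under mosesdecoder/bin. A minimal sketch of a check to run before retraining follows. The check_tool helper and the MOSES_ROOT variable are hypothetical names introduced here for illustration, and the default path is simply the one that appears in the log.

```shell
#!/bin/sh
# Hypothetical helper: report whether a binary the training script
# calls exists and is executable at the given path.
check_tool() {
    tool_path="$1"
    if [ -x "$tool_path" ]; then
        echo "ok: $tool_path"
        return 0
    fi
    echo "missing or not executable: $tool_path"
    return 1
}

# MOSES_ROOT is an assumption: point it at your own mosesdecoder checkout.
MOSES_ROOT="${MOSES_ROOT:-/media/bigdata/jcread/3rd_party_software/mosesdecoder}"

# train-model.perl resolved symal as scripts/../bin/symal in the log,
# which is the same location as $MOSES_ROOT/bin/symal.
check_tool "$MOSES_ROOT/bin/symal" || true
```

If the binary is reported missing, one possibility is that the Moses build did not complete or placed its binaries elsewhere. That is an assumption to verify, not a confirmed diagnosis.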

James


------------------------------

Message: 2
Date: Wed, 2 Dec 2015 15:28:00 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: [Moses-support] Training script documentation
To: Moses Support <moses-support@mit.edu>
Message-ID:
<AM4PR06MB1474E963E74C2BC485D89308850E0@AM4PR06MB1474.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

In the past I've never been able to get the training script to run to completion without rigorously following the instructions here http://www.statmt.org/moses/?n=moses.baseline

1) Tokenise

2) Train truecaser

3) Truecase

4) Clean

What if somebody wants to just tokenize and clean without truecasing or just clean without tokenizing? Why should the script bomb out? Is this something to do with formats required by early stages of the training process?
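
To make the question concrete, here is a sketch of the tokenise-and-clean variant with no truecasing step. It only prints the commands rather than running them. tokenizer.perl and clean-corpus-n.perl are the scripts used on the baseline page; the MOSES_SCRIPTS variable, the plan_tokenise_and_clean helper, the corpus stem, and the 1 to 80 length limits are assumptions made for illustration. Whether train-model.perl then runs to completion on this output is exactly what is being asked.

```shell
#!/bin/sh
# Sketch only: tokenise and clean a parallel corpus, skipping truecasing.
# MOSES_SCRIPTS is an assumption: set it to your mosesdecoder/scripts directory.
MOSES_SCRIPTS="${MOSES_SCRIPTS:-$HOME/mosesdecoder/scripts}"

# Hypothetical helper: print the commands for a corpus stem and language pair.
# Pipe the output to sh to actually run them.
plan_tokenise_and_clean() {
    stem="$1"; src="$2"; tgt="$3"
    for lang in "$src" "$tgt"; do
        echo "$MOSES_SCRIPTS/tokenizer/tokenizer.perl -l $lang < $stem.$lang > $stem.tok.$lang"
    done
    # clean-corpus-n.perl arguments: input stem, the two language extensions,
    # output stem, minimum and maximum sentence length.
    echo "$MOSES_SCRIPTS/training/clean-corpus-n.perl $stem.tok $src $tgt $stem.clean 1 80"
}

# Example stem (assumed): expects corpus/europarl-v7.it-en.it and .en to exist.
plan_tokenise_and_clean corpus/europarl-v7.it-en it en
```

The cleaned output stem (here corpus/europarl-v7.it-en.clean) would then be what gets passed to -corpus.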

James

NOTE: This is not an open invitation to discuss why somebody would want to train models without tokenizing or truecasing. This is nothing more than a request for technical assistance.

------------------------------

Message: 3
Date: Wed, 2 Dec 2015 11:36:21 -0500
From: Jeremy Gwinnup <jeremy@gwinnup.org>
Subject: [Moses-support] Best way to mark unknowns in nbest-list
To: moses-support@mit.edu
Message-ID: <19CE3AFD-3789-4621-BCE5-62B794AFDCAE@gwinnup.org>
Content-Type: text/plain; charset=utf-8

Hi,

I'd like to be able to mark unknown words in nbest lists - where is a good place to dig into the code so that it works with both phrase-based and chart decoding?

Thanks!
-Jeremy

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 110, Issue 2
*********************************************
