Moses-support Digest, Vol 100, Issue 47

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. I sign the train-model.perl petition... and more. (Tom Hoar)
2. OT: CasMaCat: Xliff file import failed (Jon Olds)


----------------------------------------------------------------------

Message: 1
Date: Sat, 14 Feb 2015 23:00:37 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] I sign the train-model.perl petition... and
more.
To: Kenneth Heafield <moses@kheafield.com>, moses-support
<moses-support@mit.edu>
Message-ID: <54DF7125.7030806@precisiontranslationtools.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Ken,

I'd sign your petition, but we're going further. We're now working on a
significant upgrade to train-model.perl (and soon mert-moses.pl) to run
on native Win64 (dev/testing on Strawberry Perl 64).

It's far enough along that train-model.perl properly handles drive
letters and OS path separators (slash vs. backslash), auto-appends
".exe" extensions to binaries, etc. We've replaced most system calls
like `rm`, `wc`, `time`, `cat`, etc. with native Perl code, and the work
continues. It relies on the Moses binaries in their current relative
paths and the (M)GIZA++ binaries in the -external-bin-dir path. The only
non-Moses external binaries will be sort (gsort), split (gsplit), gzip,
and bzcat (unless others pop up). We're testing with 32-bit Gow binaries
(does anyone have the 64-bit binaries?). As on POSIX systems, these will
have to be in the Win64 system path. I'm not sure how we'll manage
symlinks. Suggestions welcome.
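
To illustrate the kind of handling described above, here is a minimal
sketch, written in Python for brevity rather than Perl. The function
names (resolve_binary, count_lines, remove_file) are hypothetical and
are not taken from the scripts; the sketch only shows the general idea
of appending ".exe" on Windows and replacing `wc -l` and `rm -f` with
native code.

```python
import os
import sys
from pathlib import Path


def resolve_binary(bin_dir, name, platform=sys.platform):
    """Build the path to a binary, appending ".exe" on Windows if missing.

    Hypothetical helper: pathlib takes care of the OS path separator.
    """
    if platform.startswith("win") and not name.lower().endswith(".exe"):
        name += ".exe"
    return Path(bin_dir) / name


def count_lines(path):
    """Native stand-in for `wc -l`: count newline bytes, 1 MiB at a time."""
    with open(path, "rb") as handle:
        chunks = iter(lambda: handle.read(1 << 20), b"")
        return sum(chunk.count(b"\n") for chunk in chunks)


def remove_file(path):
    """Native stand-in for `rm -f`: delete a file, ignoring a missing one."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

With helpers like these, the same call site can serve both platforms,
since the caller never spells out a separator or an extension.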

To complement the Perl work, Jeroen is updating the phrase-based Moses
code (maybe phrase-factored, too) to run on native Win64 (including
lmplz and query). In the end, the entire train/tune/translate tool chain
will run on native Win64 without Cygwin.

Back to your petition idea. In this upgrade, our original plan included
adding return-code checking and pass-through (I was researching that as
your message came in). We're adding formatted log 'print' statements
immediately after the close() statements (or equivalents) to report the
full paths of each step's final output files. Between return codes and
screen-scrapers, any wrapper should be able to detect the success or
failure of any step. We've also done a general cleanup (e.g. normalizing
indentation to 4 spaces). The goal: maintain the existing POSIX
(Linux/Mac) use case, add the native Win64 use case AND improve
reliability when run from wrappers -- all without changing the current
business rules/SMT functions.
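
As a rough sketch of the return-code checking and output logging
described above (in Python for illustration, not the Perl code itself):
run_step and has_content are hypothetical names, and the "OUTPUT" and
"ERROR" log prefixes are assumptions, not the format the scripts use.
The gzip-aware check is included because a .gz file that holds no data
is still a few bytes long, so a plain size test would not catch it.

```python
import gzip
import subprocess
import sys
from pathlib import Path


def has_content(path):
    """True if the file holds data; a .gz file is checked after decompression."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        return bool(handle.read(1))


def run_step(label, command, output_file):
    """Run one step, pass its return code through, and log the output path.

    Hypothetical wrapper: returns 0 on success, the child's return code
    on failure, or 1 if the expected output is missing or empty.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        print(f"ERROR {label}: exit code {result.returncode}", file=sys.stderr)
        return result.returncode
    output = Path(output_file).resolve()
    if not output.is_file() or not has_content(output):
        print(f"ERROR {label}: no usable output at {output}", file=sys.stderr)
        return 1
    # A fixed prefix plus the full path gives screen-scrapers one line to match.
    print(f"OUTPUT {label}: {output}")
    return 0
```

A wrapper can then stop at the first non-zero value returned, instead of
letting later steps run on missing input.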

We've changed the Perl script names: `train-model.perl` to
`train-model-x.perl` and `mert-moses.pl` to `mert-moses-x.pl` ("x" for
"cross-platform"). We'll add them to the trunk when they're ready (early
March?). Hopefully, enough people will test and validate them that we
can call them reliable and robust. Maybe they can replace the current
scripts? As of today, steps 3, 4 and 9 are fully tested on Strawberry
Win64 with Gow on WinXP64 and Wine64.

Comments, requests, volunteers from the general Moses community are welcome.



On 02/14/2015 09:54 PM, Kenneth Heafield wrote:
> Sign my petition to add return code checking to train-model.perl.
>
> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>> An empty phrase-table.gz file is usually the result of an ill-prepared
>> training corpus. Make sure you run the final corpus through
>> clean-corpus-n.perl.
>>
>>
>>
>> On 02/14/2015 09:19 PM, ????????? ??????? wrote:
>>> Hello, everybody!
>>>
>>> I have a problem with Moses. I created a big parallel corpus by
>>> concatenating a bunch of existing corpora from
>>> http://opus.lingfil.uu.se. After that I cleaned up the results
>>> (during tokenization the script reported some errors, so I deleted
>>> the error-prone rows from both sides).
>>>
>>> Then I started training the translation model using mgiza with the
>>> following command:
>>>
>>> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>>> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>>> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>>> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>>> -external-bin-dir /opt/moses/mgiza >& training.out &
>>>
>>> After a week of work I have this at the end of training.out:
>>> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>>> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
>>> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>>> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>>> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>>> msd wbe-msd-bidirectional-fe"
>>> Lexical Reordering Scorer
>>> scores lexical reordering models of several types (hierarchical,
>>> phrase-based and word-based-extraction
>>> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
>>> no generation model requested, skipping step
>>> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
>>>
>>> There is a bunch of files in the ~/working/train folder. It looks
>>> like everything is OK, except for one tiny problem: phrase-table.tgz
>>> has a size of 20 bytes. And, of course, it's not usable at all!
>>>
>>> Can somebody help and give me a direction where to dig?
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support



------------------------------

Message: 2
Date: Sat, 14 Feb 2015 16:49:42 +0000
From: Jon Olds <joft_uk@yahoo.co.uk>
Subject: [Moses-support] OT: CasMaCat: Xliff file import failed
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <54DF7CA6.4040702@yahoo.co.uk>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

Apologies if this is the wrong place for this question, but there has
been some talk about CasMaCat here.

I wanted to test it out using some of my own TMs, so I converted them to
Xliff and tried to import them.

I am able to import the Xliff file into the translation interface, but I
can't get anywhere with "Building a new prototype".

I've got four entries in the Corpora box, all entitled unnamed, but all
with 0 segments. I am not able to delete any of them, and when I click
on Upload to try out another file, nothing seems to happen.

Any ideas?

Cheers,

Jon




------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 47
**********************************************
