Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Phrase extraction breaks on unexpected format of
aligned.grow-diag-final (Maarten van Gompel)
2. Sentence mismatch error! (Arefeh Kazemi)
3. Regression, segmentation fault in mosesserver (Maarten van Gompel)
----------------------------------------------------------------------
Message: 1
Date: Mon, 06 Oct 2014 10:00:00 +0200
From: Maarten van Gompel <proycon@anaproy.nl>
Subject: [Moses-support] Phrase extraction breaks on unexpected format
of aligned.grow-diag-final
To: moses-support@mit.edu
Message-ID: <20141006080000.25569.29083@roma.anaproy.nl>
Content-Type: text/plain; charset="utf-8"
Hi,
I'm using the latest git version of Moses, and it seems the training
pipeline got broken somehow, as the format of aligned.grow-diag-final has changed.
I'm invoking train-model.perl as follows:
/vol/customopt/machine-translation/src/mosesdecoder/scripts/training/train-model.perl -external-bin-dir /vol/customopt/machine-translation/bin -root-dir . --corpus train --f fr --e en --first-step 1 --last-step 9 -reordering msd-bidirectional-fe --lm 0:3:/scratch/proycon/mosestest/train.fr.lm -mgiza -mgiza-cpus 20 -cores 20 -sort-buffer-size 10G -sort-batch-size 253 -sort-compress gzip -sort-parallel 20
And it fails with warnings like these on every sentence pair:
WARNING: Et is a bad alignment point in sentence 44968
T: If we do , I am sure we will be listened to .
S: Et lorsque nous serons capables de le faire , je suis sûr qu' ils nous écouteront .
Looking into the code of 'extract', I see that aligned.grow-diag-final is supposed to consist of lines of space-separated %d-%d pairs (the alignment points). But my aligned.grow-diag-final seems to be in a newer format and looks like this:
Je trouve que ce n' est pas acceptable . {##} I consider this to be unacceptable . {##} 0-0 1-1 2-1 3-2 6-4 4-5 5-5 6-5 7-5 8-6
The 'extract' program only expects the latter part. So I manually stripped the source and target sentences and left only that, and then it works. It seems something is going wrong in the training pipeline?
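The manual workaround described above can be sketched in Python roughly as follows. This is only an illustration, assuming every line has the three-field "source {##} target {##} alignment" layout shown above; the names strip_to_alignment and convert are made up for this sketch and are not part of Moses.

```python
import re

SEPARATOR = "{##}"  # field separator as it appears in the sample line above
ALIGNMENT_POINT = re.compile(r"^\d+-\d+$")  # the %d-%d form that 'extract' expects


def strip_to_alignment(line):
    """Return only the alignment field of a line, e.g. '0-0 1-1 2-1'."""
    # Keep whatever follows the last separator; a line that has no
    # separator at all (the old format) is passed through unchanged.
    alignment = line.rsplit(SEPARATOR, 1)[-1].strip()
    for point in alignment.split():
        if not ALIGNMENT_POINT.match(point):
            raise ValueError("not an alignment point: %r" % point)
    return alignment


def convert(in_path, out_path):
    """Hypothetical helper: write an alignment-only copy of a file."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_to_alignment(line) + "\n")
```

Raising on anything that is not a %d-%d pair makes the sketch fail loudly if the file turns out to have yet another layout, rather than silently writing out bad alignment points.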
Regards,
--
Maarten van Gompel
Centre for Language Studies
Radboud Universiteit Nijmegen
proycon@anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon
GnuPG key: 0x1A31555C XMPP: proycon@anaproy.nl
Bitcoin: 1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd
------------------------------
Message: 2
Date: Mon, 6 Oct 2014 03:13:57 -0700
From: Arefeh Kazemi <arefeh_kazemi@yahoo.com>
Subject: [Moses-support] Sentence mismatch error!
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1412590437.87049.YahooMailNeo@web121702.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="us-ascii"
Hi
I have re-installed Moses on my system, but I have a problem with the giza symmetrize step.
It produces errors of this type:
Sentence mismatch error! Line #501714
Sentence mismatch error! Line #501715
.
.
.
Sentence mismatch error! Line #900000
All of my data files are in UTF-8 format, and I have run Moses successfully on these files before.
Any suggestions for fixing the problem would be appreciated.
Regards
Arefeh
------------------------------
Message: 3
Date: Mon, 06 Oct 2014 17:48:26 +0200
From: Maarten van Gompel <proycon@anaproy.nl>
Subject: [Moses-support] Regression, segmentation fault in mosesserver
To: moses-support@mit.edu
Message-ID: <20141006154826.18931.65296@roma.anaproy.nl>
Content-Type: text/plain; charset="utf-8"
Hi,
A bug has appeared in mosesserver that was not there in an older version.
I'm running the latest git version. I get a segmentation fault which appears
nondeterministically (an underlying memory issue, perhaps?). The bug only appears with
the server and not with the normal moses.
I start mosesserver as follows:
mosesserver --server-port 8080 -xml-input inclusive -f ep7os12-mosesbaseline/fallback.moses.ini -n-best-list ep7os12-mosesbaseline/nbest.txt 25
Then I provide Moses with input in XML markup; only one small L1 fragment in an L2 context
is to be translated. Moses is trained to translate English to German:
<w translation="Oft">Oft</w><wall/><w translation="gibt">gibt</w><wall/><w translation="es">es</w><wall/>various<wall/>reasons<wall/><w translation="für">für</w><wall/><w translation="das">das</w><wall/><w translation="Dilemma">Dilemma</w><wall/><w translation=".">.</w><wall/>
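To make the markup above easier to read: each token that is already in the target language is wrapped in a w element whose translation attribute repeats the token, the untranslated fragment is left bare, and a wall element follows every token. Below is a small Python sketch that builds such a string, assuming this is the intended scheme. The helper build_xml_input and its (token, is_fixed) input are hypothetical and not part of Moses, and the XML escaping is a precaution of this sketch, not something verified against the Moses XML input parser.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xml_input(tokens):
    """Build an XML-marked-up input line from (token, is_fixed) pairs.

    is_fixed=True marks a token that is already in the target language
    and should be passed through; is_fixed=False marks a token that the
    decoder should translate.
    """
    parts = []
    for token, is_fixed in tokens:
        if is_fixed:
            # Force the token as its own translation.
            parts.append("<w translation=%s>%s</w>"
                         % (quoteattr(token), escape(token)))
        else:
            # Leave the L1 fragment bare so it gets translated.
            parts.append(escape(token))
        # A wall after every token, as in the example input above.
        parts.append("<wall/>")
    return "".join(parts)


example = build_xml_input([
    ("Oft", True), ("gibt", True), ("es", True),
    ("various", False), ("reasons", False),
    ("für", True), ("das", True), ("Dilemma", True), (".", True),
])
```

With the token list shown, example has the same shape as the input line above.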
This often goes well for a few sentences but then breaks. Here's a gdb trace of
when it fails:
[contrib/server/mosesserver.cpp:708] Listening on port 8080
[contrib/server/mosesserver.cpp:234] Input: <w translation="Oft">Oft</w><wall/><w translation="gibt">gibt</w><wall/><w translation="es">es</w><wall/>various<wall/>reasons<wall/><w translation="für">für</w><wall/><w translation="das">das</w><wall/><w translation="Dilemma">Dilemma</w><wall/><w translation=".">.</w><wall/>
Translating: Oft gibt es various reasons für das Dilemma .
Line 0: Collecting options took 0.000134339 seconds at moses/Manager.cpp:110
Line 0: Search took 0.00144825 seconds
[contrib/server/mosesserver.cpp:340] Output: Oft gibt es verschiedene Gründe für das Dilemma .
[Thread 0x7ffff7ff1300 (LWP 29516) exited]
[New Thread 0x7ffff7ff1300 (LWP 29559)]
[contrib/server/mosesserver.cpp:234] Input: <w translation="Oft">Oft</w><wall/><w translation="gibt">gibt</w><wall/><w translation="es">es</w><wall/>various<wall/>reasons<wall/><w translation="für">für</w><wall/><w translation="das">das</w><wall/><w translation="Dilemma">Dilemma</w><wall/><w translation=".">.</w><wall/>
Translating: Oft gibt es various reasons für das Dilemma .
Line 0: Collecting options took 0.000134644 seconds at moses/Manager.cpp:110
Line 0: Search took 0.00143984 seconds
[contrib/server/mosesserver.cpp:340] Output: Oft gibt es verschiedene Gründe für das Dilemma .
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe97ec700 (LWP 27461)]
0x000000000055f9c7 in Moses::ThreadPool::Execute (this=0x22fc5f28) at moses/ThreadPool.cpp:59
59 if (task->DeleteAfterExecution()) {
(gdb) bt
#0 0x000000000055f9c7 in Moses::ThreadPool::Execute (this=0x22fc5f28) at moses/ThreadPool.cpp:59
#1 0x00000000006aa964 in thread_proxy ()
#2 0x00007ffff73a7e9a in start_thread (arg=0x7fffe97ec700) at pthread_create.c:308
#3 0x00007ffff648e31d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
As you can see, in this example I passed the same input twice: the first time it went fine, and the second time it segfaulted. In a prior version all was fine.
I'm hoping somebody more familiar with the Moses codebase has an idea of what might be wrong.
Thanks in advance,
(PS: I cross-posted this on github-issues: https://github.com/moses-smt/mosesdecoder/issues/76)
--
Maarten van Gompel
Centre for Language Studies
Radboud Universiteit Nijmegen
proycon@anaproy.nl
http://proycon.anaproy.nl
http://github.com/proycon
GnuPG key: 0x1A31555C XMPP: proycon@anaproy.nl
Bitcoin: 1BRptZsKQtqRGSZ5qKbX2azbfiygHxJPsd
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 96, Issue 5
********************************************