Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Problems by training Moses (Ricardo Cabello Sánchez)
2. Re: Mosesserver fails with "core dumped" (Ertuğrul YILMAZ (BİLGEM))
----------------------------------------------------------------------
Message: 1
Date: Thu, 27 Mar 2014 18:26:23 +0100
From: Ricardo Cabello Sánchez
<ricardo.cabello.sanchez@googlemail.com>
Subject: [Moses-support] Problems by training Moses
To: moses-support@mit.edu
Message-ID:
<CAJxWzkYoQCVhokZY7QtpyNPm_DHzHYwztukOGvUi0nSShAfd=g@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Dear all,
I'm new to this mailing list. Sorry if I am asking something very obvious
and easy, but I'm pretty new to this field.
I'm starting to work with Moses for my PhD in computational linguistics and,
now that I think the installation went OK, I am trying to train Moses with
the provided default corpora. The process crashes in the early data
preparation step. I have solved other errors myself, but I think I need help
with this one. I get these errors (the whole process is pasted below) that I
can't solve.
Could you please help?
<<<
ricardo@ricardo-Satellite-L40:~/mosesdecoder/primera_vez$ ./experiment.perl -config config_primera_vez.toy -exec
STARTING UP AS PROCESS 4267 ON ricardo-Satellite-L40 AT dom mar 23 12:34:43
CET 2014
LOAD CONFIG...
working directory is /home/ricardo/mosesdecoder/primera_vez
running experimenal run number 17
ESTABLISH WHICH STEPS NEED TO BE RUN
FIND DEPENDENCIES BETWEEN STEPS
CHECKING IF OLD STEPS ARE RE-USABLE
STEP SUMMARY:
59 CORPUS:toy:tokenize -> re-using (1)
58 CORPUS:toy:clean -> re-using (1)
54 CORPUS:toy:truecase -> re-using (1)
49 TRUECASER:consolidate -> re-using (1)
48 TRUECASER:train -> re-using (1)
47 LM:toy:tokenize -> re-using (1)
45 LM:toy:truecase -> re-using (1)
43 LM:toy:train -> re-using (1)
40 LM:toy:binarize -> re-using (1)
39 TRAINING:consolidate -> re-using (1)
38 TRAINING:prepare-data -> run
37 TRAINING:run-giza -> run
36 TRAINING:run-giza-inverse -> run
35 TRAINING:symmetrize-giza -> run
34 TRAINING:build-lex-trans -> run
31 TRAINING:extract-phrases -> run
30 TRAINING:build-reordering -> run
29 TRAINING:build-ttable -> run
26 TRAINING:create-config -> run
24 TUNING:apply-weights -> run
23 EVALUATION:test:input-from-sgm -> re-using (1)
22 EVALUATION:test:tokenize-input -> re-using (1)
17 EVALUATION:test:truecase-input -> re-using (1)
15 EVALUATION:test:filter -> run
14 EVALUATION:test:apply-filter -> run
13 EVALUATION:test:decode -> run
12 EVALUATION:test:remove-markup -> run
10 EVALUATION:test:detruecase-output -> run
9 EVALUATION:test:detokenize-output -> run
8 EVALUATION:test:wrap -> run
7 EVALUATION:test:reference-from-sgm -> re-using (1)
6 EVALUATION:test:tokenize-reference -> re-using (1)
4 EVALUATION:test:nist-bleu -> run
3 EVALUATION:test:nist-bleu-c -> run
2 EVALUATION:test:analysis -> run
1 EVALUATION:test:analysis-coverage -> run
0 REPORTING:report -> run
convert: iCCP: profile 'default_rgb.icc': 0h: PCS illuminant is not D50
`/tmp/magick-4283knebIQusILAg1' @
warning/png.c/MagickPNGWarningHandler/1830.
Warning: Cannot convert string
"-*-Helvetica-Medium-R-Normal--*-140-*-*-P-*-ISO8859-1" to type FontStruct
Warning: Cannot convert string
"-*-Helvetica-Medium-R-Normal--*-120-*-*-P-*-ISO8859-1" to type FontStruct
Warning: Cannot convert string
"-*-Helvetica-Medium-R-Normal--*-100-*-*-P-*-ISO8859-1" to type FontStruct
Warning: Cannot convert string
"-*-Helvetica-Bold-R-Normal--*-120-*-*-P-*-ISO8859-1" to type FontStruct
convert: iCCP: profile 'default_rgb.icc': 0h: PCS illuminant is not D50
`/tmp/magick-4292U5L_nYYBb8pG1' @
warning/png.c/MagickPNGWarningHandler/1830.
EXECUTE STEPS
number of steps doable or running: 1 at dom mar 23 12:34:45 CET 2014
doable: TRAINING:prepare-data
executing
/home/ricardo/mosesdecoder/primera_vez/steps/17/TRAINING_prepare-data.17
via sh (1 active)
convert: iCCP: profile 'default_rgb.icc': 0h: PCS illuminant is not D50
`/tmp/magick-4307QgvlrdxUKXXc1' @
warning/png.c/MagickPNGWarningHandler/1830.
step TRAINING:prepare-data crashed
number of steps doable or running: 0 at dom mar 23 12:34:54 CET 2014
convert: iCCP: profile 'default_rgb.icc': 0h: PCS illuminant is not D50
`/tmp/magick-4333yx8taQhVp75g1' @
warning/png.c/MagickPNGWarningHandler/1830.
>>>
Thank you and best regards,
Ricardo
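
The console output above only reports that TRAINING:prepare-data crashed; the underlying error is normally in the files written next to the step script. A minimal sketch for collecting them, assuming those files share the step script's name as a prefix (the exact suffixes may differ between Moses versions, and the helper name `step_logs` is hypothetical):

```python
from pathlib import Path


def step_logs(step_script):
    """Return files in the step script's directory whose names start with
    the script's name (the script itself is excluded), sorted by path."""
    script = Path(step_script)
    return sorted(
        p for p in script.parent.iterdir()
        if p.is_file() and p.name.startswith(script.name) and p != script
    )


if __name__ == "__main__":
    # Step script path taken from the log above.
    script = "/home/ricardo/mosesdecoder/primera_vez/steps/17/TRAINING_prepare-data.17"
    if Path(script).parent.is_dir():
        for log in step_logs(script):
            print(f"== {log} ==")
            # Print the last 20 lines, where a failing command tends to show up.
            lines = log.read_text(errors="replace").splitlines()
            print("\n".join(lines[-20:]))
```

Posting the tail of those files to the list, rather than the console summary, usually makes the failure much easier to diagnose.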
------------------------------
Message: 2
Date: Fri, 28 Mar 2014 16:33:27 +0200 (EET)
From: Ertuğrul YILMAZ (BİLGEM) <yilmaz.ertugrul@tubitak.gov.tr>
Subject: Re: [Moses-support] Mosesserver fails with "core dumped"
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: Hieu Hoang <hieu.hoang@ed.ac.uk>, moses-support
<moses-support@mit.edu>
Message-ID:
<1587674969.104942206.1396017207139.JavaMail.zimbra@tubitak.gov.tr>
Content-Type: text/plain; charset="utf-8"
Hi,
You were right: running the latest Moses confirmed exactly what the problem was. The corpus I used was morphologically tokenized prior to MT training and had a bunch of phrases with square brackets in them. Removing those entries from the table fixed the problem. Now I should go back and fix the tokenization so that it also escapes those special characters.
Thank you both for your help.
--
Ertugrul
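
The check described in this message, finding phrase-table entries whose phrases contain unescaped square brackets or pipes, could be sketched as follows. It assumes the usual text phrase-table layout, with fields separated by ` ||| ` and the source and target phrases in the first two fields, and it only makes sense for phrase-based tables (hierarchical tables use `[X]` legitimately). The names `find_suspect_entries` and `scan_table` are hypothetical.

```python
import gzip

# Characters that Moses reserves and that should be escaped in phrases.
RESERVED = set("[]|")


def find_suspect_entries(lines):
    """Yield (line_number, line) for entries whose source or target phrase
    contains an unescaped '[', ']' or '|'."""
    for number, line in enumerate(lines, start=1):
        entry = line.rstrip("\n")
        phrases = entry.split(" ||| ")[:2]  # source and target phrase
        if any(ch in RESERVED for phrase in phrases for ch in phrase):
            yield number, entry


def scan_table(path):
    """Scan a gzip-compressed text phrase table and return the suspects."""
    with gzip.open(path, "rt", encoding="utf-8") as table:
        return list(find_suspect_entries(table))
```

Calling `scan_table` on a table such as the `phrase-table-sigtest-filter.5.gz` file from the quoted moses.ini would list the line numbers to inspect before deciding whether to drop or re-escape them.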
----- Original Message -----
From: "Barry Haddow" <bhaddow@staffmail.ed.ac.uk>
To: "Ertuğrul YILMAZ (BİLGEM)" <yilmaz.ertugrul@tubitak.gov.tr>, "Hieu Hoang" <Hieu.Hoang@ed.ac.uk>
Cc: "moses-support" <moses-support@mit.edu>
Sent: Thursday, 27 March 2014 16:03:52
Subject: Re: [Moses-support] Mosesserver fails with "core dumped"
Hi
From this message:
> Check nextPos != string::npos failed in moses/Phrase.cpp:214
It looks as though there is a format error in your phrase table, such as
a stray pipe (|) or square bracket. Did you use the Moses tokeniser to
prepare the corpus? It will escape the Moses reserved characters. If you
can use a more recent version of Moses then it may give more
information. My version of the source code has this error message at
about the same place:
UTIL_THROW_IF2(nextPos == string::npos,
               "Incorrect formatting of non-terminal. Should have 2 non-terms, eg. [X][X]. "
               << "Current string: " << annotatedWord);
cheers - Barry
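
For a custom tokenization pipeline, the escaping mentioned above can be approximated with a small replacement table. This is a minimal sketch: the mapping reflects what the Moses tokenizer script is understood to apply, so it should be checked against the scripts shipped with the Moses version in use, and the function name `escape_moses` is hypothetical.

```python
# Replacement table for Moses reserved characters (assumed mapping; verify
# against the tokenizer scripts in your Moses checkout).
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_moses(text):
    """Replace each reserved character in `text` with its escape code."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text
```

The function is not idempotent: applying it to text that is already escaped turns `&#91;` into `&amp;#91;`, so it should run exactly once, on raw tokenizer output.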
On 27/03/14 13:25, Ertuğrul YILMAZ (BİLGEM) wrote:
> Hi,
>
> Well, I get the same failure with command line moses too :(
> The interesting thing is that I could use the exact same moses.ini,
> with the only difference being a smaller phrase table filtered for
> an evaluation set of ours (only changing the line for
> PhraseDictionaryMemory), and both moses and mosesserver work fine.
> The one that works is about 12MB compressed and the one that fails is
> about 596MB compressed. Is there some sort of setting to play
> with to work with larger phrase tables?
>
> I do have four scores in the translation table (the system was trained
> with one of the latest versions of Moses).
>
> Thanks,
>
> ---
> /usr/local/smt/moses_2013_11/bin/moses -threads 8 -dl 5
> -minimum-bayes-risk -v 0 -f moses.tuned.ini.5.mod
> /usr/local/smt/moses_2013_11/bin
> line=UnknownWordPenalty
> line=WordPenalty
> line=PhrasePenalty
> line=PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4
> path=/media/pala12/burak/aren/ems2/model/phrase-table-sigtest-filter.5.gz
> input-factor=0 output-factor=0
> line=LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=/media/pala12/burak/aren/ems2/model/reordering-table-sigtest-filter.5.wbe-msd-bidirectional-fe.gz
> Initializing LexicalReordering..
> line=Distortion
> line=KENLM lazyken=0 name=LM0 factor=0
> path=/media/pala12/burak/aren/ems2/lm/nist12.binlm.1 order=4
> line=KENLM lazyken=0 name=LM1 factor=0
> path=/media/pala12/burak/aren/ems2/lm/isi.binlm.1 order=4
> line=KENLM lazyken=0 name=LM2 factor=0
> path=/media/pala12/burak/aren/ems2/lm/un.binlm.1 order=4
> line=KENLM lazyken=0 name=LM3 factor=0
> path=/media/pala06/ilknur/pala05/gigaword_lm/gigaword5gr.tcafteraps3.irstlm.bin
> order=5
> Loading table into memory...done.
> Start loading text SCFG phrase table. Moses format : [235.000] seconds
> Check nextPos != string::npos failed in moses/Phrase.cpp:214
> Aborted (core dumped)
>
>
>
> ------------------------------------------------------------------------
> From: "Hieu Hoang" <Hieu.Hoang@ed.ac.uk>
> To: "Ertuğrul YILMAZ (BİLGEM)" <yilmaz.ertugrul@tubitak.gov.tr>
> Cc: "moses-support" <moses-support@mit.edu>
> Sent: Thursday, 27 March 2014 13:55:53
> Subject: Re: [Moses-support] Mosesserver fails with "core dumped"
>
> Does the ini file work when you use the command line moses?
>
> How many scores are there for each rule in the translation table?
> There should be 4 but if you had trained it with Moses v.1 then there
> will be 5.
>
> In that case, just delete the last score, which is just a constant 2.718
>
>
> On 26 March 2014 10:11, Ertuğrul YILMAZ (BİLGEM)
> <yilmaz.ertugrul@tubitak.gov.tr> wrote:
>
> Hi All,
>
> I am working on running a phrase-based MT system with mosesserver
> resulting in the attached failure.
> I was able to run the same system with binarized phrase and
> reordering tables, but it fails to run them in memory.
>
> Did any of you run into a similar problem or have any ideas on the
> resolution for this? Attaching my moses.ini also.
>
> Thanks,
>
> ertugrul@pala01$ /usr/local/smt/moses_2013_11/bin/mosesserver
> -threads 8 -dl 5 -minimum-bayes-risk -v 0 -f moses.tuned.ini.5.mod
> /usr/local/smt/moses_2013_11/bin
> line=UnknownWordPenalty
> line=WordPenalty
> line=PhrasePenalty
> line=PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4
> path=/media/pala12/burak/aren/ems2/model/phrase-table-sigtest-filter.5.gz
> input-factor=0 output-factor=0
> line=LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=/media/pala12/burak/aren/ems2/model/reordering-table-sigtest-filter.5.wbe-msd-bidirectional-fe.gz
> Initializing LexicalReordering..
> line=Distortion
> line=KENLM lazyken=0 name=LM0 factor=0
> path=/media/pala12/burak/aren/ems2/lm/nist12.binlm.1 order=4
> line=KENLM lazyken=0 name=LM1 factor=0
> path=/media/pala12/burak/aren/ems2/lm/isi.binlm.1 order=4
> line=KENLM lazyken=0 name=LM2 factor=0
> path=/media/pala12/burak/aren/ems2/lm/un.binlm.1 order=4
> line=KENLM lazyken=0 name=LM3 factor=0
> path=/media/pala06/ilknur/pala05/gigaword_lm/gigaword5gr.tcafteraps3.irstlm.bin
> order=5
> Loading table into memory...done.
> Start loading text SCFG phrase table. Moses format : [233] seconds
> Check nextPos != string::npos failed in moses/Phrase.cpp:214
> Aborted (core dumped)
>
>
> ----------------------------------------------------------
> Ertuğrul YILMAZ
> Senior Researcher
>
> Speech and Natural Language Processing Group
> TÜBİTAK BİLGEM
> 41470 GEBZE / KOCAELİ, TURKEY
> T +90262 648 1000 - 2216
> F +90262 648 1100
> www.bilgem.tubitak.gov.tr
> yilmaz.ertugrul@tubitak.gov.tr
>
>
>
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 89, Issue 73
*********************************************