Moses-support Digest, Vol 100, Issue 87

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: My phrase-table.tgz is 20-bytes long (????????? ???????)
2. Re: My phrase-table.tgz is 20-bytes long (????????? ???????)
3. Re: My phrase-table.tgz is 20-bytes long (Barry Haddow)


----------------------------------------------------------------------

Message: 1
Date: Wed, 25 Feb 2015 13:32:06 +0800
From: ????????? ??????? <deadyaga@gmail.com>
Subject: Re: [Moses-support] My phrase-table.tgz is 20-bytes long
To: moses-support@mit.edu
Message-ID:
<CAOAX5pnMcHokCoTgV3wRgmOovi9NQfXZMrTxtbL1yvVedXtP1w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

OK, I've started from scratch. I'm fairly sure I prepared the corpus as
follows:

1. Tokenized the initial corpora with tokenizer.perl and noted the numbers
of the lines that caused errors or warnings
2. Deleted these lines from both files using sed
3. Tokenized the files again. No errors
4. Created a truecase model and truecased the files
5. Removed overly long lines with clean-corpus-n.perl 1 50
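
An ill-prepared corpus is a common suspect when the phrase table comes out
empty, so it may be worth confirming that the two cleaned files are still
line-aligned. A minimal Python sketch, assuming whitespace-tokenized text; the
name check_parallel is hypothetical, its limits only loosely mirror the 1 and
50 passed to clean-corpus-n.perl, and it is not a replacement for that script:

```python
from itertools import zip_longest


def check_parallel(src_lines, tgt_lines, min_len=1, max_len=50):
    """Return (line_number, reason) pairs for sentence pairs that look wrong.

    src_lines and tgt_lines are iterables of strings, e.g. open file objects.
    """
    problems = []
    pairs = zip_longest(src_lines, tgt_lines)
    for n, (src, tgt) in enumerate(pairs, start=1):
        # One file ran out of lines before the other: the corpus is misaligned.
        if src is None or tgt is None:
            problems.append((n, "line missing on one side"))
            continue
        for side, line in (("source", src), ("target", tgt)):
            tokens = line.split()
            if len(tokens) < min_len:
                problems.append((n, f"{side} has fewer than {min_len} tokens"))
            elif len(tokens) > max_len:
                problems.append((n, f"{side} has more than {max_len} tokens"))
    return problems
```

Passing the opened ru-en.clean.ru and ru-en.clean.en files as the two
arguments should return an empty list if the cleaning step did its job.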

I then started training the translation model with:

nohup nice /opt/moses/scripts/training/train-model.perl --parallel -mgiza
-mgiza-cpus 40 -cores 40 -root-dir train -corpus ~/corpus/ru-en.clean -f ru
-e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
0:3:$HOME/lm/ru-en.arpa.en:8 -external-bin-dir /opt/moses/mgiza >&
training.out &

After ten days of waiting I have a 20-byte phrase-table.tgz again! What
am I doing wrong?

Both the ru-en and en-ru A3.final.gz files, aligned-grow-diag-final.and,
lex.e2f and lex.f2e have a reasonable size, but the phrase table,
extract.*.sorted.gz and the reordering table are empty.
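
A 20-byte .gz file is what gzip produces for empty input (10-byte header,
2-byte empty deflate block, 8-byte trailer), so such a file is most likely a
valid but empty archive rather than a corrupted one. A small Python sketch for
checking which outputs are empty; is_empty_gzip is a hypothetical helper name:

```python
import gzip


def is_empty_gzip(path):
    """Return True if the gzip file at `path` decompresses to zero bytes."""
    with gzip.open(path, "rb") as fh:
        # Reading a single byte is enough, so large files stay cheap to test.
        return fh.read(1) == b""
```

Running it over the extract.*.sorted.gz files and the phrase table, in the
order the training steps produce them, would show the first step whose output
is empty.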

I still have no idea what goes wrong or why :(

2015-02-14 21:54 GMT+07:00 Kenneth Heafield <moses@kheafield.com>:

> Sign my petition to add return code checking to train-model.perl.
>
> On 02/14/2015 09:33 AM, Tom Hoar wrote:
> > An empty phrase-table.gz file is usually the result of an ill-prepared
> > training corpus. Make sure you run the final corpus through
> > clean-corpus-n.perl.
> >
> >
> >
> > On 02/14/2015 09:19 PM, ????????? ??????? wrote:
> >> Hello, everybody!
> >>
> >> I have a problem with Moses. I created a big parallel corpus by
> >> concatenating a bunch of existing corpora from
> >> http://opus.lingfil.uu.se. After that I cleaned up the result: the
> >> tokenizer script reported some errors, so I deleted the offending
> >> lines from both sides.
> >>
> >> Then I started training the translation model using mgiza with this
> >> command:
> >>
> >> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
> >> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
> >> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
> >> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
> >> -external-bin-dir /opt/moses/mgiza >& training.out &
> >>
> >> After a week of work I have this at the end of training.out:
> >> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
> >> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
> >> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
> >> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
> >> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
> >> /home/adminadmin/working/train/model/reordering-table. --model "wbe
> >> msd wbe-msd-bidirectional-fe"
> >> Lexical Reordering Scorer
> >> scores lexical reordering models of several types (hierarchical,
> >> phrase-based and word-based-extraction
> >> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
> >> no generation model requested, skipping step
> >> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
> >>
> >> There are a bunch of files in the ~/working/train folder. Everything
> >> looks OK except for one tiny problem: phrase-table.tgz is only
> >> 20 bytes long. And, of course, it's not usable at all!
> >>
> >> Can somebody point me in the right direction?

------------------------------

Message: 2
Date: Wed, 25 Feb 2015 13:37:27 +0800
From: ????????? ??????? <deadyaga@gmail.com>
Subject: Re: [Moses-support] My phrase-table.tgz is 20-bytes long
To: moses-support@mit.edu
Message-ID:
<CAOAX5pkbBXD_ciNkCB40KGwcx87+Q5k9DQ+t+BSFODXR8sW+1Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Parsing the log gave me these warnings:

WARNING: DIFFERENT SUMS: (1) (1.15031)
WARNING: DIFFERENT SUMS: (1) (1.18892)
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero

And this strange piece:
(4) generate lexical translation table 0-0 @ Sun Feb 22 03:07:38 MSK 2015
(/home/adminadmin/corpus/ru-en.clean.ru,/home/adminadmin/corpus/ru-en.clean.en,/home/adminadmin/working/train/model/lex)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!... (there
are TONS of exclamation marks)
Saved: /home/adminadmin/working/train/model/lex.f2e and
/home/adminadmin/working/train/model/lex.e2f
FILE: /home/adminadmin/corpus/ru-en.clean.en

What does it mean?




------------------------------

Message: 3
Date: Wed, 25 Feb 2015 10:19:57 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] My phrase-table.tgz is 20-bytes long
To: ????????? ??????? <deadyaga@gmail.com>, moses-support@mit.edu
Message-ID: <54EDA1CD.9060103@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Alexander,

It looks like something went wrong at the extract stage. If you could
make your training.out available then we can look for clues.
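
A quick scan of training.out for obvious failure messages can narrow things
down before anyone reads the full log. A rough Python sketch; the pattern list
is purely illustrative, not an official catalogue of Moses or mgiza error
messages, and suspicious_lines is a hypothetical helper:

```python
import re

# Illustrative patterns only (an assumption, not taken from Moses itself).
SUSPICIOUS = re.compile(
    r"error|cannot|no space left|killed|died|exit code", re.IGNORECASE
)


def suspicious_lines(lines):
    """Yield (line_number, text) for every log line matching a pattern."""
    for n, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield n, line.rstrip("\n")
```

Feeding it the opened training.out and printing the results gives a short list
of places to look at first, especially around the extract step.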

Could the system have run out of disk space, either in the working
directory or in /tmp? A lot of space is required to build the extract
files and phrase tables.
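
Free space in both locations can be checked before re-running a job that takes
days. A minimal Python sketch using shutil.disk_usage from the standard
library; the helper names are hypothetical, and any threshold passed in is an
arbitrary assumption, since the space actually needed depends on corpus size:

```python
import shutil


def free_gib(path):
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def low_space(paths, min_free_gib):
    """Return the paths whose filesystem has less than min_free_gib free."""
    return [p for p in paths if free_gib(p) < min_free_gib]
```

Calling low_space with the training working directory, /tmp and a chosen
threshold lists whichever of the two is short on space.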

cheers - Barry



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 87
**********************************************
