Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: missing .gz files at the end of mgiza (Tom Hoar)
----------------------------------------------------------------------
Message: 1
Date: Tue, 24 Feb 2015 17:53:40 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] missing .gz files at the end of mgiza
To: moses-support@mit.edu
Message-ID: <54EC5834.30004@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"
In my experience, these warnings are non-fatal, i.e. they do not
typically cause missing .gz word alignment files. Rather, they indicate
a high degree of complexity in the parallel corpus.
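To see how many of these messages a run produced, and whether any of them are actual errors rather than warnings, something like the following could be used. It is only a rough sketch: the three prefixes (ERROR, WARNING, PROBLEM) are the ones quoted in this thread, mgiza may emit others, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Prefixes taken from the messages quoted in this thread (an assumption:
# mgiza may use other markers that this pattern does not catch).
PATTERN = re.compile(r"\b(ERROR|WARNING|PROBLEM):")


def tally_log_messages(lines):
    """Count ERROR/WARNING/PROBLEM markers in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # A single line can carry more than one marker, so scan all matches.
        for match in PATTERN.finditer(line):
            counts[match.group(1)] += 1
    return counts


def summarize_log(path):
    """Tally the markers in a log file such as training.out."""
    # errors="replace" because the log may echo arbitrary corpus bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return tally_log_messages(handle)
```

Calling summarize_log("training.out") returns a Counter. A non-zero ERROR count is the case worth looking at first; large WARNING counts alone would fit the non-fatal picture described above.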
Are you still getting empty (20-byte) .gz alignment files?
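A quick way to check is to test whether each .gz file decompresses to nothing, rather than relying on the file size (an empty gzip stream is roughly 20 bytes, but the exact size depends on the header). A minimal sketch, with a hypothetical helper name; the directory argument would be giza.en-fr or giza.fr-en under your root-dir:

```python
import gzip
from pathlib import Path


def find_empty_gz(directory):
    """Return the .gz files in `directory` that decompress to zero bytes."""
    empty = []
    for path in sorted(Path(directory).glob("*.gz")):
        with gzip.open(path, "rb") as handle:
            # Reading a single byte is enough to tell empty from non-empty.
            if not handle.read(1):
                empty.append(path)
    return empty
```

For example, find_empty_gz("training/giza.fr-en") lists the empty files in that folder (the path here is only an example). A truncated or corrupt file raises an exception instead of being reported as empty, which is also worth knowing about.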
On 02/24/2015 04:35 PM, Vito Mandorino wrote:
> Thank you. I did indeed have a bad ratio in a previous attempt, but not
> in this one. I ran the clean-corpus.perl script right before the
> training.
> In the log output there are actually several lines with warnings such as:
>
> PROBLEM: alignment is 0.
> WARNING: Hill Climbing yielded a zero score viterbi alignment for the
> following pair:
> WARNING: Model2 viterbi alignment has zero score.
> Fert[50] selected WARNING: Model2 viterbi alignment has zero score.
> WARNING: already 41 iterations in hillclimb: 3.24683 2 26 29
> WARNING: DIFFERENT SUMS: (1) (3.13379)
>
> Vito
>
>
> 2015-02-20 19:18 GMT+01:00 Tom Hoar <tahoar@precisiontranslationtools.com>:
>
> Fatal errors during step 2 are normally traceable to poor corpus
> preparation. Termination, however, does not always happen
> immediately. Look through the entire log output. You'll probably
> find one of these errors:
>
> "WARNING: The following sentence pair has source/target sentence
> length ration more than"
>
> or "ERROR: Forbidden zero sentence length 0"
>
> or a line beginning with "ERROR:"
>
> The fact that your corpus has placeholders makes me suspect you
> probably have a bad ratio.
>
>
>
>
>
> On 02/20/2015 06:18 PM, Vito Mandorino wrote:
>> Dear All,
>>
>> I am training a model with placeholders from French to English
>> and the process ends before the end of training step 2, without
>> creating the en-fr.AR.final.gz and fr-en.AR.final.gz files in
>> the respective folders giza.en-fr and giza.fr-en .
>>
>> I cannot understand why. Here are the last 7 lines of the
>> training.out file:
>>
>>
>> Entire Viterbi H333444 Training took: 78424 seconds
>> ==========================================================
>>
>> Entire Training took: 134751 seconds
>> Program Finished at: Fri Feb 20 05:45:19 2015
>>
>> ==========================================================
>>
>>
>> and here's the command used for training:
>>
>> nohup nice /home/Moses/mosesdecoder/scripts/training/train-model.perl \
>> --parallel \
>> -mgiza -mgiza-cpus 20 \
>> -root-dir /home/BRIQUES/train-leclerc-light-ph-fren/training \
>> -external-bin-dir /root/external-bin-dir/ \
>> -corpus /home/BRIQUES/train-leclerc-light-ph-enfr/data/Corpus.leclerc-ph.cleanclean \
>> -f fr -e en \
>> -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
>> -lm 0:5:/home/DATA/LM/RapAnSoc/lm.RapAnSoc5-ph.blm.en.mm:8 \
>> -write-lexical-counts \
>> -extract-options '--Placeholders @num@,@rom@,@alpha@' \
>> >& /home/BRIQUES/train-leclerc-light-ph-fren/training/training.out &
>>
>>
>> It seems a bit odd to me that the analogous training in the
>> reverse English-French direction worked nicely.
>> In that case, the folders giza.en-fr and giza.fr-en contain
>> not only the .gz files but also two more "chunks",
>> en-fr.A3.final.partD and en-fr.A3.final.partE, whereas the
>> French-English training stops at fr-en.A3.final.partC.
>>
>>
>> Thank you,
>>
>> Vito Mandorino
>>
>> --
>> The Translation Trustee
>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>> Tel: +33 1 30 44 04 23  Mobile: +33 6 84 65 68 89
>> Email: vito.mandorino@linguacustodia.com
>> Website: www.linguacustodia.com - www.thetranslationtrustee.com
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
> --
> M. Vito MANDORINO -- Chief Scientist
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 100, Issue 81
**********************************************