Moses-support Digest, Vol 115, Issue 4

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Data for building a factored model (Sašo Kuntaric)
2. Transliteration error (Sanjanashree Palanivel)


----------------------------------------------------------------------

Message: 1
Date: Wed, 4 May 2016 21:30:17 +0200
From: Sašo Kuntaric <saso.kuntaric@gmail.com>
Subject: Re: [Moses-support] Data for building a factored model
To: Marwa Refaie <basmallah@hotmail.com>
Cc: moses-support@mit.edu
Message-ID:
<CANsquDosSSn=__Q9h6h96_J3TsiZ9+F5Y64zRwqKJrX-EkKqdw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello again,

I believe I can wrap my head around the theoretical part, but the English
and German corpora in the Moses factored model tutorial
(http://www.statmt.org/moses/?n=Moses.FactoredTutorial) look beautifully
factored, so my question is: how were the original corpora processed? Was a
specific tagger used, and was any manual or scripted post-processing done?

And since I am already bugging everyone, how is the language model pos.lm
created? Is it extracted from a file, created manually or in another way?
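A POS language model of this kind is an ordinary n-gram model trained on tag sequences instead of words. As a minimal sketch, assuming the target side is already in Moses factored format with the POS tag as the second factor (the word|pos|lemma layout suggested in the reply quoted below), the tag stream can be pulled out with a few lines of Python. The function names are hypothetical and not part of Moses, and whether the tutorial's pos.lm was built this way is not stated in this thread.

```python
from typing import Iterable, Iterator


def factor_sequence(factored_line: str, index: int) -> str:
    """Return factor number `index` (0-based) of every token, space-separated.

    Tokens are separated by whitespace and factors by '|', as in Moses
    factored input. A token with fewer than `index + 1` factors raises IndexError.
    """
    return " ".join(token.split("|")[index] for token in factored_line.split())


def pos_corpus(factored_lines: Iterable[str], pos_index: int = 1) -> Iterator[str]:
    """Yield one line of tags per factored sentence.

    pos_index=1 assumes the word|pos|lemma order; change it if your corpus
    puts the POS tag elsewhere.
    """
    for line in factored_lines:
        yield factor_sequence(line, pos_index)
```

Writing the output of `pos_corpus` to a text file, one sentence per line, gives training text for any n-gram toolkit, for example KenLM's `lmplz -o 3 < pos.txt > pos.arpa` (the order 3 here is an arbitrary choice for illustration).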

Thank you in advance for all the replies.

Best regards,

Sašo

2016-05-02 19:45 GMT+02:00 Marwa Refaie <basmallah@hotmail.com>:

> The corpus for the translation model should be two parallel files, one per
> language, with every token in a factored format such as word|pos|lemma
> (factors joined by '|', tokens separated by spaces). You can prepare the
> files using WordNet, the Stanford tools, or any tagger and stemmer that can
> handle your language pair. Before feeding the files to Moses you will
> probably have to adjust them with a Python script (write it yourself).
> For the language model, build it over tag sequences, one sentence per line,
> for example:
> Verb noun noun
> Noun Det adj
> .......
> using the target language only. Then build it as a usual n-gram LM.
>
> > On May 2, 2016, at 10:11, Sašo Kuntaric <saso.kuntaric@gmail.com> wrote:
> >
> > Hi all,
> >
> > I am having some issues producing the corpora in the correct format for
> > Moses to execute factored training.
> >
> > I am looking at the factored tutorial on the Moses website and I am
> > wondering how to get such consistent corpora for two languages. What tools
> > are being used, and can they be trained for specific languages (Slovenian
> > in my case)? Are such tools available for download, or is such data
> > produced with custom scripts?
> >
> > --
> > Best regards,
> >
> > Sašo
>
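The preprocessing Marwa describes (tag and lemmatize each side, then reshape the tagger output with a small script) might look like the following minimal sketch. It assumes, hypothetically, a tagger that writes one token per line as word<TAB>POS<TAB>lemma with a blank line between sentences (TreeTagger-like output; check your tagger's actual format), and it emits Moses factored lines in the word|POS|lemma order. The helper names are illustrative, not part of Moses.

```python
from typing import Iterable, Iterator


def escape_factor(value: str) -> str:
    """Make a factor safe for Moses input.

    '|' separates factors and whitespace separates tokens, so neither may
    occur inside a factor. '&#124;' is how the Moses tokenizer escapes '|';
    mapping spaces to '_' is an arbitrary choice made for this sketch.
    """
    return value.replace("|", "&#124;").replace(" ", "_")


def tagged_to_factored(lines: Iterable[str]) -> Iterator[str]:
    """Convert word<TAB>POS<TAB>lemma lines into one factored sentence per block.

    A blank line ends a sentence. Lines with fewer than three columns
    raise ValueError.
    """
    sentence: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield " ".join(sentence)
                sentence = []
            continue
        word, pos, lemma = line.split("\t")[:3]
        sentence.append("|".join(escape_factor(f) for f in (word, pos, lemma)))
    if sentence:  # the last sentence may lack a trailing blank line
        yield " ".join(sentence)
```

Running it separately over the tagged source and target files gives two line-aligned factored corpora; both must end up with the same number of lines, and the factor order you choose has to match the factor indices you give Moses at training time.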



--
Best regards,

Sašo

------------------------------

Message: 2
Date: Thu, 5 May 2016 16:24:04 +0530
From: Sanjanashree Palanivel <sanjanashree@gmail.com>
Subject: [Moses-support] Transliteration error
To: moses-support@mit.edu
Message-ID:
<CAAc_kp69zSo0hBAkO=JyhbPPJaEi0Vix0Onqss1OYbe1Jvayqw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear All,

When I try to train the transliteration module, I get the following error. I
don't know what is missing; please help.

Extracting Transliteration Pairs
> Constructing Graph
> Computing Probs : iteration 1
> Computing Probs : iteration 2
> Computing Probs : iteration 3
> Computing Probs : iteration 4
> Computing Probs : iteration 5
> Computing Probs : iteration 6
> Computing Probs : iteration 7
> Computing Probs : iteration 8
> Computing Probs : iteration 9
> Computing Probs : iteration 10
> Finished...
> Selecting Transliteration Pairs with threshold 0.5
> Name "main::hash" used only once: possible typo at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/Transliteration/threshold.pl line 26.
> Preparing Corpus
> Align Corpus
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> Using multi-thread GIZA
> ERROR: Cannot find
> /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 393.
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> Using multi-thread GIZA
> ERROR: Cannot find
> /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 393.
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> Using multi-thread GIZA
> ERROR: Cannot find
> /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 393.
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> (3) generate word alignment @ Thu May 5 16:19:50 IST 2016
> Combining forward and inverted alignment from files:
> /home/sanjana/Documents/SMT/Transliteration/training/giza-inverse/en-hi.A3.final.{bz2,gz}
> /home/sanjana/Documents/SMT/Transliteration/training/giza/hi-en.A3.final.{bz2,gz}
> ERROR: Can't read
> /home/sanjana/Documents/SMT/Transliteration/training/giza-inverse/en-hi.A3.final.{bz2,gz}
> Train Translation Models
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> (4) generate lexical translation table 0-0 @ Thu May 5 16:19:50 IST 2016
> (/home/sanjana/Documents/SMT/Transliteration/training/corpus.en,/home/sanjana/Documents/SMT/Transliteration/training/corpus.hi,/home/sanjana/Documents/SMT/Transliteration/model/lex)
> ERROR: Can't read
> /home/sanjana/Documents/SMT/Transliteration/model/aligned.grow-diag-final-and
> at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/LexicalTranslationModel.pm
> line 92.
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> (5) extract phrases @ Thu May 5 16:19:50 IST 2016
> File not found:
> /home/sanjana/Documents/SMT/Transliteration/model/aligned.grow-diag-final-and
> at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 1609.
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> (6) score phrases @ Thu May 5 16:19:50 IST 2016
> (6.1) creating table half
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e @
> Thu May 5 16:19:50 IST 2016
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl
> 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz
> --KneserNey 0
> Executing:
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl
> 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz
> --KneserNey 0
> using gzip
> Started Thu May 5 16:19:50 2016
> gzip: /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz:
> No such file or directory
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/extract.0.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz
> --KneserNey 2>> /dev/stderr
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/
> run.0.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.1.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.2.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.3.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.4.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.5.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.6.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.7.shScore
> v2.1 -- scoring methods for extracted rules
> adjusting phrase translation probabilities with Kneser Ney discounting
> Loading lexical translation table from
> /home/sanjana/Documents/SMT/Transliteration/model/lex.f2eCan't read
> /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e
> mv
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gzmv:
> cannot stat
> '/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz':
> No such file or directory
> Exit code: 1
> ERROR: Scoring of phrases failed at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 1773.
> (6.3) creating table half
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f @
> Thu May 5 16:19:50 IST 2016
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl
> 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz
> --Inverse --KneserNey 1
> Executing:
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl
> 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz
> --Inverse --KneserNey 1
> using gzip
> Started Thu May 5 16:19:50 2016
> gzip:
> /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz: No
> such file or directory
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/extract.0.gz
> /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.0000000.gz
> --Inverse --KneserNey 2>> /dev/stderr
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/
> run.0.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.1.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.2.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.3.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.5.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.6.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.7.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.4.shScore
> v2.1 -- scoring methods for extracted rules
> using inverse mode
> adjusting phrase translation probabilities with Kneser Ney discounting
> Loading lexical translation table from
> /home/sanjana/Documents/SMT/Transliteration/model/lex.e2fCan't read
> /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f
> gunzip -c
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.*.gz
> 2>> /dev/stderr| LC_ALL=C sort -T
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512 | gzip -c >
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz
> 2>> /dev/stderr gzip:
> /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.*.gz:
> No such file or directory
> rm -rf /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512
> Finished Thu May 5 16:19:50 2016
> (6.6) consolidating the two halves @ Thu May 5 16:19:50 IST 2016
> Executing:
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/consolidate
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz
> /dev/stdout --KneserNey
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz.coc
> | gzip -c >
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.gz
> Consolidate v2.0 written by Philipp Koehn
> consolidating direct and indirect rule tables
> adjusting phrase translation probabilities with Kneser Ney discounting
> Can't read
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz.coc
> Executing: rm -f
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.*
> Train Language Models
> one of required modified KneserNey count-of-counts is zero
> error in discount estimator for order 2
> while opening /home/sanjana/Documents/SMT/Transliteration/lm/targetLM
> ERROR
> Create Config File
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> ERROR: Language model file not found or empty:
> /home/sanjana/Documents/SMT/Transliteration/lm/targetLM.bin at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl
> line 602.
> Running Tuning for Transliteration Module
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> using gzip
> (9) create moses.ini @ Thu May 5 16:19:50 IST 2016
> Executing: mkdir -p
> /home/sanjana/Documents/SMT/Transliteration/tuning/filtered
> Stripping XML...
> Executing:
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/../generic/strip-xml.perl
> < /home/sanjana/Documents/SMT/Transliteration/tuning/input >
> /home/sanjana/Documents/SMT/Transliteration/tuning/filtered/input.10592
> pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4
> path=/home/sanjana/Documents/SMT/Transliteration/model/phrase-table
> input-factor=0 output-factor=0
> Considering factor 0
> Filtering files...
> filtering /home/sanjana/Documents/SMT/Transliteration/model/phrase-table
> ->
> /home/sanjana/Documents/SMT/Transliteration/tuning/filtered/phrase-table.0-0.1.1...
> No phrases found in
> /home/sanjana/Documents/SMT/Transliteration/model/phrase-table! at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/filter-model-given-input.pl line 398.
> sh: 1: cannot open
> /home/sanjana/Documents/SMT/Transliteration/model/moses.ini: No such file
> Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> File not found:
> /home/sanjana/Documents/SMT/Transliteration/tuning/moses.filtered.ini
> (interpreted as
> /home/sanjana/Documents/SMT/Transliteration/tuning/moses.filtered.ini). at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/mert-moses.pl
> line 494.
> cp: cannot stat
‘/home/sanjana/Documents/SMT/Transliteration/tuning/tmp/moses.ini’: No such
> file or directory
> ERROR cannot open base-ini
> '/home/sanjana/Documents/SMT/Transliteration/model/moses.ini': No such file
> or directory at
> /home/sanjana/Documents/SMT/mosesdecoder/scripts/ems/support/substitute-weights.perl
> line 16.
> Training Transliteration Module - End Thu May 5 16:19:50 IST 2016
>
> --
Thanks and regards,

Sanjanasri J.P
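A log like this contains many errors, but most are consequences of earlier ones: the numbered steps run in order (word alignment, lexical table, phrase extraction, scoring, moses.ini, tuning), so once alignment fails, every later "Can't read" or "No such file" follows from it. A small helper can make the root causes stand out by reporting only the first failure-looking line in each stage. This is a hypothetical sketch, not part of Moses: the stage headings are copied from the log above and the error patterns are a heuristic assumption.

```python
import re

# Stage headings as printed in the log above (copied from it, possibly incomplete).
STAGE_HEADINGS = {
    "Extracting Transliteration Pairs",
    "Preparing Corpus",
    "Align Corpus",
    "Train Translation Models",
    "Train Language Models",
    "Create Config File",
    "Running Tuning for Transliteration Module",
}

# Heuristic markers for failure lines; adjust them for other logs.
ERROR_PATTERN = re.compile(
    r"error|can't read|cannot|no such file|not found", re.IGNORECASE
)


def first_error_per_stage(log_text: str) -> dict[str, tuple[int, str]]:
    """Map each stage to (1-based line number, text) of its first failure-looking line."""
    stage = "(before first stage)"
    found: dict[str, tuple[int, str]] = {}
    for lineno, raw in enumerate(log_text.splitlines(), start=1):
        # Remove the "> " quoting added when the log was pasted into the e-mail.
        line = raw.lstrip("> ").rstrip()
        if line in STAGE_HEADINGS:
            stage = line
        elif stage not in found and ERROR_PATTERN.search(line):
            found[stage] = (lineno, line)
    return found
```

On the log above, this flags "ERROR: Cannot find" under Align Corpus (the missing tools/merge_alignment.py is on the following line because of the e-mail wrapping) and, independently, the discount estimator error under Train Language Models; the Train Translation Models entry is a "Can't read" that follows from the failed alignment.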

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 115, Issue 4
*********************************************
