Moses-support Digest, Vol 86, Issue 24

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: error during testing (amir haghighi)


----------------------------------------------------------------------

Message: 1
Date: Sun, 8 Dec 2013 16:06:09 +0330
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] error during testing
To: moses-support@mit.edu
Message-ID:
<CA+UVbEjLG0Lx1TwGxWf8fysa2OW_WX8_PjW7x5kHgXZRUiqPVA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

The file model/reordering-table.* is not empty, but the file
evaluation/*.filtered.*/reordering-table.1.* is!
My test set is not empty.

Thank you for your answers.
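
For reference, a minimal way to run the checks Hieu suggests below, from inside the
EMS working directory (a rough sketch only; the file names are taken from the config
and logs in this thread, and the run numbers and extensions may differ on your setup):

  # does the trained reordering model contain anything?
  zcat model/reordering-table.*.gz | head

  # did the reordering build step report an error?
  tail -n 50 steps/*/TRAINING_build-reordering.*.STDERR

  # how large are the filtered tables for the test set?
  ls -l evaluation/*.filtered.*/reordering-table.*

  # is the raw test set itself empty? (path from the EMS config below)
  wc -l /opt/tools/dataset/mizan/M_Ts.En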


On Sun, Dec 8, 2013 at 3:29 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> Everything looks OK; I'm not sure why it's segfaulting.
>
> Is the file
> model/reordering-table.*
> empty? If it is, then you should look in the log file
> steps/*/TRAINING_build-reordering.*.STDERR
>
> Or is
> evaluation/*.filtered.*/reordering-table.1.*
> empty? Is your test set empty?
>
>
>
> On 8 December 2013 09:47, amir haghighi <amir.haghighi.64@gmail.com>wrote:
>
>> Yes, the parallel data is UTF-8 (one side is UTF-8 and one is ASCII).
>> All of the pre-processing steps were done with the Moses scripts.
>>
>> Here is the EMS config file content:
>>
>> ################################################
>> ### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
>> ################################################
>>
>> [GENERAL]
>>
>> ### directory in which experiment is run
>> #
>> working-dir = /opt/tools/workingEms
>>
>> # specification of the language pair
>> input-extension = En
>> output-extension = Fa
>> pair-extension = En-Fa
>>
>> ### directories that contain tools and data
>> #
>> # moses
>> moses-src-dir =
>> /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0
>> #
>> # moses binaries
>> moses-bin-dir = $moses-src-dir/bin
>> #
>> # moses scripts
>> moses-script-dir = $moses-src-dir/scripts
>> #
>> # directory where GIZA++/MGIZA programs resides
>> external-bin-dir = $moses-src-dir/tools
>> #
>> # srilm
>> #srilm-dir = $moses-src-dir/srilm/bin/i686
>> #
>> # irstlm
>> irstlm-dir = /opt/tools/irstlm/bin
>> #
>> # randlm
>> #randlm-dir = $moses-src-dir/randlm/bin
>> #
>> # data
>> toy-data = /opt/tools/dataset/mizan
>>
>> ### basic tools
>> #
>> # moses decoder
>> decoder = $moses-bin-dir/moses
>>
>> # conversion of phrase table into binary on-disk format
>> ttable-binarizer = $moses-bin-dir/processPhraseTable
>>
>> # conversion of rule table into binary on-disk format
>> #ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 5 100 2"
>>
>> # tokenizers - comment out if all your data is already tokenized
>> input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension"
>> output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension"
>>
>> # truecasers - comment out if you do not use the truecaser
>> input-truecaser = $moses-script-dir/recaser/truecase.perl
>> output-truecaser = $moses-script-dir/recaser/truecase.perl
>> detruecaser = $moses-script-dir/recaser/detruecase.perl
>>
>> ### generic parallelizer for cluster and multi-core machines
>> # you may specify a script that allows the parallel execution of
>> # parallelizable steps (see meta file). you also need to specify
>> # the number of jobs (cluster) or cores (multicore)
>> #
>> #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>> #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>
>> ### cluster settings (if run on a cluster machine)
>> # number of jobs to be submitted in parallel
>> #
>> #jobs = 10
>>
>> # arguments to qsub when scheduling a job
>> #qsub-settings = ""
>>
>> # project for privileges and usage accounting
>> #qsub-project = iccs_smt
>>
>> # memory and time
>> #qsub-memory = 4
>> #qsub-hours = 48
>>
>> ### multi-core settings
>> # when the generic parallelizer is used, the number of cores is
>> # specified here
>> cores = 8
>>
>> #################################################################
>> # PARALLEL CORPUS PREPARATION:
>> # create a tokenized, sentence-aligned corpus, ready for training
>>
>> [CORPUS]
>>
>> ### long sentences are filtered out, since they slow down GIZA++
>> # and are a less reliable source of data. set here the maximum
>> # length of a sentence
>> #
>> max-sentence-length = 80
>>
>> [CORPUS:toy]
>>
>> ### command to run to get raw corpus files
>> #
>> # get-corpus-script =
>>
>> ### raw corpus files (untokenized, but sentence aligned)
>> #
>> raw-stem = $toy-data/M_Tr
>>
>> ### tokenized corpus files (may contain long sentences)
>> #
>> #tokenized-stem =
>>
>> ### if sentence filtering should be skipped,
>> # point to the clean training data
>> #
>> #clean-stem =
>>
>> ### if corpus preparation should be skipped,
>> # point to the prepared training data
>> #
>> #lowercased-stem =
>>
>> #################################################################
>> # LANGUAGE MODEL TRAINING
>>
>> [LM]
>>
>> ### tool to be used for language model training
>> # srilm
>> #lm-training = $srilm-dir/ngram-count
>> #settings = "-interpolate -kndiscount -unk"
>>
>> # irstlm training
>> # msb = modified kneser ney; p=0 no singleton pruning
>> #lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/tmp"
>> #settings = "-s msb -p 0"
>>
>> # order of the language model
>> order = 5
>>
>> ### tool to be used for training randomized language model from scratch
>> # (more commonly, a SRILM is trained)
>> #
>> #rlm-training = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>
>> ### script to use for binary table format for irstlm or kenlm
>> # (default: no binarization)
>>
>> # irstlm
>> #lm-binarizer = $irstlm-dir/compile-lm
>>
>> # kenlm, also set type to 8
>> #lm-binarizer = $moses-bin-dir/build_binary
>> #type = 8
>>
>> ### script to create quantized language model format (irstlm)
>> # (default: no quantization)
>> #
>> #lm-quantizer = $irstlm-dir/quantize-lm
>>
>> ### script to use for converting into randomized table format
>> # (default: no randomization)
>> #
>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>
>> ### each language model to be used has its own section here
>>
>> [LM:toy]
>>
>> ### command to run to get raw corpus files
>> #
>> #get-corpus-script = ""
>>
>> ### raw corpus (untokenized)
>> #
>> raw-corpus = $toy-data/M_Tr.$output-extension
>>
>> ### tokenized corpus files (may contain long sentences)
>> #
>> #tokenized-corpus =
>>
>> ### if corpus preparation should be skipped,
>> # point to the prepared language model
>> #
>> lm = /opt/tools/lm2/M_FaforLm.blm.Fa
>>
>> #################################################################
>> # INTERPOLATING LANGUAGE MODELS
>>
>> [INTERPOLATED-LM]
>>
>> # if multiple language models are used, these may be combined
>> # by optimizing perplexity on a tuning set
>> # see, for instance [Koehn and Schwenk, IJCNLP 2008]
>>
>> ### script to interpolate language models
>> # if commented out, no interpolation is performed
>> #
>> # script = $moses-script-dir/ems/support/interpolate-lm.perl
>>
>> ### tuning set
>> # you may use the same set that is used for mert tuning (reference set)
>> #
>> #tuning-sgm =
>> #raw-tuning =
>> #tokenized-tuning =
>> #factored-tuning =
>> #lowercased-tuning =
>> #split-tuning =
>>
>> ### group language models for hierarchical interpolation
>> # (flat interpolation is limited to 10 language models)
>> #group = "first,second fourth,fifth"
>>
>> ### script to use for binary table format for irstlm or kenlm
>> # (default: no binarization)
>>
>> # irstlm
>> #lm-binarizer = $irstlm-dir/compile-lm
>>
>> # kenlm, also set type to 8
>> #lm-binarizer = $moses-bin-dir/build_binary
>> type = 8
>>
>> ### script to create quantized language model format (irstlm)
>> # (default: no quantization)
>> #
>> #lm-quantizer = $irstlm-dir/quantize-lm
>>
>> ### script to use for converting into randomized table format
>> # (default: no randomization)
>> #
>> #lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"
>>
>> #################################################################
>> # MODIFIED MOORE LEWIS FILTERING
>>
>> [MML] IGNORE
>>
>> ### specifications for language models to be trained
>> #
>> #lm-training = $srilm-dir/ngram-count
>> #lm-settings = "-interpolate -kndiscount -unk"
>> #lm-binarizer = $moses-src-dir/bin/build_binary
>> #lm-query = $moses-src-dir/bin/query
>> #order = 5
>>
>> ### in-/out-of-domain source/target corpora to train the 4 language models
>> #
>> # in-domain: point either to a parallel corpus
>> #indomain-stem = [CORPUS:toy:clean-split-stem]
>>
>> # ... or to two separate monolingual corpora
>> #indomain-target = [LM:toy:lowercased-corpus]
>> #raw-indomain-source = $toy-data/M_Tr.$input-extension
>>
>> # point to out-of-domain parallel corpus
>> #outdomain-stem = [CORPUS:giga:clean-split-stem]
>>
>> # settings: number of lines sampled from the corpora to train each language model on
>> # (if used at all, should be small as a percentage of corpus)
>> #settings = "--line-count 100000"
>>
>> #################################################################
>> # TRANSLATION MODEL TRAINING
>>
>> [TRAINING]
>>
>> ### training script to be used: either a legacy script or
>> # current moses training script (default)
>> #
>> script = $moses-script-dir/training/train-model.perl
>>
>> ### general options
>> # these are options that are passed on to train-model.perl, for instance
>> # * "-mgiza -mgiza-cpus 8" to use mgiza instead of giza
>> # * "-sort-buffer-size 8G -sort-compress gzip" to reduce on-disk sorting
>> # * "-sort-parallel 8 -cores 8" to speed up phrase table building
>> #
>> #training-options = ""
>>
>> ### factored training: specify here which factors used
>> # if none specified, single factor training is assumed
>> # (one translation step, surface to surface)
>> #
>> #input-factors = word lemma pos morph
>> #output-factors = word lemma pos
>> #alignment-factors = "word -> word"
>> #translation-factors = "word -> word"
>> #reordering-factors = "word -> word"
>> #generation-factors = "word -> pos"
>> #decoding-steps = "t0, g0"
>>
>> ### parallelization of data preparation step
>> # the two directions of the data preparation can be run in parallel
>> # comment out if not needed
>> #
>> parallel = yes
>>
>> ### pre-computation for giza++
>> # giza++ has a more efficient data structure that needs to be
>> # initialized with snt2cooc. if run in parallel, this may reduce
>> # memory requirements. set here the number of parts
>> #
>> #run-giza-in-parts = 5
>>
>> ### symmetrization method to obtain word alignments from giza output
>> # (commonly used: grow-diag-final-and)
>> #
>> alignment-symmetrization-method = grow-diag-final-and
>>
>> ### use of berkeley aligner for word alignment
>> #
>> #use-berkeley = true
>> #alignment-symmetrization-method = berkeley
>> #berkeley-train = $moses-script-dir/ems/support/berkeley-train.sh
>> #berkeley-process = $moses-script-dir/ems/support/berkeley-process.sh
>> #berkeley-jar = /your/path/to/berkeleyaligner-1.1/berkeleyaligner.jar
>> #berkeley-java-options = "-server -mx30000m -ea"
>> #berkeley-training-options = "-Main.iters 5 5 -EMWordAligner.numThreads 8"
>> #berkeley-process-options = "-EMWordAligner.numThreads 8"
>> #berkeley-posterior = 0.5
>>
>> ### use of baseline alignment model (incremental training)
>> #
>> #baseline = 68
>> #baseline-alignment-model = "$working-dir/training/prepared.$baseline/$input-extension.vcb \
>> #  $working-dir/training/prepared.$baseline/$output-extension.vcb \
>> #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.cooc \
>> #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.cooc \
>> #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.thmm.5 \
>> #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.hhmm.5 \
>> #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.thmm.5 \
>> #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.hhmm.5"
>>
>> ### if word alignment should be skipped,
>> # point to word alignment files
>> #
>> #word-alignment = $working-dir/model/aligned.1
>>
>> ### filtering some corpora with modified Moore-Lewis
>> # specify corpora to be filtered and ratio to be kept, either before or after word alignment
>> #mml-filter-corpora = toy
>> #mml-before-wa = "-proportion 0.9"
>> #mml-after-wa = "-proportion 0.9"
>>
>> ### create a bilingual concordancer for the model
>> #
>> #biconcor = $moses-script-dir/ems/biconcor/biconcor
>>
>> ### lexicalized reordering: specify orientation type
>> # (default: only distance-based reordering model)
>> #
>> lexicalized-reordering = msd-bidirectional-fe
>>
>> ### hierarchical rule set
>> #
>> #hierarchical-rule-set = true
>>
>> ### settings for rule extraction
>> #
>> #extract-settings = ""
>> max-phrase-length = 5
>>
>> ### add extracted phrases from baseline model
>> #
>> #baseline-extract = $working-dir/model/extract.$baseline
>> #
>> # requires aligned parallel corpus for re-estimating lexical translation probabilities
>> #baseline-corpus = $working-dir/training/corpus.$baseline
>> #baseline-alignment = $working-dir/model/aligned.$baseline.$alignment-symmetrization-method
>>
>> ### unknown word labels (target syntax only)
>> # enables use of unknown word labels during decoding
>> # label file is generated during rule extraction
>> #
>> #use-unknown-word-labels = true
>>
>> ### if phrase extraction should be skipped,
>> # point to stem for extract files
>> #
>> # extracted-phrases =
>>
>> ### settings for rule scoring
>> #
>> score-settings = "--GoodTuring"
>>
>> ### include word alignment in phrase table
>> #
>> #include-word-alignment-in-rules = yes
>>
>> ### sparse lexical features
>> #
>> #sparse-lexical-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>
>> ### domain adaptation settings
>> # options: sparse, any of: indicator, subset, ratio
>> #domain-features = "subset"
>>
>> ### if phrase table training should be skipped,
>> # point to phrase translation table
>> #
>> # phrase-translation-table =
>>
>> ### if reordering table training should be skipped,
>> # point to reordering table
>> #
>> # reordering-table =
>>
>> ### filtering the phrase table based on significance tests
>> # Johnson, Martin, Foster and Kuhn (2007): "Improving Translation Quality by Discarding Most of the Phrasetable"
>> # options: -n number of translations; -l 'a+e', 'a-e', or a positive real value -log prob threshold
>> #salm-index = /path/to/project/salm/Bin/Linux/Index/IndexSA.O64
>> #sigtest-filter = "-l a+e -n 50"
>>
>> ### if training should be skipped,
>> # point to a configuration file that contains
>> # pointers to all relevant model files
>> #
>> #config-with-reused-weights =
>>
>> #####################################################
>> ### TUNING: finding good weights for model components
>>
>> [TUNING]
>>
>> ### instead of tuning with this setting, old weights may be recycled
>> # specify here an old configuration file with matching weights
>> #
>> weight-config = $toy-data/weight.ini
>>
>> ### tuning script to be used
>> #
>> tuning-script = $moses-script-dir/training/mert-moses.pl
>> tuning-settings = "-mertdir $moses-bin-dir"
>>
>> ### specify the corpus used for tuning
>> # it should contain 1000s of sentences
>> #
>> #input-sgm =
>> #raw-input =
>> #tokenized-input =
>> #factorized-input =
>> #input =
>> #
>> #reference-sgm =
>> #raw-reference =
>> #tokenized-reference =
>> #factorized-reference =
>> #reference =
>>
>> ### size of n-best list used (typically 100)
>> #
>> nbest = 100
>>
>> ### ranges for weights for random initialization
>> # if not specified, the tuning script will use generic ranges
>> # it is not clear, if this matters
>> #
>> # lambda =
>>
>> ### additional flags for the filter script
>> #
>> filter-settings = ""
>>
>> ### additional flags for the decoder
>> #
>> decoder-settings = ""
>>
>> ### if tuning should be skipped, specify this here
>> # and also point to a configuration file that contains
>> # pointers to all relevant model files
>> #
>> #config =
>>
>> #########################################################
>> ## RECASER: restore case, this part only trains the model
>>
>> [RECASING]
>>
>> #decoder = $moses-bin-dir/moses
>>
>> ### training data
>> # raw input still needs to be tokenized,
>> # also, already tokenized input may be specified
>> #
>> #tokenized = [LM:europarl:tokenized-corpus]
>>
>> # recase-config =
>>
>> #lm-training = $srilm-dir/ngram-count
>>
>> #######################################################
>> ## TRUECASER: train model to truecase corpora and input
>>
>> [TRUECASER]
>>
>> ### script to train truecaser models
>> #
>> trainer = $moses-script-dir/recaser/train-truecaser.perl
>>
>> ### training data
>> # data on which truecaser is trained
>> # if no training data is specified, parallel corpus is used
>> #
>> # raw-stem =
>> # tokenized-stem =
>>
>> ### trained model
>> #
>> # truecase-model =
>>
>> ######################################################################
>> ## EVALUATION: translating a test set using the tuned system and scoring it
>>
>> [EVALUATION]
>>
>> ### additional flags for the filter script
>> #
>> #filter-settings = ""
>>
>> ### additional decoder settings
>> # switches for the Moses decoder
>> # common choices:
>> # "-threads N" for multi-threading
>> # "-mbr" for MBR decoding
>> # "-drop-unknown" for dropping unknown source words
>> # "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000" for cube pruning
>> #
>> decoder-settings = "-search-algorithm 1 -cube-pruning-pop-limit 5000 -s 5000"
>>
>> ### specify size of n-best list, if produced
>> #
>> #nbest = 100
>>
>> ### multiple reference translations
>> #
>> #multiref = yes
>>
>> ### prepare system output for scoring
>> # this may include detokenization and wrapping output in sgm
>> # (needed for nist-bleu, ter, meteor)
>> #
>> detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l $output-extension"
>> #recaser = $moses-script-dir/recaser/recase.perl
>> wrapping-script = "$moses-script-dir/ems/support/wrap-xml.perl $output-extension"
>> #output-sgm =
>>
>> ### BLEU
>> #
>> nist-bleu = $moses-script-dir/generic/mteval-v13a.pl
>> nist-bleu-c = "$moses-script-dir/generic/mteval-v13a.pl -c"
>> #multi-bleu = $moses-script-dir/generic/multi-bleu.perl
>> #ibm-bleu =
>>
>> ### TER: translation error rate (BBN metric) based on edit distance
>> # not yet integrated
>> #
>> # ter =
>>
>> ### METEOR: gives credit to stem / WordNet synonym matches
>> # not yet integrated
>> #
>> # meteor =
>>
>> ### Analysis: carry out various forms of analysis on the output
>> #
>> analysis = $moses-script-dir/ems/support/analysis.perl
>> #
>> # also report on input coverage
>> analyze-coverage = yes
>> #
>> # also report on phrase mappings used
>> report-segmentation = yes
>> #
>> # report precision of translations for each input word, broken down by
>> # count of input word in corpus and model
>> #report-precision-by-coverage = yes
>> #
>> # further precision breakdown by factor
>> #precision-by-coverage-factor = pos
>> #
>> # visualization of the search graph in tree-based models
>> #analyze-search-graph = yes
>>
>> [EVALUATION:test]
>>
>> ### input data
>> #
>> input-sgm = $toy-data/M_Ts.$input-extension
>> # raw-input =
>> # tokenized-input =
>> # factorized-input =
>> # input =
>>
>> ### reference data
>> #
>> reference-sgm = $toy-data/M_Ts.$output-extension
>> # raw-reference =
>> # tokenized-reference =
>> # reference =
>>
>> ### analysis settings
>> # may contain any of the general evaluation analysis settings
>> # specific setting: base coverage statistics on earlier run
>> #
>> #precision-by-coverage-base = $working-dir/evaluation/test.analysis.5
>>
>> ### wrapping frame
>> # for nist-bleu and other scoring scripts, the output needs to be wrapped
>> # in sgm markup (typically like the input sgm)
>> #
>> wrapping-frame = $input-sgm
>>
>> ##########################################
>> ### REPORTING: summarize evaluation scores
>>
>> [REPORTING]
>>
>> ### currently no parameters for reporting section
>>
>>>
>>>
>>
>> On Sat, Dec 7, 2013 at 7:21 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>
>>> Are you sure the parallel data is encoded in UTF-8? Was it tokenized,
>>> cleaned and escaped by the Moses scripts or by another external script?
>>>
>>> Can you please send me your EMS config file too?
>>>
>>>
>>> On 7 December 2013 14:03, amir haghighi <amir.haghighi.64@gmail.com>wrote:
>>>
>>>> Hi,
>>>>
>>>> I also have the same problem in the evaluation step with EMS, and I would be
>>>> thankful if you could help me.
>>>> The lexicalised reordering file is empty, and the log of the output in
>>>> evaluation_test_filter.2.stderr is:
>>>>
>>>> Using SCRIPTS_ROOTDIR:
>>>> /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/scripts
>>>> (9) create moses.ini @ Sat Dec 7 04:50:15 PST 2013
>>>> Executing: mkdir -p /opt/tools/workingEms/evaluation/test.filtered.2
>>>> Considering factor 0
>>>> Considering factor 0
>>>> filtering /opt/tools/workingEms/model/phrase-table.2 ->
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1...
>>>> 0 of 2197240 phrases pairs used (0.00%) - note: max length 10
>>>> binarizing...cat
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1 |
>>>> LC_ALL=C sort -T /opt/tools/workingEms/evaluation/test.filtered.2 |
>>>>
>>>> /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/bin/processPhraseTable
>>>> -ttable 0 0 - -nscores 5 -out
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/phrase-table.0-0.1.1
>>>> processing ptree for stdin
>>>> Segmentation fault (core dumped)
>>>> filtering
>>>>
>>>> /opt/tools/workingEms/model/reordering-table.2.wbe-msd-bidirectional-fe.gz
>>>> ->
>>>>
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe...
>>>> 0 of 2197240 phrases pairs used (0.00%) - note: max length 10
>>>>
>>>> binarizing.../opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/bin/processLexicalTable
>>>> -in
>>>>
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
>>>> -out
>>>>
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
>>>> processLexicalTable v0.1 by Konrad Rawlik
>>>> processing
>>>>
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
>>>> to
>>>>
>>>> /opt/tools/workingEms/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe.*
>>>> ERROR: empty lexicalised reordering file
>>>>
>>>>
>>>>
>>>> Barry Haddow <bhaddow@...> writes:
>>>>
>>>> >
>>>> > Hi Irene
>>>> >
>>>> > > But the output is empty. And the errors are 1. segmentation fault
>>>> > > 2. error: empty lexicalized reordering file
>>>> >
>>>> > Is this lexicalised reordering file empty then?
>>>> >
>>>> > It would be helpful if you could post the full log of the output when
>>>> > you run the filter command,
>>>> >
>>>> > cheers - Barry
>>>> >
>>>> > On 26/10/12 17:59, Irene Huang wrote:
>>>> > > Hi, I have trained and tuned the model, now I am using
>>>> > >
>>>> > > ~/mosesdecoder/scripts/training/filter-model-given-input.pl filtered-newstest2011 \
>>>> > > mert-work/moses.ini ~/corpus/newstest2011.true.fr \
>>>> > > -Binarizer ~/mosesdecoder/bin/processPhraseTable
>>>> > >
>>>> > > to filter the phrase table.
>>>> > >
>>>> > > But the output is empty. And the errors are 1. segmentation fault
>>>> > > 2. error: empty lexicalized reordering file
>>>> > >
>>>> > > So does this mean it's out of memory error?
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > >
>>>> > > _______________________________________________
>>>> > > Moses-support mailing list
>>>> > > Moses-support@...
>>>> > > http://mailman.mit.edu/mailman/listinfo/moses-support
>>>> >
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>>
>>> --
>>> Hieu Hoang
>>> Research Associate
>>> University of Edinburgh
>>> http://www.hoang.co.uk/hieu
>>>
>>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
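
One detail worth flagging from the log above: the filter step reports "0 of 2197240
phrases pairs used (0.00%)", i.e. not a single phrase pair from the model matches the
tokenized test set, which is why the filtered phrase and reordering tables come out
empty and processPhraseTable/processLexicalTable then fail on them. A common cause is
an encoding or tokenization mismatch between training and test data, which is what
Hieu's UTF-8 question is getting at. A rough way to check this (a sketch only, using
standard GNU tools and the corpus paths from the config above; the /tmp file names are
just placeholders):

  # check the declared encodings of the training and test files
  file -i /opt/tools/dataset/mizan/M_Tr.En /opt/tools/dataset/mizan/M_Ts.En

  # eyeball a few lines of each
  head -n 3 /opt/tools/dataset/mizan/M_Tr.En /opt/tools/dataset/mizan/M_Ts.En

  # rough source-side vocabulary overlap between training and test
  tr ' ' '\n' < /opt/tools/dataset/mizan/M_Tr.En | sort -u > /tmp/train.vocab
  tr ' ' '\n' < /opt/tools/dataset/mizan/M_Ts.En | sort -u > /tmp/test.vocab
  comm -12 /tmp/train.vocab /tmp/test.vocab | wc -l

If the overlap count is near zero, the two files are almost certainly tokenized or
encoded differently.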
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131208/5e0c66ec/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 86, Issue 24
*********************************************
