Moses-support Digest, Vol 83, Issue 6

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: size of training.out (Hieu Hoang)
2. Re: size of training.out (Arefeh Kazemi)
3. Re: some problems with tuning step (Barry Haddow)
4. Re: Problem in using MML (Barry Haddow)


----------------------------------------------------------------------

Message: 1
Date: Tue, 03 Sep 2013 19:30:23 +0200
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] size of training.out
To: moses-support@mit.edu
Message-ID: <52261CAF.5030405@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

What is the EXACT command you executed? How exactly was the training.out
file created?

To open the file in Linux, type
less training.out
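
If a pager is not convenient, the start of a very large log can also be inspected from a script. The following is a minimal sketch, not something from the thread: the function name `head` and its parameters are made up for illustration. It reads only the first few lines, so the 8G size of the file does not matter.

```python
from itertools import islice


def head(path, n=20, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest."""
    # errors="replace" keeps the sketch from failing on stray bytes in a log.
    with open(path, encoding=encoding, errors="replace") as handle:
        # islice stops after n lines, so the remainder of the file is never read.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

Calling `head("training.out", 50)` would return the first 50 lines as a list of strings.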

On 03/09/2013 12:44, Arefeh Kazemi wrote:
> Hello
> The size of my training.out file is 8G for 500K training sentence
> pairs. Is that normal? I can't open it in Linux.
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 2
Date: Wed, 4 Sep 2013 02:52:03 -0700 (PDT)
From: Arefeh Kazemi <arefeh_kazemi@yahoo.com>
Subject: Re: [Moses-support] size of training.out
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1378288323.70468.YahooMailNeo@web121701.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Dear Hieu
Thank you for the reply.
The exact command is:

nohup nice /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/scripts/training/train-model.perl \
  -root-dir train -corpus /opt/tools/dataset/mizan/M_Tr_Clean -f en -e fa \
  -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
  -lm 0:3:/opt/tools/lm2/Mizan.blm.en:8 \
  -external-bin-dir /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0/tools/ \
  > training.out

------------------------------

Message: 3
Date: Wed, 04 Sep 2013 11:03:01 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] some problems with tuning step
To: ???? <907739598@qq.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <52270555.7070800@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi

If you look at the error message:
> Exception: moses/Word.cpp:108 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int>&, const StringPiece&, bool) threw
> StrayFactorException because `fit'.
> You have configured 1 factors but the word || contains factor
> delimiter | too many times.
> Exit code: 1
you can see what the problem is. The | character is a special character
in Moses (the factor delimiter), so you should remove it or escape it.
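
One way to do the escaping is to rewrite the corpus before training. The sketch below is an illustration, not part of the original reply: the function name `escape_pipes` is hypothetical, and the choice of `&#124;` as the replacement is an assumption based on what the `escape-special-chars.perl` script shipped with Moses is understood to produce, so check it against your Moses version.

```python
def escape_pipes(line: str) -> str:
    """Replace every | in a line of corpus text with an XML-style entity."""
    # Assumption: "&#124;" is the escape Moses expects for the factor delimiter.
    return line.replace("|", "&#124;")
```

This is meant for raw corpus text (training, tuning and test input) before the model is built. It should not be run over an existing phrase or rule table, where `|||` is the field separator.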

cheers - Barry


On 02/09/13 02:52, ???? wrote:
> Hi:
> I have a problem with tuning step.
> My command is:
> /home/tempadmin/mtdir/moses/scripts/training/mert-moses.pl
> /home/tempadmin/hp/tuning/CH400-fenci-quanbanjiao.txt
> /home/tempadmin/hp/tuning/LD-bf
> /home/tempadmin/mtdir/moses/moses-chart-cmd/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/moses_chart
> /home/tempadmin/hp/test/model/moses.ini --working-dir
> /home/tempadmin/hp/tuning/mert --filtercmd
> '/home/tempadmin/mtdir/moses/scripts/training/filter-model-given-input.pl
> -hierarchical' -rootdir
> /home/tempadmin/mtdir/moses/scripts/ --mertdir
> /home/tempadmin/mtdir/moses/bin/ --decoder-flags
> "-v 0"
> &>/home/tempadmin/hp/tuning/mert.out
>
>
>
> The mert.out file is as follows:
> Using SCRIPTS_ROOTDIR: /home/tempadmin/mtdir/moses/scripts/
> filtering the phrase tables... Mon Sep 2 08:57:49 CST 2013
> exec:
> /home/tempadmin/mtdir/moses/scripts/training/filter-model-given-input.pl
> -hierarchical ./filtered /home/tempadmin/hp/test/model/moses.ini
> /home/tempadmin/hp/tuning/CH400-fenci-quanbanjiao.txt
> Executing:
> /home/tempadmin/mtdir/moses/scripts/training/filter-model-given-input.pl
> -hierarchical ./filtered /home/tempadmin/hp/test/model/moses.ini
> /home/tempadmin/hp/tuning/CH400-fenci-quanbanjiao.txt >
> filterphrases.out 2> filterphrases.err
> Asking moses for feature names and values from filtered/moses.ini
> Executing:
> /home/tempadmin/mtdir/moses/moses-chart-cmd/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/moses_chart
> -v 0 -config filtered/moses.ini -inputtype 0 -show-weights >
> ./features.list
> /home/tempadmin/mtdir/moses/moses-chart-cmd/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi
> line=UnknownWordPenalty
> WEIGHT UnknownWordPenalty0=1.000,
> line=WordPenalty
> WEIGHT WordPenalty0=-1.000,
> line=PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=5
> path=/home/tempadmin/hp/tuning/mert/filtered/phrase-table.0-0.1.1.gz
> input-factor=0 output-factor=0
> WEIGHT TranslationModel0=0.200,0.200,0.200,0.200,0.200,
> line=PhraseDictionaryMemory name=TranslationModel1 num-features=1
> path=/home/tempadmin/hp/test/model/glue-grammar input-factor=0
> output-factor=0
> WEIGHT TranslationModel1=1.000,
> line=SRILM name=LM0 factor=0 path=/home/tempadmin/hp/lm/mg-lm.txt order=3
> WEIGHT LM0=0.500,
> /home/tempadmin/hp/lm/mg-lm.txt: line 5614: warning: non-zero
> probability for <unk> in closed-vocabulary LM
> Start loading text SCFG phrase table. Moses format : [2.000] seconds
> Exception: moses/Word.cpp:108 in void
> Moses::Word::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int>&, const StringPiece&, bool) threw
> StrayFactorException because `fit'.
> You have configured 1 factors but the word || contains factor
> delimiter | too many times.
> Exit code: 1
> Failed to run moses with the config filtered/moses.ini at
> /home/tempadmin/mtdir/moses/scripts/training/mert-moses.pl line 1271.
>
>
>
> Thank you for your help.
>
>
>


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 4
Date: Wed, 04 Sep 2013 11:13:33 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Problem in using MML
To: Hassan Sajjad <sajjad@ims.uni-stuttgart.de>
Cc: moses-support@mit.edu
Message-ID: <522707CD.1050205@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Hassan

The MML filtering is seeing all your data as in-domain, and I think it
is because you have not correctly specified your out-of-domain data.

In your configuration, the variable mml-filter-corpora should be a
(space-separated) list of the short names of all of your out-of-domain
data, i.e. all the corpora that you want to be filtered.

So if you have [CORPUS:europarl] in the [CORPUS] section, and you want
europarl to be filtered then you set the variable like this:

mml-filter-corpora = europarl
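
To see why data that is entirely in-domain leads to the IndexError in the quoted traceback below, here is a minimal sketch reconstructed only from the traceback and the log line "Retaining at least 0 entries and ignoring 149244". The function name `threshold_score` and the formula for `count` are assumptions made for illustration; the actual mml-filter.py may compute them differently.

```python
def threshold_score(scores, ignore_count, proportion):
    """Sort scores in descending order and return the one at ignore_count + count."""
    # Assumption: count is the share of the non-ignored lines that should be kept.
    count = int((len(scores) - ignore_count) * proportion)
    ranked = sorted(scores, reverse=True)
    # If every line is ignored (all in-domain), this index equals len(ranked).
    return ranked[ignore_count + count]
```

Under these assumptions, a score file in which every line counts as in-domain gives `count = 0` and an index equal to the length of the list, which is one past the last valid position. With some out-of-domain lines present the index falls inside the list.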

cheers - Barry



On 31/08/13 13:36, Hassan Sajjad wrote:
> Hi,
>
> I am trying to use MML but it's crashing at the
> TRAINING_mml-filter-before-wa step. I could not resolve the problem.
> The error and conf entries are copied here.
>
> The corpus-mml-score.3 file contains as many lines as my in-domain data,
> and every line has a score of 99999. Is this correct?
>
> Thank you,
>
> Regards,
> Hassan
>
> ------------------------------------------------------------------------------
> /work/moses-2013-07-10/scripts/ems/support/mml-filter.py
> /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,126 Loading configuration from
> /training/corpus-mml.3.ini
> 2013-08-31 12:29:57,128 Configuration:
> general:strategy = Score
> general:source_language = ar
> general:target_language = en
> general:input_stem = /training/corpus.1
> general:output_stem = /training/corpus-mml.3
> general:domain_file = /model/domains.3
> general:domain_file_out = /training/corpus-mml.3
> score:score_file = /training/corpus-mml-score.3
> score:proportion = 0.9
>
> 2013-08-31 12:29:57,170 Retaining at least 0 entries and ignoring 149244
> Traceback (most recent call last):
> File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py",
> line 156, in <module>
> main()
> File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py",
> line 111, in main
> strategy = strategy_class(config)
> File "/work/moses-2013-07-10/scripts/ems/support/mml-filter.py",
> line 72, in __init__
> [float(line[:-1]) for line in open(self.score_file)],
> reverse=True)[ignore_count + count]
> IndexError: list index out of range
> ~
> ------------------------------------------------------------------------
>
> Here are the entries in the conf file:
>
> [MML]
>
> ### specifications for language models to be trained
> #
>
> lm-training = $srilm-dir/ngram-count
> lm-settings = "-interpolate -kndiscount -unk"
> lm-binarizer = $moses-src-dir/bin/build_binary
> lm-query = $moses-src-dir/bin/query
> order = 5
> type = 8
>
> raw-indomain-source = $training/train.$pair-extension.$input-extension
> raw-indomain-target = $training/train.$pair-extension.$output-extension
>
> outdomain-stem = /adapt/un.$pair-extension.utf8.ng.clean
> settings = "--line-count 100000"
>
>
> In TRAINING
>
> ### filtering some corpora with modified Moore-Lewis
> # specify corpora to be filtered and ratio to be kept, either before
> or after word alignment
> mml-filter-corpora = /adapt/un.$pair-extension.utf8.ng.clean
> mml-before-wa = "-proportion 0.9"
> #mml-after-wa = "-proportion 0.9"
>
> ### domain adaptation settings
> # options: sparse, any of: indicator, subset, ratio
> domain-features = "subset"
>
>
>





------------------------------



End of Moses-support Digest, Vol 83, Issue 6
********************************************
