Moses-support Digest, Vol 106, Issue 27

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Easiest way to tune with several data sets ? (Vincent Nguyen)
2. giza direction running snt2cooc.out (Stefy D.)


----------------------------------------------------------------------

Message: 1
Date: Wed, 12 Aug 2015 10:01:46 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: [Moses-support] Easiest way to tune with several data sets ?
To: moses-support <moses-support@mit.edu>
Message-ID: <55CAFD6A.6050704@neuf.fr>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

I am wondering whether I could get better results with a larger tuning data set.

Is there a way in EMS to combine several data set files, or do I need to
concatenate the sets?

If the latter, how can I do this easily? Just concatenate the sgm files?

thanks,

Vincent


------------------------------

Message: 2
Date: Wed, 12 Aug 2015 15:55:18 +0000 (UTC)
From: "Stefy D." <tsuki_stefy@yahoo.com>
Subject: [Moses-support] giza direction running snt2cooc.out
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1860402213.3565218.1439394918512.JavaMail.yahoo@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Hello.
I am trying to use incremental training and I am stuck at inc-giza. Could you please help me with some clarifications? Thank you very much.
(I used simplified filenames just to make it easier to read and I'm using the latest release of moses).
In train-model.perl there is a part of code where it says:

    if ($___DIRECTION == 1 || $___DIRECTION == 2 || $___NOFORK) {
        &run_single_giza($___GIZA_F2E,$___E,$___F,
                         $___VCB_E,$___VCB_F,
                         $___CORPUS_DIR."/$___F-$___E-int-train.snt")
            unless $___DIRECTION == 2;
        &run_single_giza($___GIZA_E2F,$___F,$___E,
                         $___VCB_F,$___VCB_E,
                         $___CORPUS_DIR."/$___E-$___F-int-train.snt")
            unless $___DIRECTION == 1;
    }

From here I understand that:
- Direction 1 is used for f2e, and the arguments passed to snt2cooc.out are (e.vcb, f.vcb, f2e.snt), leading to giza/f2e.cooc.
- Direction 2 is used for e2f, and the arguments passed to snt2cooc.out are (f.vcb, e.vcb, e2f.snt), leading to giza-inverse/e2f.cooc.
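
As an illustration only, the branch quoted above can be restated as a short Python sketch that lists which run_single_giza calls fire for a given direction. The function name giza_runs, the tuple layout, and the simplified .vcb/.snt filenames are assumptions made for readability, not part of train-model.perl. The case where the condition is false (the forked path) is not shown in the excerpt, so the sketch simply returns an empty list there.

```python
def giza_runs(direction, nofork, f="f", e="e"):
    """Mirror the quoted Perl branch and return the implied calls.

    Each tuple is (giza_dir, first_lang, second_lang, first_vcb,
    second_vcb, snt_file), in the same argument order as the excerpt.
    All filenames here are simplified placeholders (an assumption).
    """
    runs = []
    if direction == 1 or direction == 2 or nofork:
        # First call: skipped only when direction == 2.
        if direction != 2:
            runs.append(("GIZA_F2E", e, f, f"{e}.vcb", f"{f}.vcb",
                         f"{f}-{e}-int-train.snt"))
        # Second call: skipped only when direction == 1.
        if direction != 1:
            runs.append(("GIZA_E2F", f, e, f"{f}.vcb", f"{e}.vcb",
                         f"{e}-{f}-int-train.snt"))
    return runs


if __name__ == "__main__":
    for d in (1, 2):
        print(d, giza_runs(d, nofork=False))
```

For example, giza_runs(1, False) yields only the GIZA_F2E call, whose vocabulary arguments come in (e, f) order, which is the ordering the question is about.
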
On the Moses website, http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters says:
- --giza-f2e -- GIZA++ directory (default $ROOT/giza.$F-$E)
- --giza-e2f -- inverse GIZA++ directory (default $ROOT/giza.$E-$F)
which seems to me to contradict http://www.statmt.org/moses/?n=Advanced.Incremental, where it says:
The previous cooccurrence files can be found in <experiment-dir>/training/giza.<run>/<target-lang>-<source-lang>.cooc and <experiment-dir>/training/giza-inverse.<run>/<source-lang>-<target-lang>.cooc.

In scripts/ems/example/config.toy there is a part of code where it says:

    # $working-dir/training/giza.$baseline/${output-extension}-$input-extension.cooc \
    # $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.cooc \

From here I understand that f2e.cooc is mapped to a folder "giza" and e2f.cooc is mapped to "giza-inverse".

Here http://www.statmt.org/moses/?n=Advanced.Incremental it says that snt2cooc.out should be run like this:

    $ $INC_GIZA_PP/bin/snt2cooc.out <new-source-vcb> <new-target-vcb> <new-source_target.snt> \
    <previous-source-target.cooc > new.source-target.cooc

My confusion comes from the fact that train-model.perl runs snt2cooc.out with the arguments (e.vcb, f.vcb, f2e.snt), leading to f2e.cooc, whereas when running snt2cooc.out directly the arguments that should be passed are (f.vcb, e.vcb, f2e.snt), leading to f2e.cooc.

Could someone please tell me whether giza-inverse is f2e or e2f, and whether I have understood this correctly:

    direction 1 ---- f2e ----- snt2cooc.out(e.vcb, f.vcb, f2e.snt, f2e.cooc) ----- giza
    direction 2 ---- e2f ----- snt2cooc.out(f.vcb, e.vcb, e2f.snt, e2f.cooc) ----- giza-inverse

and whether the command from http://www.statmt.org/moses/?n=Advanced.Incremental should be

    $ $INC_GIZA_PP/bin/snt2cooc.out <new-source-vcb> <new-target-vcb> <new-target_source.snt> \
    <previous-target-source.cooc > new.target-source.cooc

so that it stays consistent with the way train-model.perl passes arguments to snt2cooc?

Thank you very much for your time!



------------------------------



End of Moses-support Digest, Vol 106, Issue 27
**********************************************
