Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Regarding Factored Model (Hieu Hoang)
2. Re: How should I properly change the moses.ini file for
tuning if I did not prepare an arpa file (and do we need an arpa
file)? (Barry Haddow)
3. Re: Regarding Factored Model (Mukund Roy)
4. Re: mgiza++ force alignment: segmentation fault when
reloading a big N table (Hala Almaghout)
----------------------------------------------------------------------
Message: 1
Date: Wed, 19 Nov 2014 09:49:01 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Regarding Factored Model
To: Mukund Roy <mukundkumarroy@cdac.in>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjD67MHz5YZiiKYz-30AY1xdsdxjqs-78zF9cFu0g5iLA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
So the target side of your phrase table contains lemma and POS tags?
Also, is the moses.ini file you sent the exact one you used? There are two
LMs specified, but a weight for only one of them.
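A quick way to spot this kind of mismatch is to list every LM feature in a
moses.ini that has no matching entry in the [weight] section. This is a
hypothetical helper, not part of Moses; the function name and the assumed
file layout (feature lines carrying "name=LM<n>", weight lines of the form
"LM<n>= ...") are sketches based on the modern moses.ini format:

```shell
# List LM features in a moses.ini that have no matching [weight] entry.
# Sketch only: assumes feature lines contain "name=LM<n>" and weight
# lines start with "LM<n>=".
check_lm_weights() {
  ini="$1"
  for f in $(grep -o 'name=LM[0-9]*' "$ini" | cut -d= -f2 | sort -u); do
    grep -q "^$f=" "$ini" || echo "$f"
  done
}
```

On an ini with features named LM0 and LM1 but only an LM0= weight line,
this would print LM1, matching the symptom described here.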
On 18 November 2014 11:30, Mukund Roy <mukundkumarroy@cdac.in> wrote:
> Dear Sir
>
> I used the command below to build the factored model:
>
> $MOSES_HOME/scripts/training/train-model.perl -root-dir $WORKING_DIR/train \
>     -corpus $WORKING_DIR/Train.true.clean -f $slang -e $tlang \
>     -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
>     --lm 2:3:$WORKING_DIR/lm/lm-corpus.blm.POS.$tlang:0 \
>     --alignment-facor 0-0 --translation-factors 0-0,2 \
>     --reordering-factors 0-0 --decoding-steps t0
>
> I have a factored corpus with two factors: lemma & POS. The baseline
> phrase-based model produced a BLEU score of around 27, but with the above
> command the factored model's BLEU score dipped to 3.5.
>
> @Hoang: Sir, as you said, I am attaching the ini file and sample
> inputs/outputs of the baseline phrase-based model and the factored model.
>
> Thanks & Regards
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141119/a217d3a7/attachment-0001.htm
------------------------------
Message: 2
Date: Wed, 19 Nov 2014 09:51:20 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] How should I properly change the
moses.ini file for tuning if I did not prepare an arpa file (and do we
need an arpa file)?
To: Daniel Seita <takeshidanny@gmail.com>
Cc: Moses support <moses-support@mit.edu>
Message-ID: <546C6818.8000409@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hi Daniel
That's good news. I'm not sure how the equals sign dropped out of the
Moses documentation, but I've put it back in now and simplified things a bit.
cheers - Barry
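For the record, the invocation that worked for Daniel (the equals sign is
required, at least in IRSTLM 5.80.06) looks like this; the file names are
the ones used earlier in the thread:

```shell
# "--text=yes" (with the equals sign) is the form IRSTLM 5.80.06 accepts;
# "--text yes" triggers "DEBUG: warning too many arguments".
~/irstlm/bin/compile-lm --text=yes \
    news-commentary-v8.fr-en.lm.en.gz \
    news-commentary-v8.fr-en.arpa.en
```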
On 18/11/14 16:26, Daniel Seita wrote:
> Putting in the equal sign appeared to do the trick. So --text=yes
> works but not --text yes.
>
> (PS: Sorry for emailing this directly to you Barry, I meant to respond
> to the whole mailing list so everyone could know.)
>
> Thanks,
> Daniel
>
>
> On Tue, Nov 18, 2014 at 7:25 AM, Barry Haddow
> <bhaddow@staffmail.ed.ac.uk> wrote:
>
> Hi Daniel
>
> On 18/11/14 15:14, Daniel Seita wrote:
>
> Thanks for the response Barry. I'm still confused after
> reading your suggestions so perhaps you or someone can clarify
> when you have time?
>
> (1) I think that the /tuning/ step requires /both/ the arpa
> and the binarized files, right? While the /training/ step only
> requires the binarized version? I haven't reached the testing
> step yet.
>
>
> The training step doesn't actually use the LM, it just inserts the
> path into the moses.ini file. The tuning step can use either an
> arpa or a binarised file (not both) but using a binarised file
> will take up less RAM.
>
>
> (2) OK, so as you mention, the baseline instructions assume we
> use IRSTLM to create the arpa file, then use KenLM to binarize
> it. Under the "Language Model Training" section, there are six
> boxes that have command line instructions (the last one is
> querying the language model). I assume this means you /only/
> want us to execute the commands in the first and fifth boxes?
>
>
> Yes, you should only run the first and the fifth. The others are
> options which imho confuse the reader.
>
>
> (3) Is it possible to get the entire training, tuning, and
> testing steps done /without/ an arpa file? This might help
> avoid my problems because I don't think I have a problem
> getting my binarized IRSTLM files. The instructions, as you
> say, do not explain how to configure Moses to do that (and we
> do this by changing the moses.ini file, right?).
>
>
> You need a language model file for tuning and testing, but if you
> directly build an IRSTLM binarised file, then you don't need an
> ARPA file. You do need to make changes to moses.ini (as compared
> to the baseline instructions) and at the moment I can't lay my
> hands on the correct arguments.
>
>
> I'm going to check the IRSTLM documentation because in the
> version I have (5.80.06) both "--text yes" and "--text" fail
> and create the exact error "DEBUG: warning too many arguments"
> that we see in the mailing list discussion that we both linked
> to. Also, running that perl script (to do "steps 1-5") to get
> the LM also fails (that command itself doesn't fail; it causes
> problems later in the sequence), and using the EMS fails on
> the tuning step, I assume because of the same issues above,
> but that's a story for another day.
>
>
> That's all a bit strange. The "official" IRSTLM argument is
> "--text=yes" so that should work. The other methods you mention
> should also work.
>
> cheers - Barry
>
>
>
> Thanks,
> Daniel
>
>
>
> On Tue, Nov 18, 2014 at 1:30 AM, Barry Haddow
> <bhaddow@staffmail.ed.ac.uk> wrote:
>
> Hi Daniel
>
> I looked at the baseline system instructions, and they are
> a bit
> confusing around the LM building. They explain how to use
> IRSTLM
> to binarise a language model, but do not say how to configure
> Moses to load an IRSTLM-binarised model.
>
> In fact, when I wrote the original baseline system manual, I
> assumed that you would build an ARPA file with IRSTLM
> (since KENLM
> didn't do estimation then, and SRILM wasn't open-source),
> and then
> binarise with KENLM and use it at runtime.
>
> Now, however, KENLM does estimation, and creates ARPA
> files. This
> could be one solution to your problem:
> http://kheafield.com/code/kenlm/estimation/
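The KenLM route needs just two commands. A hedged sketch, assuming lmplz
and build_binary were built into ~/mosesdecoder/bin (the corpus and output
file names here are placeholders, not from the thread):

```shell
# Estimate a 3-gram ARPA model directly with KenLM, then binarise it.
~/mosesdecoder/bin/lmplz -o 3 < corpus.true.en > lm.arpa.en
~/mosesdecoder/bin/build_binary lm.arpa.en lm.blm.en
```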
>
> If you want to build an ARPA file with IRSTLM, then this is
> definitely possible, but as noted here
> http://comments.gmane.org/gmane.comp.nlp.moses.user/9924
> there is some uncertainty over the arguments. I assume
> this is a
> versioning issue, but the bottom line is that either
> "--text yes"
> or "--text" should work. When I originally wrote the baseline
> instructions, the arguments I gave worked with the version of
> IRSTLM I installed.
>
> Hope that helps,
>
> cheers
> Barry
>
> On 17/11/14 16:54, Daniel Seita wrote:
>
> Hello everyone,
>
> I am struggling to follow the baseline instructions. I am
> using a Mac OS X 10.9 with boost 1.57, irstlm 5.80.06,
> and the
> latest moses/mgiza version from github. I ran training
> successfully using this command
>
> nohup nice ~/mosesdecoder/scripts/training/train-model.perl \
>     -root-dir train -corpus ~/corpus/news-commentary-v8.fr-en.clean \
>     -f fr -e en -alignment grow-diag-final-and \
>     -reordering msd-bidirectional-fe \
>     -lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 \
>     -mgiza -mgiza-cpus 8 \
>     -external-bin-dir ~/mosesdecoder/word_align_tools/ >&training.out &
>
> Notice that I'm using mgiza (which is different from what's listed on the
> baseline), and that my word_align_tools contains the mgiza binaries and
> merge_align.py. Also notice that I'm using the "blm.en" language model
> file. This is what is listed on the baseline instructions, so I assumed
> this is correct. Unfortunately, tuning fails. I can successfully download
> the data and run scripts on it, but the major tuning command fails:
>
> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
>     ~/corpus/news-test2008.true.fr \
>     ~/corpus/news-test2008.true.en \
>     ~/mosesdecoder/bin/moses train/model/moses.ini \
>     --mertdir ~/mosesdecoder/bin/ \
>     --decoder-flags="-threads 8" &> mert.out &
>
> My ~/working/mert.out file says at the end:
>
> "This looks like an IRSTLM binary file. Did you forget to
> pass --text yes to compile-lm? Byte: 40"
>
> I'm confused because /the baseline instructions imply
> that we
> want an IRSTLM binary file/. I have attached my
> ~/working/train/model/moses.ini file that was
> generated from
> training, if it helps. I suspect the line to change is:
>
> KENLM lazyken=0 name=LM0 factor=0 path=/Users/danielseita/lm/news-commentary-v8.fr-en.blm.en order=3
>
> However, changing KENLM to IRSTLM did not work, and
> I'm not
> sure what to do with "lazyken".
>
> The one other problem I think I might have is that I failed to create the
> "arpa" file according to the baseline, but I thought that was okay
> because we wouldn't need it. Specifically, I ran into the problem listed
> in this mailing list:
>
> http://comments.gmane.org/gmane.comp.nlp.moses.user/9924
>
> But following the suggestion of just using "text" or omitting "text" did
> not work. I'm using IRSTLM 5.80.06 instead of the 5.80.03 that's assumed
> in the baseline, so that might change stuff (installing 5.80.03 fails on
> my computer due to some esoteric errors that don't appear on Google
> searching). And in any case, I'm not sure I even need the arpa file
> because that seems to be /unbinarized/, so why would we want it? I
> followed the command under the section "/You can directly create an
> IRSTLM binary LM (for faster loading in Moses) by replacing the last
> command with the following:/" and used that /instead/ of this command:
>
> ~/irstlm/bin/compile-lm \
> --text yes \
> news-commentary-v8.fr-en.lm.en.gz \
> news-commentary-v8.fr-en.arpa.en
>
> Because the above command failed with the "DEBUG: warning too many
> arguments" error.
>
> So to summarize...
>
> (1) I think I can fix my issue by figuring out how to
> fix the
> moses.ini file to refer to IRSTLM, but I'm confused
> about why
> I'd need to do that since the baseline instructions assume
> that we're using IRSTLM, right?
>
> (2) How can I get irstlm's compile-lm to work to create the
> .arpa file, because it seems like it's needed after all?
>
> I know this seems like a lot so if you can address
> even part
> of my questions that would be great.
>
> Thanks,
> Daniel Seita
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> -- The University of Edinburgh is a charitable body,
> registered in
> Scotland, with registration number SC005336.
>
>
>
>
>
>
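On the moses.ini change Barry mentions above: a minimal sketch, assuming a
binarised IRSTLM file at a placeholder path (the IRSTLM feature name is
real, but check the exact arguments against your Moses version's
documentation):

```ini
[feature]
# replace the generated KENLM line with an IRSTLM one
IRSTLM name=LM0 factor=0 path=/path/to/lm.blm.en order=3

[weight]
LM0= 0.5
```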
------------------------------
Message: 3
Date: Wed, 19 Nov 2014 15:37:37 +0530
From: Mukund Roy <mukundkumarroy@cdac.in>
Subject: Re: [Moses-support] Regarding Factored Model
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <20141119153737.027c3a91@controller.noida.cdac.in>
Content-Type: text/plain; charset=US-ASCII
On Wed, 19 Nov 2014 09:49:01 +0000
Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
> So the target side of your phrase table contains lemma and POS tags?
>
Yes Sir, the phrase table does contain lemma and POS.
> Also, is the moses.ini file you sent the exact one you used? There are
> two LMs specified, but a weight for only one of them
>
Before MERT, moses.ini had both LM0 and LM1. After MERT, only LM0
remained; the LM1 weight was dropped.
Thanks & regards
Mukund K Roy
> On 18 November 2014 11:30, Mukund Roy <mukundkumarroy@cdac.in> wrote:
>
> > Dear Sir
> >
> > I used the command below to build the factored model:
> >
> > $MOSES_HOME/scripts/training/train-model.perl -root-dir $WORKING_DIR/train \
> >     -corpus $WORKING_DIR/Train.true.clean -f $slang -e $tlang \
> >     -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
> >     --lm 2:3:$WORKING_DIR/lm/lm-corpus.blm.POS.$tlang:0 \
> >     --alignment-facor 0-0 --translation-factors 0-0,2 \
> >     --reordering-factors 0-0 --decoding-steps t0
> >
> > I have a factored corpus with two factors: lemma & POS. The baseline
> > phrase-based model produced a BLEU score of around 27, but with the
> > above command the factored model's BLEU score dipped to 3.5.
> >
> > @Hoang: Sir, as you said, I am attaching the ini file and sample
> > inputs/outputs of the baseline phrase-based model and the factored model.
> >
> > Thanks & Regards
> >
> >
> >
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
>
------------------------------
Message: 4
Date: Wed, 19 Nov 2014 10:34:45 +0000
From: Hala Almaghout <halmaghout@computing.dcu.ie>
Subject: Re: [Moses-support] mgiza++ force alignment: segmentation
fault when reloading a big N table
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: Eleftherios Avramidis <eleftherios.avramidis@dfki.de>,
moses-support <moses-support@mit.edu>
Message-ID:
<CAE9fu9X3LaDQSwbajiffzY4XVd=We0jCt8i+LBwLZStEPewScw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
I'm facing a segmentation fault when loading big N tables during forced
alignment with MGIZA. This was posted previously on the Moses list (thread
below), but no solution was suggested. As Lefteris explained, it's due to
the size of the N table. Any suggestions on how to solve it, other than
cutting entries out of the N table?
Many thanks,
Best,
Hala
On 11 August 2014 11:03, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
> Did you manage to solve this issue? I tried contacting Qin Gao, but there's
> been no reply so far.
>
> From my experience with mgiza a while ago, force alignment works ok
>
>
> On 3 August 2014 23:34, Eleftherios Avramidis <
> Eleftherios.Avramidis@dfki.de> wrote:
>
>> Hi,
>>
>> I am trying to produce word alignments for individual sentences. For this
>> purpose I am using the "force align" functionality of mgiza++.
>> Unfortunately, when I load a big N table (fertility), mgiza crashes with
>> a segmentation fault.
>>
>> In particular, I have initially run mgiza on the full training parallel
>> corpus using the default settings of the Moses script:
>>
>> /project/qtleap/software/moses-2.1.1/bin/training-tools/mgiza \
>>     -CoocurrenceFile /local/tmp/elav01/selection-mechanism/systems/de-en/training/giza.1/en-de.cooc \
>>     -c /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/en-de-int-train.snt \
>>     -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 \
>>     -ncpus 24 -nodumps 0 -nsmooth 4 \
>>     -o /local/tmp/elav01/selection-mechanism/systems/de-en/training/giza.1/en-de \
>>     -onlyaldumps 0 -p0 0.999 \
>>     -s /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/de.vcb \
>>     -t /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/en.vcb
>>
>> Afterwards, by executing the mgiza force-align script, I run the
>> following command
>>
>> /project/qtleap/software/moses-2.1.1/mgizapp-code/mgizapp/bin/mgiza giza.en-de/en-de.gizacfg \
>>     -c /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./en-de.snt \
>>     -o /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/giza./en-de \
>>     -s /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./de.vcb \
>>     -t /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./en.vcb \
>>     -m1 0 -m2 0 -mh 0 \
>>     -coocurrence /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/giza./en-de.cooc \
>>     -restart 11 \
>>     -previoust giza.en-de/en-de.t3.final \
>>     -previousa giza.en-de/en-de.a3.final \
>>     -previousd giza.en-de/en-de.d3.final \
>>     -previousn giza.en-de/en-de.n3.final \
>>     -previousd4 giza.en-de/en-de.d4.final \
>>     -previousd42 giza.en-de/en-de.D4.final \
>>     -m3 0 -m4 1
>>
>> This runs fine, until I get the following error:
>>
>> We are going to load previous N model from giza.en-de/en-de.n3.final
>>
>> Reading fertility table from giza.en-de/en-de.n3.final
>>
>> Segmentation fault (core dumped)
>>
>>
>> The N table that is failing has about 300k entries. For this reason, I
>> thought I should try to see whether the size is the problem, so I
>> truncated the table to 60k entries. And it works! But the alignments are
>> not good.
>>
>> I am struggling to fix this, so any help would be appreciated. I am
>> running a freshly installed mgiza on Ubuntu 12.04.
>>
>> cheers,
>> Lefteris
>>
>> --
>> MSc. Inf. Eleftherios Avramidis
>> DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
>> Tel. +49-30 238 95-1806
>>
>> Fax. +49-30 238 95-1810
>>
>> -------------------------------------------------------------------------------------------
>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>
>> Geschaeftsfuehrung:
>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>> Dr. Walter Olthoff
>>
>> Vorsitzender des Aufsichtsrats:
>> Prof. Dr. h.c. Hans A. Aukes
>>
>> Amtsgericht Kaiserslautern, HRB 2313
>> -------------------------------------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141119/cc76d55e/attachment.htm
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 97, Issue 51
*********************************************