Moses-support Digest, Vol 97, Issue 47

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. How should I properly change the moses.ini file for tuning if
I did not prepare an arpa file (and do we need an arpa file)?
(Daniel Seita)


----------------------------------------------------------------------

Message: 1
Date: Mon, 17 Nov 2014 08:54:50 -0800
From: Daniel Seita <takeshidanny@gmail.com>
Subject: [Moses-support] How should I properly change the moses.ini
file for tuning if I did not prepare an arpa file (and do we need an
arpa file)?
To: moses-support@mit.edu
Message-ID:
<CAKUmyF4psU2weccvN3Uvd4XArXc57jh972JrgNmuNvTg8dOFMg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello everyone,

I am struggling to follow the baseline instructions. I am using Mac OS X
10.9 with boost 1.57, irstlm 5.80.06, and the latest moses/mgiza version
from github. I ran training successfully using this command

nohup nice ~/mosesdecoder/scripts/training/train-model.perl \
-root-dir train \
-corpus ~/corpus/news-commentary-v8.fr-en.clean -f fr -e en \
-alignment grow-diag-final-and -reordering msd-bidirectional-fe \
-lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 \
-mgiza -mgiza-cpus 8 \
-external-bin-dir ~/mosesdecoder/word_align_tools/ >&training.out &
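
For readers decoding the -lm argument above: as I understand the train-model.perl documentation, its value packs four colon-separated fields, factor:order:path:type, and the last field selects the language model implementation. The trailing 8 here lines up with the KENLM entry in the generated moses.ini quoted further down. Here is a minimal shell sketch that splits such a value. parse_lm_spec is a hypothetical helper for illustration, not part of Moses:

```shell
# Split a train-model.perl -lm value of the assumed form factor:order:path:type.
# parse_lm_spec is a hypothetical helper, not a Moses script.
# The body runs in a subshell ( ... ) so its variables do not leak.
parse_lm_spec() (
  spec="$1"
  factor="${spec%%:*}"   # text before the first colon
  rest="${spec#*:}"
  order="${rest%%:*}"    # text before the next colon
  rest="${rest#*:}"
  type="${rest##*:}"     # text after the last colon
  path="${rest%:*}"      # everything in between
  printf 'factor=%s order=%s path=%s type=%s\n' "$factor" "$order" "$path" "$type"
)

parse_lm_spec "0:3:/Users/danielseita/lm/news-commentary-v8.fr-en.blm.en:8"
```

Taking the type from the last colon and the path from what remains keeps the split from depending on how many dots or slashes the path contains.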

Notice that I'm using mgiza (which is different from what's listed on the
baseline), and that my word_align_tools contains the mgiza binaries and
merge_align.py. Also notice that I'm using the "blm.en" language model
file. This is what is listed on the baseline instructions, so I assumed
this is correct. Unfortunately, tuning fails. I can successfully download
the data and run scripts on it, but the major tuning command fails:

nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl \
~/corpus/news-test2008.true.fr ~/corpus/news-test2008.true.en \
~/mosesdecoder/bin/moses train/model/moses.ini \
--mertdir ~/mosesdecoder/bin/ \
--decoder-flags="-threads 8" &> mert.out &

My ~/working/mert.out file says at the end:

"This looks like an IRSTLM binary file. Did you forget to pass --text yes
to compile-lm? Byte: 40"

I'm confused because *the baseline instructions imply that we want an
IRSTLM binary file*. I have attached my ~/working/train/model/moses.ini
file that was generated from training, if it helps. I suspect the line to
change is:

KENLM lazyken=0 name=LM0 factor=0 path=/Users/danielseita/lm/news-commentary-v8.fr-en.blm.en order=3

However, changing KENLM to IRSTLM did not work, and I'm not sure what to do
with "lazyken".
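
To make the structure of that moses.ini line easier to see, here is a small shell sketch that prints the leading implementation keyword and each key=value field on its own line. It only reflects the layout of the line quoted above; describe_feature_line is a hypothetical helper, and the sketch says nothing about which keyword or fields Moses would accept for an IRSTLM model.

```shell
# Print the first token (the LM implementation keyword) and then every
# key=value field of a moses.ini feature line.
# describe_feature_line is a hypothetical helper, not part of Moses.
describe_feature_line() (
  set -f        # disable globbing while the line is split on whitespace
  set -- $1     # intentional word splitting into positional parameters
  printf 'implementation: %s\n' "$1"
  shift
  for field in "$@"; do
    printf '%s: %s\n' "${field%%=*}" "${field#*=}"
  done
)

describe_feature_line 'KENLM lazyken=0 name=LM0 factor=0 path=/Users/danielseita/lm/news-commentary-v8.fr-en.blm.en order=3'
```

For the line above, the first line of output is "implementation: KENLM", followed by one line each for lazyken, name, factor, path, and order.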

The one other problem I think I might have is that I failed to create the
"arpa" file according to the baseline, but I thought that was okay because
we wouldn't need it. Specifically, I ran into the problem listed in this
mailing list:

http://comments.gmane.org/gmane.comp.nlp.moses.user/9924

But the suggestion there, to pass just "text" or to omit "text", did not
work. I'm using IRSTLM 5.80.06 instead of the 5.80.03 that the baseline
assumes, so the options may have changed (installing 5.80.03 fails on my
computer with obscure errors for which a Google search turns up nothing).
And in any case, I'm not sure I even need the arpa file because
that seems to be *unbinarized*, so why would we want it? I followed the
command under the section "*You can directly create an IRSTLM binary LM
(for faster loading in Moses) by replacing the last command with the
following:*" and used that *instead* of this command:

~/irstlm/bin/compile-lm \
--text yes \
news-commentary-v8.fr-en.lm.en.gz \
news-commentary-v8.fr-en.arpa.en


I did that because the above command failed with "DEBUG: too many arguments".
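
Since the question turns on whether a given file is a text (ARPA) model or a binary one, a quick check can help. An uncompressed ARPA file has a line reading exactly \data\ near the top, so the sketch below looks for that line in the first few lines. looks_like_arpa is a hypothetical helper, the five-line window is an arbitrary assumption, and gzipped files would need to be decompressed first.

```shell
# Return success if the file looks like an uncompressed ARPA language model,
# i.e. one of its first five lines is exactly \data\.
# looks_like_arpa is a hypothetical helper; it does not identify which
# binary format a non-ARPA file uses.
looks_like_arpa() (
  head -n 5 "$1" | grep -q -x -F '\data\'
)

# Example use (path taken from the message above):
# looks_like_arpa "$HOME/lm/news-commentary-v8.fr-en.blm.en" && echo "ARPA text" || echo "not ARPA text"
```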

So to summarize...

(1) I think I can fix my issue by figuring out how to fix the moses.ini
file to refer to IRSTLM, but I'm confused about why I'd need to do that
since the baseline instructions assume that we're using IRSTLM, right?

(2) How can I get irstlm's compile-lm to create the .arpa file, since it
seems to be needed after all?

I know this seems like a lot so if you can address even part of my
questions that would be great.

Thanks,
Daniel Seita
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mert.out
Type: application/octet-stream
Size: 3006 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20141117/17521e5d/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moses.ini
Type: application/octet-stream
Size: 933 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20141117/17521e5d/attachment-0003.obj

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 97, Issue 47
*********************************************