Moses-support Digest, Vol 97, Issue 48

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. subscription for support. (shiva kumar)
2. support req on unicode rendering issue. (shiva kumar)
3. Re: How should I properly change the moses.ini file for
tuning if I did not prepare an arpa file (and do we need an arpa
file)? (Barry Haddow)
4. nist-bleu evaluation with EMS (Gary Daine)
5. placeholders for numbers - extract step (Vito Mandorino)


----------------------------------------------------------------------

Message: 1
Date: Tue, 18 Nov 2014 01:03:30 -0800
From: shiva kumar <shivadvg19@yahoo.com>
Subject: [Moses-support] subscription for support.
To: moses-support@mit.edu
Message-ID:
<1416301410.16753.YahooMailBasic@web162306.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii



shivadvg19@yahoo.com


thanks,
ShivaKumar KM



------------------------------

Message: 2
Date: Tue, 18 Nov 2014 01:05:30 -0800
From: shiva kumar <shivadvg19@yahoo.com>
Subject: [Moses-support] support req on unicode rendering issue.
To: moses-support@mit.edu
Message-ID:
<1416301530.3659.YahooMailBasic@web162305.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hi,
I am working on a baseline SMT system with Moses for Kannada-English MT. In the tokenization step, the Kannada words in the Unicode input end up with their Unicode references added, because of glyph substitution.

Because of this I am not able to get a good translation. If I give the tokenized sentences as input to the decoder, I get the correct translation.

How can I solve this problem?

I am using Ubuntu 12.04 and Moses.

regards,
ShivaKumar KM
Asst.Professor,
Amrita VishwaVidyaPeetham Mysore Campus
Bogadi 2nd stage
Mysore
9611913393


------------------------------

Message: 3
Date: Tue, 18 Nov 2014 09:30:47 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] How should I properly change the
moses.ini file for tuning if I did not prepare an arpa file (and do we
need an arpa file)?
To: Daniel Seita <takeshidanny@gmail.com>, moses-support@mit.edu
Message-ID: <546B11C7.6000109@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Daniel

I looked at the baseline system instructions, and they are a bit
confusing around the LM building. They explain how to use IRSTLM to
binarise a language model, but do not say how to configure Moses to load
an IRSTLM-binarised model.

In fact, when I wrote the original baseline system manual, I assumed
that you would build an ARPA file with IRSTLM (since KENLM didn't do
estimation then, and SRILM wasn't open-source), and then binarise with
KENLM and use it at runtime.

Now, however, KENLM does estimation, and creates ARPA files. This could
be one solution to your problem:
http://kheafield.com/code/kenlm/estimation/

If you want to build an ARPA file with IRSTLM, then this is definitely
possible, but as noted here
http://comments.gmane.org/gmane.comp.nlp.moses.user/9924
there is some uncertainty over the arguments. I assume this is a
versioning issue, but the bottom line is that either "--text yes" or
"--text" should work. When I originally wrote the baseline instructions,
the arguments I gave worked with the version of IRSTLM I installed.
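
As a rough sketch of the KenLM route mentioned above (estimate an ARPA
file, binarise it, then point moses.ini at the result), the Python
snippet below only assembles the two command lines and the moses.ini
feature line; it does not run anything. The corpus path, the output file
names and the ~/mosesdecoder/bin location of lmplz and build_binary are
assumptions for illustration, and the KENLM line simply mirrors the one
from the moses.ini quoted further down.

```python
def kenlm_plan(corpus, lm_stem, order=3, moses_bin="~/mosesdecoder/bin"):
    """Return (estimate_cmd, binarise_cmd, moses_ini_line) as strings.

    corpus    : tokenised/truecased target-side text (hypothetical path)
    lm_stem   : absolute path prefix for the LM files (hypothetical)
    moses_bin : assumed location of the KenLM tools built with Moses
    """
    arpa = f"{lm_stem}.arpa.en"
    blm = f"{lm_stem}.blm.en"
    # KenLM estimation: writes an ARPA file of the given order.
    estimate = f"{moses_bin}/lmplz -o {order} --text {corpus} --arpa {arpa}"
    # KenLM binarisation: ARPA in, binary LM out.
    binarise = f"{moses_bin}/build_binary {arpa} {blm}"
    # Feature line for moses.ini, same shape as the generated one.
    ini_line = f"KENLM lazyken=0 name=LM0 factor=0 path={blm} order={order}"
    return estimate, binarise, ini_line


if __name__ == "__main__":
    # Example paths are placeholders, not taken from a real setup.
    for line in kenlm_plan(
        "/home/user/corpus/news-commentary-v8.fr-en.true.en",
        "/home/user/lm/news-commentary-v8.fr-en",
    ):
        print(line)
```

The path in the moses.ini line is kept absolute here because the
generated moses.ini quoted below also uses an absolute path.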

Hope that helps,

cheers
Barry

On 17/11/14 16:54, Daniel Seita wrote:
> Hello everyone,
>
> I am struggling to follow the baseline instructions. I am using a Mac
> OS X 10.9 with boost 1.57, irstlm 5.80.06, and the latest moses/mgiza
> version from github. I ran training successfully using this command
>
> nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> train -corpus ~/corpus/news-commentary-v8.fr-en.clean -f fr -e en
> -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 -mgiza -mgiza-cpus 8
> -external-bin-dir ~/mosesdecoder/word_align_tools/ >&training.out &
>
> Notice that I'm using mgiza (which is different from what's listed on
> the baseline), and that my word_align_tools contains the mgiza
> binaries and merge_align.py. Also notice that I'm using the "blm.en"
> language model file. This is what is listed on the baseline
> instructions, so I assumed this is correct. Unfortunately, tuning
> fails. I can successfully download the data and run scripts on it, but
> the major tuning command fails:
>
> nohup nice ~/mosesdecoder/scripts/training/mert-moses.pl
> ~/corpus/news-test2008.true.fr ~/corpus/news-test2008.true.en
> ~/mosesdecoder/bin/moses train/model/moses.ini --mertdir
> ~/mosesdecoder/bin/ &> mert.out --decoder-flags="-threads 8" &
>
> My ~/working/mert.out file says at the end:
>
> "This looks like an IRSTLM binary file. Did you forget to pass --text
> yes to compile-lm? Byte: 40"
>
> I'm confused because /the baseline instructions imply that we want an
> IRSTLM binary file/. I have attached my
> ~/working/train/model/moses.ini file that was generated from training,
> if it helps. I suspect the line to change is:
>
> KENLM lazyken=0 name=LM0 factor=0
> path=/Users/danielseita/lm/news-commentary-v8.fr-en.blm.en order=3
>
> However, changing KENLM to IRSTLM did not work, and I'm not sure what
> to do with "lazyken".
>
> The one other problem I think I might have is that I failed to create
> the "arpa" file according to the baseline, but I thought that was okay
> because we wouldn't need it. Specifically, I ran into the problem
> listed in this mailing list:
>
> http://comments.gmane.org/gmane.comp.nlp.moses.user/9924
>
> But following the suggestion of just using "text" or omitting "text"
> did not work. I'm using IRSTLM 5.80.06 instead of the 5.80.03 that's
> assumed in the baseline, so that might change stuff (installing
> 5.80.03 fails on my computer due to some esoteric errors that don't
> appear on Google searching). And in any case, I'm not sure I even need
> the arpa file because that seems to be /unbinarized/, so why would we
> want it? I followed the command under the section "/You can directly
> create an IRSTLM binary LM (for faster loading in Moses) by replacing
> the last command with the following:/" and used that /instead/ of this
> command:
>
> ~/irstlm/bin/compile-lm \
> --text yes \
> news-commentary-v8.fr-en.lm.en.gz \
> news-commentary-v8.fr-en.arpa.en
>
> Because the above command did not work due to DEBUG: too many arguments.
>
> So to summarize...
>
> (1) I think I can fix my issue by figuring out how to fix the
> moses.ini file to refer to IRSTLM, but I'm confused about why I'd need
> to do that since the baseline instructions assume that we're using
> IRSTLM, right?
>
> (2) How can I get irstlm's compile-lm to work to create the .arpa file,
> because it seems like it's needed after all?
>
> I know this seems like a lot so if you can address even part of my
> questions that would be great.
>
> Thanks,
> Daniel Seita
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 4
Date: Tue, 18 Nov 2014 11:10:19 +0100
From: Gary Daine <gdaine@gmail.com>
Subject: [Moses-support] nist-bleu evaluation with EMS
To: moses-support@mit.edu
Message-ID: <546B1B0B.8000000@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

I have Moses installed and running properly, but I can't work out how to
set up the config file for NIST-BLEU.

The error message I get is:

executing /home/gary/working/steps/4/EVALUATION_testcorpus_wrap.4
via sh (1 active)
number of steps doable or running: 1 at mar nov 18 10:51:09 CET 2014
doable: EVALUATION:testcorpus:nist-bleu
ERROR: you need to define GENERAL:input-sgm

I understand that NIST-BLEU requires sgm-formatted files. My corpus is
in utf-8, and I've specified raw input for all the other steps, which
seems to work fine. I've read and re-read all the documentation I can
find, and I can't work out:

(1) which file(s) need to be in sgm format, and
(2) how to specify this in the config file
(obviously I need to specify 'input-sgm =', but what do I use as a
parameter? Do I need to convert the tuning(?) file manually beforehand?)

I would appreciate any pointers.

Thanks,
Gary
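
For reference, a minimal sketch of how raw one-sentence-per-line text
could be wrapped into sgm, written in Python. It assumes the NIST scorer
expects the usual <srcset>/<refset>, <doc> and <seg> layout; the setid,
docid and language values are placeholders, and how the resulting files
are then referenced from the EMS config is not shown here.

```python
from html import escape


def wrap_sgm(segments, kind="srcset", setid="testcorpus", srclang="any",
             trglang=None, docid="doc1"):
    """Wrap plain-text segments in a NIST-style sgm document (assumed layout)."""
    if kind not in ("srcset", "refset"):
        raise ValueError("kind must be 'srcset' or 'refset'")
    set_attrs = f'setid="{setid}" srclang="{srclang}"'
    doc_attrs = f'docid="{docid}"'
    if kind == "refset":
        if trglang is None:
            raise ValueError("a refset needs a target language")
        set_attrs += f' trglang="{trglang}"'
        doc_attrs = 'sysid="ref" ' + doc_attrs
    out = [f"<{kind} {set_attrs}>", f"<doc {doc_attrs}>"]
    for i, seg in enumerate(segments, start=1):
        # Escape &, < and > so the markup stays parseable.
        out.append(f'<seg id="{i}">{escape(seg.strip(), quote=False)}</seg>')
    out += ["</doc>", f"</{kind}>"]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(wrap_sgm(["Hello world .", "Tom & Jerry ."]), end="")
```

Both the source side and the reference side would be wrapped this way
(kind="srcset" and kind="refset" respectively), with matching setid and
docid values.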



------------------------------

Message: 5
Date: Tue, 18 Nov 2014 12:30:42 +0100
From: Vito Mandorino <vito.mandorino@linguacustodia.com>
Subject: [Moses-support] placeholders for numbers - extract step
To: moses-support@mit.edu
Message-ID:
<CA+8mSmFykzf6wErTtJvbqjy883iixyF7HAkNfF=5qTt2h8PUMg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello everyone,

I am trying to use placeholders for numbers in phrase-based MT, according
to http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc75

The above page says

---

During extraction, add the following to the extract command (phrase-based
only for now):

./extract --Placeholders @num@ ....

---

Does this mean that I have to first run train-model.perl with
--last-step=4, then the line above and then again train-model.perl with
--first-step=6?

If this is the case, which arguments and options should I pass to extract
for a baseline training? I think the syntax is something like

syntax: extract en de align extract max-length [orientation [ --model
[wbe|phrase|hier]-[msd|mslr|mono] ] | --OnlyOutputSpanInfo | --NoTTable |
--GZOutput | --IncludeSentenceId | --SentenceOffset n | --InstanceWeights
filename ]

In particular I cannot figure out what should be passed as 'align' and
'extract' arguments.


Regards,

Vito

--
M. Vito MANDORINO, Chief Scientist
The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel: +33 1 30 44 04 23  Mobile: +33 6 84 65 68 89
Email: vito.mandorino@linguacustodia.com
Website: www.linguacustodia.com - www.thetranslationtrustee.com
------------------------------


End of Moses-support Digest, Vol 97, Issue 48
*********************************************
