Moses-support Digest, Vol 91, Issue 43

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Get the probability of a given n-gram in a language model
(Kenneth Heafield)
2. Moses Training issue (Mohsen Afshin)
3. Nooj2014 Conference Programme now online
(MONTI JOHANNA -Professore associato scienze umanistiche e sociali-d)
4. Re: Moses-support post from lars.bungum@idi.ntnu.no requires
approval (Hieu Hoang)
5. Configuring LMs (Lars Bungum)
6. Decode error in EMS (Mauro Zanotti)


----------------------------------------------------------------------

Message: 1
Date: Mon, 26 May 2014 11:04:48 -0700
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Get the probability of a given n-gram in
a language model
To: Albert Llorens <albert.llorens@lucysoftware.com>,
"moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <53838240.5060904@kheafield.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

Here's a cheap server for fragment scoring.

socat TCP4-LISTEN:2000,fork EXEC:"bin/fragment lm/test.arpa"

Then in another terminal

socat TCP4-CONNECT:localhost:2000 STDIO <text

Now if you want to translate fragments instead then that's what Moses is
for, though keep in mind that it will always prepend <s> and append </s>
for translation.

Kenneth
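
For readers without socat on the client side, the second command above can be approximated in a few lines of Python. This is a minimal sketch and not part of the original message: the function name score_fragments is hypothetical, the default host and port simply mirror the socat example, and the reply is returned as raw lines because the output format of bin/fragment is not shown in this thread.

```python
import socket


def score_fragments(lines, host="localhost", port=2000):
    """Send one fragment per line to the server and return its reply lines.

    Assumes a line-oriented server such as the socat wrapper shown above.
    """
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
        # Half-close the socket so the server sees end-of-input,
        # much like EOF on stdin when redirecting a file.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()


# Example call, with the server from the first socat line running:
#   replies = score_fragments(["this is a fragment"])
```

The half-close matters here: a filter that reads stdin until EOF will otherwise never finish, and the client would wait forever for a reply.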

On 05/26/14 02:04, Albert Llorens wrote:
> Thanks, Kenneth.
>
> Yes, I want to score sentence fragments. I want to use Moses for fragment translation, but only for frequent or probable fragments. I'll try what you suggest. Any chance the query could be done remotely, using mosesserver or anything else?
>
> Kind regards.
>
> Albert
>
>
> -----Original Message-----
> From: moses-support-bounces@mit.edu [mailto:moses-support-bounces@mit.edu] On Behalf Of Kenneth Heafield
> Sent: Friday, May 23, 2014 17:34
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Get the probability of a given n-gram in a language model
>
> Hi,
>
> You can use bin/query on an ARPA or KenLM file. Then just type sentences at it (or use a file as stdin). By default it will assume you are scoring sentences. You can pass -n to not wrap in <s> and </s>.
>
> It appears that you are asking to score sentence fragments. The leading words will be scored using unigrams, bigrams, etc. from, say, a 5-gram model. If you are using Kneser-Ney, these lower-order probabilities (unigrams through 4-grams) are conditioned on having backed off to them. If you want accurate scores for sentence fragments, build a model of order 1, order 2, order 3, etc. then combine them using
>
> build_binary -r "1.arpa 2.arpa 3.arpa 4.arpa" 5.arpa 5.rest
>
> You can then use
>
> bin/fragment 5.rest <fragments
>
> to attain log10 frequencies. For more on this rant, read
>
> http://kheafield.com/professional/edinburgh/rest_paper.pdf
>
> Kenneth
>
> On 05/23/14 05:13, Albert Llorens wrote:
>> Hi,
>>
>> Is there a straightforward way I can ask Moses for the probability (or
>> the frequency) of a given n-gram in a given language model? If so, can
>> I do the query through mosesserver?
>>
>> Thanks.
>>
>> Kind regards.
>>
>> Albert
>>
>
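
The scores discussed in the quoted reply are on a log10 scale. As a small worked example, assuming the values are log10 probabilities (the function names and the sample numbers are illustrative only, not output from bin/query or bin/fragment), per-word scores can be summed and converted back like this:

```python
def fragment_log10(word_scores):
    """Total log10 score of a fragment: the sum of its per-word log10 scores."""
    return sum(word_scores)


def log10_to_probability(score):
    """Convert a log10 score back to a plain probability."""
    return 10.0 ** score


# Made-up per-word scores, for illustration only.
scores = [-1.0, -0.5, -0.5]
total = fragment_log10(scores)
print(total, log10_to_probability(total))
```

Summing in log space is the usual choice because multiplying many small probabilities directly underflows quickly.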


------------------------------

Message: 2
Date: Mon, 26 May 2014 10:23:09 +0430
From: Mohsen Afshin <mafshin89@gmail.com>
Subject: [Moses-support] Moses Training issue
To: moses-support@mit.edu
Message-ID:
<CAM5Q6A54vTX2sx6JbcoAbXeH2t_2FbmuE6Cn4KNfWZrTh0_1Lg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Moses devs

I just followed the Moses installation guide at
http://www.statmt.org/moses/?n=moses.baseline step by step. Everything
worked fine until the step "Training the Translation System", which should
generate a "moses.ini" file, but that file doesn't exist.

I get the following error in the "training.out" file:

ERROR: Giza did not produce the output file
/home/mohsen/working/train/giza.fr-en/fr-en.A3.final. Is your corpus clean
(reasonably-sized sentences)? at
/home/mohsen/mosesdecoder/scripts/training/train-model.perl line 1191.


Here is the training.out file:
https://www.dropbox.com/s/osuaowjg5cfsuci/training.out
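
The error text itself asks whether the corpus is clean, meaning reasonably-sized sentences. A minimal sketch of that kind of length filter follows; it is not the Moses cleaning script, the function name clean_pairs is hypothetical, and the thresholds (1 to 80 tokens, length ratio of 9) are assumptions chosen for illustration. The error can also have other causes, so this covers only the case the message hints at.

```python
def clean_pairs(src_lines, tgt_lines, min_len=1, max_len=80, max_ratio=9.0):
    """Yield (src, tgt) sentence pairs whose token counts are within limits."""
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop empty or overly long sentences on either side.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Drop pairs whose lengths are wildly mismatched.
        if max(n_src, n_tgt) > max_ratio * min(n_src, n_tgt):
            continue
        yield src, tgt


# Example with in-memory data; real use would read the two corpus files
# line by line and write the surviving pairs to new files.
kept = list(clean_pairs(["bonjour le monde", ""], ["hello world", "orphan line"]))
```

Both sides are filtered together so that the two files stay aligned line for line.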

--
"Mathematics is the queen of the sciences and number theory is the queen of
mathematics."
--Gauss

------------------------------

Message: 3
Date: Mon, 26 May 2014 22:01:05 +0200
From: "MONTI JOHANNA -Professore associato scienze umanistiche e
sociali-d" <jmonti@uniss.it>
Subject: [Moses-support] Nooj2014 Conference Programme now online
To: <corpora@uib.no>, <dbworld@cs.wisc.edu>, <elsnet-list@elsnet.org>,
<flarenet_subscribers@ilc.cnr.it>, <IRList@lists.shef.ac.uk>,
<LINGUIST@listserv.linguistlist.org>, <moses-support@mit.edu>,
<mt-list@eamt.org>, corpora@gandalf.uib.no,
elsnet-list@cogsci.ed.ac.uk, ln@frmop11.bitnet, openlogos-list@dfki.de
Message-ID: <20140526195640.M70996@uniss.it>
Content-Type: text/plain; charset=utf-8

[Apologies for multiple postings]

The programme for the 3 days of the Nooj2014 conference, June 3, 4 and 5,
2014, is now online at: http://nooj2014.uniss.it/programme.html

Contact: nooj2014@uniss.it

Johanna Monti
on behalf of the Organizing Committee
University of Sassari - Italy




------------------------------

Message: 4
Date: Mon, 26 May 2014 22:03:48 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses-support post from
lars.bungum@idi.ntnu.no requires approval
To: moses-support@mit.edu, lars.bungum@idi.ntnu.no
Message-ID: <5383AC34.1080404@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Lars,

Please subscribe to the Moses mailing list before posting to it. You can
subscribe here:
http://mailman.mit.edu/mailman/listinfo/moses-support

To answer your question: bjam links the Moses decoder to tcmalloc
automatically if it is installed on the computer Moses was compiled on.

You can see this when you ask bjam to show the compiler options it is
using:
./bjam -d2 ...
....
g++ ... -ltcmalloc_minimal ...
....

However, I've just noticed that on the new Ubuntu 14.04, tcmalloc is now
called tcmalloc_minimal4. Moses can't use it yet. We'll get round to
fixing this issue.


On 26/05/14 16:14, moses-support-owner@mit.edu wrote:
> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List:    Moses-support@mit.edu
> From:    lars.bungum@idi.ntnu.no
> Subject: Using tcmalloc with bjam
> Reason:  Post by non-member to a members-only list
>
> At your convenience, visit:
>
> http://mailman.mit.edu/mailman/admindb/moses-support
>
> to approve or deny the request.



------------------------------

Message: 5
Date: Tue, 27 May 2014 11:57:30 +0200
From: Lars Bungum <lars.bungum@idi.ntnu.no>
Subject: [Moses-support] Configuring LMs
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <5384618A.3080107@idi.ntnu.no>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi,

I am a bit confused about how to configure the LM features correctly.

The feature line in my moses.ini was generated by running the script
train-model.perl with the LM parameter 0:3:$LMPATH (otherwise the standard
parameters from the baseline system instructions). I built the LM with
SRILM. With the text model I receive the following error message when
decoding:

The ARPA file is missing <unk>. Substituting log10 probability
-100.000

but it otherwise works. However, when I compiled the LM with the command:

ngram -order 3 -lm en-de.kn5.lm -write-bin-lm en-de.kn5.lm.bin

I receive the error message:

Reading $PATH/en-de.kn5.lm.bin
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Exception: lm/read_arpa.cc:65 in void
lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&)
threw FormatLoadException'.
first non-empty line was "SRILM_BINARY_NGRAM_002" not \data\. Byte: 23

This led me to look in my moses.ini file to figure out why. There the LM
is configured with this line:

KENLM lazyken=0 name=LM0 factor=0 path=$PATH/en-de.kn5.lm.bin order=3

and this is where I got stuck. Why is this feature named
KENLM? And how do I know how to configure it? Did I make a mistake in
running train-model.perl somehow? Intuitively, I should configure it
with a feature line called SRILM that knows how to read this binary
format, but I was not able to find out how to do that.

Thanks
//LB
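
The exception above already shows the mismatch at the file level: the reader expected an ARPA file whose first non-empty line is \data\, but found the header SRILM_BINARY_NGRAM_002. A minimal sketch of the same check, which can be run on a model file before pointing moses.ini at it, might look like this. The function names are hypothetical, and only the two markers that appear in the error message are recognised; every other format is reported as "unknown".

```python
def lm_file_kind(first_line):
    """Classify a language model file by its first non-empty line."""
    line = first_line.strip()
    if line == "\\data\\":
        return "arpa"
    if line.startswith("SRILM_BINARY_NGRAM"):
        return "srilm-binary"
    return "unknown"


def first_nonempty_line(path):
    """Return the first non-empty line of a file, decoded leniently."""
    with open(path, "rb") as handle:
        for raw in handle:
            # latin-1 never fails to decode, which is convenient for
            # files that may turn out to be binary.
            text = raw.decode("latin-1").strip()
            if text:
                return text
    return ""


# Example call (the path is a placeholder):
#   print(lm_file_kind(first_nonempty_line("en-de.kn5.lm.bin")))
```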


------------------------------

Message: 6
Date: Tue, 27 May 2014 17:36:54 +0200
From: Mauro Zanotti <mau.zanotti@gmail.com>
Subject: [Moses-support] Decode error in EMS
To: "moses-support@mit.edu support" <moses-support@mit.edu>
Message-ID:
<CAMBBhmZzqDstkKyqzyqQ2nDOW7451hb7HEkwzLzGNp=CcRVeEg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear all,

I launched an experiment in EMS with three LMs, and in the decoding phase I
get the following error (from the EVALUATION_test_decode.2.STDERR file):

line=Distortion
line=IRSTLM name=LM0 factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/lm/toy.binlm.2 order=5
line=IRSTLM name=LM1 factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/lm/nc.binlm.2 order=5
Exception: moses/ScoreComponentCollection.cpp:242 in void
Moses::ScoreComponentCollection::Assign(const Moses::FeatureFunction*,
const std::vector<float>&) threw util::Exception'.
Feature function LM1 specified 1 dense scores or weights. Actually has 0

The full generated ini file is:

#########################
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryBinary name=TranslationModel0 table-limit=20 num-features=4 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/evaluation/test.filtered.2/phrase-table.0-0.1.1 input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/evaluation/test.filtered.2/reordering-table.2.wbe-msd-bidirectional-fe
Distortion
IRSTLM name=LM0 factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/lm/toy.binlm.2 order=5
IRSTLM name=LM1 factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/lm/nc.binlm.2 order=5
IRSTLM name=LM2 factor=0 path=/home/user/opt/casmacat/moses/scripts/ems/example_002/lm/europarl.binlm.2 order=5

# core weights
[weight]
Distortion0= 0.3
UnknownWordPenalty0= 1
WordPenalty0= -1
TranslationModel0= 0.2 0.2 0.2 0.2
PhrasePenalty0= 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
LM0= 0.5


Could someone help me?

Thank you in advance
Mauro
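
In the ini file above, the [weight] section has an entry for LM0 but none for LM1 or LM2, which looks consistent with the exception about LM1. A minimal sketch of a check for this kind of mismatch follows. It is not part of Moses, the function name is hypothetical, and it only looks at features that carry an explicit name= attribute, so entries such as Distortion or WordPenalty are ignored. It tolerates feature entries that are wrapped over several lines, as can happen in email.

```python
import re


def features_missing_weights(ini_text):
    """Return names of named features that have no entry in [weight]."""
    section = None
    feature_names, weight_names = [], set()
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            section = header.group(1)
        elif section == "feature":
            # name= may sit on any (possibly wrapped) line of a feature entry.
            feature_names += re.findall(r"\bname=(\S+)", line)
        elif section == "weight" and "=" in line:
            weight_names.add(line.split("=", 1)[0].strip())
    return [name for name in feature_names if name not in weight_names]


# Example call (the path is a placeholder):
#   with open("moses.ini") as handle:
#       print(features_missing_weights(handle.read()))
```

Applied to the file posted above, this check would list LM1 and LM2.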

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 91, Issue 43
*********************************************
