Moses-support Digest, Vol 91, Issue 41

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses setup error (Nishkarsh Shastri)
2. Re: Moses setup error (Hieu Hoang)
3. Re: Using xml markup in EMS (Barry Haddow)
4. Re: Get the probability of a given n-gram in a language model
(Albert Llorens)


----------------------------------------------------------------------

Message: 1
Date: Mon, 26 May 2014 13:25:07 +0530
From: Nishkarsh Shastri <nishkarsh.shastri@gmail.com>
Subject: Re: [Moses-support] Moses setup error
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID:
<CAB9695ZfYmpg+uUcmXQBx8=6-umMNo5KFCgj4E4O-Z_j8+AP3A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hey Matthias,
I have checked my training data and there are no such characters in it.


On Sat, May 24, 2014 at 10:33 PM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:

> Hi Nishkarsh,
>
> I guess it might be a preprocessing problem. Phrase::CreateFromString
> throws an exception:
>
> > Exception: moses/Phrase.cpp:214 in void
> Moses::Phrase::CreateFromString(Moses::FactorDirection, const
> std::vector<long unsigned int, std::allocator<long unsigned int> >&, const
> StringPiece&, const StringPiece&, Moses::Word**) threw util::Exception
> because `nextPos == string::npos'.
> > Incorrect formatting of non-terminal. Should have 2 non-terms, eg.
> [X][X]. Current string: [MSRTC]
>
> Did you use a custom preprocessing pipeline? Try to remove all square
> brackets from your training data (i.e. the characters "[" and "]"), or
> replace them with something else.
> If you're using the preprocessing scripts provided with Moses, they
> should take care of replacing special characters for you.
>
> Cheers,
> Matthias
>
>
> On Sat, 2014-05-24 at 11:42 +0530, Nishkarsh Shastri wrote:
> > Sir,
> >
> > I am getting a "Tuning Crashed" error while setting up Moses on my
> > PC.
> >
> > I am attaching the actual error, error log and moses.ini along with
> > the mail.
> >
> > Please see to it.
> >
> > --
> > Nishkarsh Shastri
> > 2nd year U/G
> > Dept. of Computer Science and Engineering
> > IIT Kharagpur
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


--
Nishkarsh Shastri
2nd year U/G
Dept. of Computer Science and Engineering
IIT Kharagpur

------------------------------

Message: 2
Date: Mon, 26 May 2014 09:06:16 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Moses setup error
To: Nishkarsh Shastri <nishkarsh.shastri@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjAhhOBstdKNSoqExAm_-QvuQ1jb=4n6uzoX-zCGHfX_w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Check your test set too. It definitely contains the token
[MSRTC]
and the square brackets should be escaped.
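
A minimal sketch of what "escaped" could look like in practice, assuming
the replacement table below matches what the Moses script
escape-special-chars.perl applies (please verify against your own
checkout). The helper names escape_line and find_bracket_lines are
hypothetical and only serve the illustration:

```python
# Replacement table as I understand escape-special-chars.perl to apply it.
# This is an assumption; verify it against your Moses checkout.
ESCAPES = [
    ("&", "&amp;"),    # must come first so later entities are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_line(line):
    """Replace characters that Moses treats specially with entities."""
    for raw, entity in ESCAPES:
        line = line.replace(raw, entity)
    return line


def find_bracket_lines(lines):
    """Return (line_number, text) pairs that still contain raw brackets."""
    return [(number, text)
            for number, text in enumerate(lines, 1)
            if "[" in text or "]" in text]


if __name__ == "__main__":
    corpus = ["a normal sentence", "this one mentions [MSRTC]"]
    print(find_bracket_lines(corpus))
    print(escape_line(corpus[1]))
```

Running find_bracket_lines over the training, tuning and test files
should point at any line that still carries a raw "[" or "]". Note that
escape_line would double-escape text that has already been escaped, and
that it also escapes "<" and ">", so it should not be applied to input
that carries XML markup.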




--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Mon, 26 May 2014 09:21:04 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Using xml markup in EMS
To: Wei Qiu <wei@qiu.es>, moses-support <moses-support@mit.edu>
Message-ID: <5382F970.9010004@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="iso-8859-1"

Hi Wei

You can "protect" certain patterns from the tokeniser using the
-protected <FILENAME> switch. The filename should contain regular
expressions which signal which terms you want to protect. This could be
used to prevent the tokeniser from breaking up xml tags - I have used it
for URLs.
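
To make the idea concrete, here is a rough sketch of how pattern
protection can work: matches are swapped for placeholders, the text is
tokenised, and the originals are put back. This is not the Moses
tokeniser's code. The two regular expressions, the toy_tokenize stand-in
and the placeholder format are all assumptions made for the
illustration; in a real setup the expressions would go one per line
into the file passed with -protected.

```python
import re

# Hypothetical patterns, as they might appear one per line in the file.
PROTECTED = [
    r"</?[A-Za-z][^<>]*>",   # XML-style opening and closing tags
    r"https?://\S+",         # URLs
]


def toy_tokenize(text):
    """Stand-in tokeniser: put spaces around every punctuation character."""
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    return " ".join(text.split())


def tokenize_protected(text, patterns=PROTECTED):
    """Tokenise text while leaving spans matched by patterns untouched."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        # Alphanumeric placeholder, so toy_tokenize will not split it.
        return " PROTECTEDSPAN%dX " % (len(saved) - 1)

    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    tokens = toy_tokenize(combined.sub(stash, text))
    for index, original in enumerate(saved):
        tokens = tokens.replace("PROTECTEDSPAN%dX" % index, original)
    return tokens


if __name__ == "__main__":
    line = 'Visit <np translation="foo">bar</np>, now!'
    print(toy_tokenize(line))        # tags are broken into pieces
    print(tokenize_protected(line))  # tags survive intact
```

The tag name and attribute in the sample line are only illustrative.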

cheers - Barry

On 24/05/14 18:16, Wei Qiu wrote:
> Hi,
>
> Is it also reasonable to use xml markup for tuning?
>
> How can I use xml markup in ems? I am asking because it seems that the
> tokenize step would break the xml tags into tokens.
>
> Thanks in advance.
>
> Best,
> Wei
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 4
Date: Mon, 26 May 2014 09:04:24 +0000
From: Albert Llorens <albert.llorens@lucysoftware.com>
Subject: Re: [Moses-support] Get the probability of a given n-gram in
a language model
To: Kenneth Heafield <moses@kheafield.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID:
<833DD166FFE895458E5E7BE06CDEECD42C755141@MUC-EX.lucysoftware.com>
Content-Type: text/plain; charset="us-ascii"

Thanks, Kenneth.

Yes, I want to score sentence fragments. I want to use Moses for fragment translation, but only for frequent or probable fragments. I'll try what you suggest. Any chance the query could be done remotely, using mosesserver or anything else?

Kind regards.

Albert


-----Original Message-----
From: moses-support-bounces@mit.edu [mailto:moses-support-bounces@mit.edu] On Behalf Of Kenneth Heafield
Sent: Friday, 23 May 2014 17:34
To: moses-support@mit.edu
Subject: Re: [Moses-support] Get the probability of a given n-gram in a language model

Hi,

You can use bin/query on an ARPA or KenLM file. Then just type sentences at it (or use a file as stdin). By default it will assume you are scoring sentences. You can pass -n to not wrap in <s> and </s>.

It appears that you are asking to score sentence fragments. The leading words will be scored using unigrams, bigrams, etc. from, say, a 5-gram model. If you are using Kneser-Ney, these lower-order probabilities (unigrams through 4-grams) are conditioned on having backed off to them. If you want accurate scores for sentence fragments, build a model of order 1, order 2, order 3, etc. then combine them using

build_binary -r "1.arpa 2.arpa 3.arpa 4.arpa" 5.arpa 5.rest

You can then use

bin/fragment 5.rest <fragments

to obtain log10 probabilities for the fragments. For more on this rant, read

http://kheafield.com/professional/edinburgh/rest_paper.pdf
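
To illustrate why the leading words of a fragment end up scored with
lower-order entries, here is a toy backoff scorer. It is not KenLM code:
the table values are invented, the function names are hypothetical, and
it only mimics the standard ARPA backoff lookup without the rest-cost
correction.

```python
# Toy ARPA-style table: n-gram -> (log10 probability, log10 backoff weight).
# All numbers are invented for illustration.
NGRAMS = {
    ("the",): (-1.0, -0.5),
    ("cat",): (-2.0, -0.3),
    ("sat",): (-2.5, 0.0),
    ("the", "cat"): (-0.7, -0.2),
    ("cat", "sat"): (-0.9, 0.0),
    ("the", "cat", "sat"): (-0.4, 0.0),
}
ORDER = 3  # highest n-gram order in the toy table


def log10_prob(word, history):
    """Return (log10 p(word | history), length of the matched n-gram)."""
    context = tuple(history)[-(ORDER - 1):]
    penalty = 0.0
    while True:
        entry = NGRAMS.get(context + (word,))
        if entry is not None:
            return penalty + entry[0], len(context) + 1
        if not context:
            raise KeyError("unknown word: %r" % word)
        # Charge the backoff weight of the context, then shorten it.
        penalty += NGRAMS.get(context, (0.0, 0.0))[1]
        context = context[1:]


def score_fragment(words):
    """Sum log10 scores over a fragment, with no <s> or </s> wrapping."""
    total, lengths = 0.0, []
    for position, word in enumerate(words):
        logp, matched = log10_prob(word, words[:position])
        total += logp
        lengths.append(matched)
    return total, lengths


if __name__ == "__main__":
    print(score_fragment(["the", "cat", "sat"]))
    print(score_fragment(["cat", "the"]))
```

For the fragment "the cat sat" the matched n-gram lengths come out as
1, 2, 3: the first word can only use a unigram entry and the second only
a bigram entry, which is the situation described above.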

Kenneth

On 05/23/14 05:13, Albert Llorens wrote:
> Hi,
>
> Is there a straightforward way I can ask Moses for the probability (or
> the frequency) of a given n-gram in a given language model? If so, can
> I do the query through mosesserver?
>
> Thanks.
>
> Kind regards.
>
> Albert
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 91, Issue 41
*********************************************
