Moses-support Digest, Vol 86, Issue 64

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: about testing on part of training dataset (Miles Osborne)
2. Re: How to convert new moses.ini format to old format
(Rajen Chatterjee)
3. Re: about testing on part of training dataset (Prasanth K)
4. How to use MOSES multi-threaded (Asad A.Malik)


----------------------------------------------------------------------

Message: 1
Date: Sat, 21 Dec 2013 12:39:43 -0500
From: Miles Osborne <miles@inf.ed.ac.uk>
Subject: Re: [Moses-support] about testing on part of training dataset
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAPRfTYo8YFLX7u8ooEv-1UmPXTEMzxag4V+Z2OB0Ca_cd_-8Cg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

SMT systems such as Moses do not guarantee that they can reproduce
the training set. For example, phrases might be pruned due to
frequencies being too low, not all words might be aligned, the
decoder might discard the true translation during search, etc.

This doesn't really have much to do with Indian languages per se;
instead, it is the way that systems are built in general.

Miles

>
> Can anyone please tell me why we get a low BLEU score on a test set
> taken from the training set for sparsely resourced languages like
> Indian languages?

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
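To make the first of Miles' points concrete, here is a toy Python sketch. It is not Moses' actual pruning code. Phrase pairs extracted fewer than a hypothetical `min_count` times are dropped, and a training sentence whose segmentation needs a dropped pair can then no longer be reproduced exactly. The counts, the romanised Hindi phrase pairs and the helper names `prune_by_count` and `missing_pairs` are all invented for illustration.

```python
from collections import Counter

# Hypothetical toy counts: (source phrase, target phrase) -> times extracted.
counts = Counter({("mera", "my"): 3, ("purana", "old"): 1, ("ghar", "house"): 5})


def prune_by_count(pair_counts: Counter, min_count: int) -> set:
    """Keep only phrase pairs extracted at least min_count times (toy rule)."""
    return {pair for pair, count in pair_counts.items() if count >= min_count}


def missing_pairs(sentence_pairs: list, kept: set) -> list:
    """Return the phrase pairs a training sentence needs that were pruned away."""
    return [pair for pair in sentence_pairs if pair not in kept]


kept = prune_by_count(counts, min_count=2)
# "mera purana ghar" -> "my old house", segmented into three phrase pairs.
sentence = [("mera", "my"), ("purana", "old"), ("ghar", "house")]
print(missing_pairs(sentence, kept))  # [('purana', 'old')]
```

In a small corpus many phrase pairs tend to be seen only once, so even a mild count threshold can leave parts of the training data untranslatable by the very system trained on it.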


------------------------------

Message: 2
Date: Sat, 21 Dec 2013 17:57:23 +0000
From: Rajen Chatterjee <rajen.k.chatterjee@gmail.com>
Subject: Re: [Moses-support] How to convert new moses.ini format to
old format
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAC4-+Nz9Ow6tveNin7BK1R_Rz4EjSzUE0_=K607tcYrzCyWntA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

OK, thanks a lot.


On Sat, Dec 21, 2013 at 2:02 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:

> You're right, there were 2 different translation models for phrase-based
> and syntax decoding in the old version. The phrase-based model didn't check
> for [ ] characters.
>
> The new Moses uses only 1 translation model, which always checks for [ ].
>
> It will be easier for you to escape the characters than to backport
> the ini file.
>
> Sent while bumping into things
>
> On 21 Dec 2013, at 09:18, Rajen Chatterjee <rajen.k.chatterjee@gmail.com>
> wrote:
>
> Hi,
> You are right, there is a [ character in the phrase table. But the problem
> is that I am running the same language pair with the same train and test
> set on both the old Moses version and the new version. I am getting
> decoding output with the old Moses version (even though the '[' character
> is present in the phrase table), but I am not getting decoding output with
> the new Moses.
> If the '[' character is a problem, why doesn't the old Moses give an error
> when decoding?
>
> Thanks
>
>
> On Fri, Dec 20, 2013 at 3:54 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> From the line number, it's likely that there is a [ character in your
>> phrase table. Moses interprets words with [ ] as non-terminals.
>>
>> I think the error would happen whatever version of Moses you are using.
>>
>>
>> You should escape these characters. Moses' tokenizer converts
>> [ --> &#91;
>> ] --> &#93;
>>
>> On 20/12/2013 05:49, Rajen Chatterjee wrote:
>>
>> I am getting this error during decoding using the new moses:
>>
>> Start loading text SCFG phrase table. Moses format : [34.000] seconds
>> Reading
>> /home/rajen/Public/SMT/experiments/acl-14-TAG/results/pb-cross-valid/en-kK1/moses_data/model/phrase-table.gz
>>
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> ***************Check nextPos != string::npos failed in
>> moses/Phrase.cpp:214
>>
>>
>> So I thought I would try decoding with the old mosesdecoder.
>>
>>
>> On Thu, Dec 19, 2013 at 5:31 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>
>>> There's no script to do that.
>>>
>>> Is there a reason you need to use the old decoder?
>>>
>>>
>>> On 19 December 2013 16:40, Rajen Chatterjee <
>>> rajen.k.chatterjee@gmail.com> wrote:
>>>
>>>> Hello,
>>>> There is a script
>>>> "scripts/training/convert-moses-ini-to-v2.perl" which converts an old
>>>> format of moses.ini to new format, but I want vice versa i.e. from new
>>>> format to old format. How can I achieve this?
>>>>
>>>>
>>>> --
>>>> -Regards,
>>>> Rajen Chatterjee.
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>>
>>> --
>>> Hieu Hoang
>>> Research Associate
>>> University of Edinburgh
>>> http://www.hoang.co.uk/hieu
>>>
>>>
>>
>>
>> --
>> -Regards,
>> Rajen Chatterjee.
>>
>>
>>
>
>
> --
> -Regards,
> Rajen Chatterjee.
>
>


--
-Regards,
Rajen Chatterjee.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131221/cd0dcb7d/attachment-0001.htm
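Following Hieu's advice to escape the brackets rather than back-port the ini file, here is a minimal Python sketch. It assumes a gzipped phrase-based table in the usual `source ||| target ||| scores ||| ...` layout and rewrites [ and ] in the source and target fields to the &#91; and &#93; entities mentioned above. The helpers `escape_brackets`, `escape_phrase_table_line` and `escape_phrase_table` are hypothetical. Escaping the training corpus before training is probably the cleaner route. In either case the test input has to be escaped the same way, or phrases containing brackets will never match. Do not run this on a hierarchical or syntax table, where labels such as [X] are meant to be non-terminals. Moses' tokenizer also escapes other characters (for example & and |), which this sketch does not handle.

```python
import gzip
import sys

# The two entities named in the thread (Moses' tokenizer uses the same ones).
BRACKET_ESCAPES = {"[": "&#91;", "]": "&#93;"}


def escape_brackets(text: str) -> str:
    """Replace every [ and ] in text with its entity; other characters are kept."""
    return "".join(BRACKET_ESCAPES.get(ch, ch) for ch in text)


def escape_phrase_table_line(line: str) -> str:
    """Escape brackets in the source and target fields of one phrase-table line.

    Scores, alignments and counts (fields 3 onwards) are left untouched.
    """
    fields = line.rstrip("\n").split(" ||| ")
    for i in range(min(2, len(fields))):
        fields[i] = escape_brackets(fields[i])
    return " ||| ".join(fields) + "\n"


def escape_phrase_table(src_path: str, dst_path: str) -> None:
    """Stream a gzipped phrase table and write an escaped gzipped copy."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(escape_phrase_table_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    escape_phrase_table(sys.argv[1], sys.argv[2])
```

Run it as, e.g., `python escape_pt.py phrase-table.gz phrase-table.escaped.gz` (the script name is hypothetical). Then point the phrase-table path in moses.ini at the new file. `escape_brackets` can be applied line by line to the test input.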

------------------------------

Message: 3
Date: Sat, 21 Dec 2013 19:09:18 +0100
From: Prasanth K <prasanthk.ms09@gmail.com>
Subject: Re: [Moses-support] about testing on part of training dataset
To: Miles Osborne <miles@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CA+n+9-j6nvda7_ZSSR8L4YsN2Xq3DKsSk4Zc+VMzXtPpQ+Q4GQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Nadeem,

Miles is right. I think I can add some details of experiments which show
the same thing you have observed.

There have been large-scale experiments doing what you are trying to see:
evaluating the translation quality on data that has been used to train the
system. One such paper is here
(http://www.hindawi.com/journals/aai/2012/484580/), and a relevant figure
from the same paper is here: http://www.hindawi.com/journals/aai/2012/484580/fig4/

- Prasanth





--
"Theories have four stages of acceptance. i) this is worthless nonsense;
ii) this is an interesting, but perverse, point of view, iii) this is true,
but quite unimportant; iv) I always said so."

--- J.B.S. Haldane
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131221/0c79133b/attachment-0001.htm

------------------------------

Message: 4
Date: Sat, 21 Dec 2013 17:49:37 -0800 (PST)
From: "Asad A.Malik" <asad_12204@yahoo.com>
Subject: [Moses-support] How to use MOSES multi-threaded
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1387676977.26222.YahooMailNeo@web122201.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi All,

The manual mentions that MOSES can be run multi-threaded using the option --decoder-flags="-threads 4". But I am getting the following message (attached).

PS: I have a Core 2 Duo (C2D) system. Should I use 2 instead of 4 in the given command?
Regards,


Asad A.Malik
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131221/0705f172/attachment.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Multithread error
Type: application/octet-stream
Size: 35310 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20131221/0705f172/attachment.obj
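On the thread-count part of the question: a two-core machine gains little from more than 2 decoder threads, so matching the thread count to the core count is a reasonable default. The small Python sketch below builds the `--decoder-flags="-threads N"` argument quoted from the manual, with N capped at the number of cores Python reports. The helper name `decoder_threads_flag` is hypothetical. The sketch does not diagnose the error in the scrubbed attachment, whose contents are not available here.

```python
import os
import shlex


def decoder_threads_flag(requested=None):
    """Build a --decoder-flags argument with the thread count capped at the core count.

    With requested=None, one thread per core reported by os.cpu_count() is used.
    The result is shell-quoted for pasting into a POSIX shell command line.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    n = cores if requested is None else max(1, min(requested, cores))
    return "--decoder-flags=" + shlex.quote(f"-threads {n}")


print(decoder_threads_flag())  # e.g. --decoder-flags='-threads 2' on a Core 2 Duo
```

Single quotes from `shlex.quote` and the double quotes in the manual's example are equivalent in a POSIX shell.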

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 86, Issue 64
*********************************************
