Moses-support Digest, Vol 87, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: "No moses in ~/mosesdecoder/bin/" (Philipp Koehn)
2. Re: about Tuning in moses (Philipp Koehn)
3. Re: problem in tokenization (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Sun, 5 Jan 2014 23:13:19 +0000
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] "No moses in ~/mosesdecoder/bin/"
To: "Asad A.Malik" <asad_12204@yahoo.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDgDcBsU7M0ai8Zd8gAUGTF_8NgKP6N+scgctkGhf16WA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

have you tried to compile the code?
http://www.statmt.org/moses/?n=Development.GetStarted

-phi

On Sun, Jan 5, 2014 at 6:11 AM, Asad A.Malik <asad_12204@yahoo.com> wrote:

> Hi All,
>
> I am trying to install Moses and have followed the steps in the user manual,
> but when I try to run the decoder on the sample models there is no moses
> binary in ~/mosesdecoder/bin/
>
> There is only a config.log file in ~/mosesdecoder/bin . I don't know what
> the problem is. I cloned the Moses decoder via
> git clone git://github.com/moses-smt/mosesdecoder.git
>
>
> Regards
>
> Asad A.Malik
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

Message: 2
Date: Sun, 5 Jan 2014 23:15:59 +0000
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] about Tuning in moses
To: nadeem khan <nad_star06@yahoo.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDB=CDM-DeWCUAaeWEpMfpMMk7MqK4ONp52qhn2GFQB-nA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

these are good questions that should be easy to answer if you understand
the purpose of tuning when building machine translation systems.

You can find some information here:
http://www.statmt.org/moses/?n=FactoredTraining.Tuning

-phi

On Fri, Jan 3, 2014 at 2:26 PM, nadeem khan <nad_star06@yahoo.com> wrote:

> Hi all,
> I have a few questions about the tuning step of Moses SMT.
> 1. Why do we need to tune the system? We can decode without tuning, so why
> is it needed?
> 2. What is the reason for obtaining optimized weights, and where are these
> weights used during decoding?
> 3. Why is a separate corpus needed for tuning, and why can't we use the
> training dataset or the test set to tune the system?
>
>
> THANK YOU

------------------------------

Message: 3
Date: Sun, 5 Jan 2014 23:20:21 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] problem in tokenization
To: Renu Kumar <renu17775@gmail.com>
Cc: Arththika Paramanathan <arthiparamanathan@gmail.com>,
moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjNc_OC=y=LgyPnq5tv2KMhuD7DYrh_PfURWuL7bcvtnA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

If the golu character doesn't play any role as an independent character,
perhaps you should delete it. It may be similar to the kashida in Arabic:
http://en.wikipedia.org/wiki/Kashida
Perhaps you should write a separate script to delete it, for example
delete-golu.perl, and call it just before the tokenizer is called. That
is, instead of calling the tokenizer like so:
cat untokenized.txt | .../tokenizer.perl -l hi > tokenized.txt
pipe the text through the delete-golu.perl script first:
cat untokenized.txt | .../delete-golu.perl | .../tokenizer.perl -l hi > tokenized.txt
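
As a rough illustration of the kind of filter the delete-golu.perl step
describes, here is a minimal sketch. It is written in Python rather than
Perl and is not part of Moses. It assumes the golu is actually stored in
the file as U+25CC DOTTED CIRCLE; if the circle is only how a viewer draws
a vowel sign that has been separated from its consonant, there is nothing
to delete and this approach will not help. The names GOLU, delete_golu and
filter_stream are made up for this example.

```python
import io
from typing import TextIO

# Assumption: the "golu" is stored in the text as U+25CC DOTTED CIRCLE.
# Change this constant if your data uses a different code point.
GOLU = "\u25cc"


def delete_golu(text: str, char: str = GOLU) -> str:
    """Return text with every occurrence of char removed."""
    return text.replace(char, "")


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst line by line, deleting the golu character."""
    for line in src:
        dst.write(delete_golu(line))


# Small in-memory demonstration: KA + golu + vowel sign AA.
demo_in = io.StringIO("\u0915\u25cc\u093e\n")
demo_out = io.StringIO()
filter_stream(demo_in, demo_out)
# True if the golu was removed and the other two code points were kept.
print(demo_out.getvalue() == "\u0915\u093e\n")
```

To use it in a pipeline like the one above, one would open standard input
and standard output as UTF-8 text streams and pass them to filter_stream.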

2014/1/4 Renu Kumar <renu17775@gmail.com>

> Hi,
>
> I faced a similar problem for Hindi. I skipped the tokenization step at the
> time and moved ahead, but I would also like to sort this problem out and
> add any changes needed for Hindi.
>
> This is generally termed a golu character. We see it in the output for
> vowel characters that are used with a consonant to form a single character
> of Hindi (or maybe Tamil as well; I do not know Tamil, but I think that
> will be the case for most Indian languages).
>
> Two, and in some cases more than two, characters are joined to form and in
> fact represent a single character in Hindi. So when we use the tokenizer
> script, all the characters are broken up individually and the golu
> character appears. It is in fact the actual representation of these
> characters in the Unicode character chart, and they do not play any role
> as independent characters.
>
> Any suggestions?
> I am also attaching the Unicode character chart for Hindi.
>
> Regards
> Renu
>
>
> ---------- Original Message ----------
> From: Arththika Paramanathan <arthiparamanathan@gmail.com>
> To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
> Cc: moses-support <moses-support@mit.edu>
> Date: January 3, 2014 at 11:33 PM
> Subject: Re: [Moses-support] problem in tokenization
> Hi,
>
> 1)this is an untokenized sentence,
> ???????? ??????? ?????? ??? ???????,????? ???????? ????? ?????? ????????
> ??????? ????????????? ?????.????????? ???????????? ??????
> ??????????????????? ,??????? ???????? ???????????,????????? ???????
> ?????????? ????????.
>
> 2)the command I gave is,
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ta < ~/corpus/training/squirrelmail.ta-en.ta > ~/corpus/squirrelmail.ta-en.tok.ta
>
> 3)the output is,
> ??? ? ??? ? ???? ? ?? ?? ? ??? ??? ???? ? ?? , ???? ? ??? ? ??? ? ?? ? ??
> ?????? ???? ? ?? ? ???? ? ?? ??? ? ?? ? ????? ? ???? ? .?? ? ?????? ????? ?
> ?????? ??? ? ?? ???? ? ????? ? ????? ? ?? , ??? ? ?? ? ???? ? ??? ??? ? ? ?
> ???? ? , ??? ? ???? ? ?? ? ??? ? ???? ? ???? ? ??????? ? .
>
> 4)Preferred output is,
> ???????? ??????? ?????? ??? ??????? , ????? ???????? ????? ?????? ????????
> ??????? ????????????? ????? . ????????? ???????????? ??????
> ??????????????????? , ??????? ???????? ??????????? , ????????? ???????
> ?????????? ???????? .
> I have also attached the non-breaking prefix file; I want to add more
> abbreviations to it.
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 87, Issue 11
*********************************************
