Moses-support Digest, Vol 87, Issue 5

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: problem in tokenization (Arththika Paramanathan)
2. Cannot Run MOSES after reinstalling Ubuntu 12.04 (Asad A.Malik)
3. Re: problem in tokenization (Renu Kumar)


----------------------------------------------------------------------

Message: 1
Date: Fri, 3 Jan 2014 23:33:52 +0530
From: Arththika Paramanathan <arthiparamanathan@gmail.com>
Subject: Re: [Moses-support] problem in tokenization
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAJSfqEzHN_qG9ixKwaNPVeC7BU4CWT_HX_p5yeniaNJJsEYChQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

1) This is an untokenized sentence:
???????? ??????? ?????? ??? ???????,????? ???????? ????? ?????? ????????
??????? ????????????? ?????.????????? ???????????? ??????
??????????????????? ,??????? ???????? ???????????,????????? ???????
?????????? ????????.

2) The command I ran is:
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ta < ~/corpus/training/squirrelmail.ta-en.ta > ~/corpus/squirrelmail.ta-en.tok.ta

3) The output is:
??? ? ??? ? ???? ? ?? ?? ? ??? ??? ???? ? ?? , ???? ? ??? ? ??? ? ?? ? ??
?????? ???? ? ?? ? ???? ? ?? ??? ? ?? ? ????? ? ???? ? .?? ? ?????? ????? ?
?????? ??? ? ?? ???? ? ????? ? ????? ? ?? , ??? ? ?? ? ???? ? ??? ??? ? ? ?
???? ? , ??? ? ???? ? ?? ? ??? ? ???? ? ???? ? ??????? ? .

4) The preferred output is:
???????? ??????? ?????? ??? ??????? , ????? ???????? ????? ?????? ????????
??????? ????????????? ????? . ????????? ???????????? ??????
??????????????????? , ??????? ???????? ??????????? , ????????? ???????
?????????? ???????? .
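The split pattern in 3) looks like what you get from a rule that pads every character that is not a letter or digit with spaces. Tamil vowel signs and the pulli (virama) are Unicode combining marks (general category Mc or Mn), not letters, so such a rule detaches them from their consonants. The minimal Python sketch below illustrates this under that assumption; `tokenize` and `is_word_char` are hypothetical names, and this is not the code of tokenizer.perl. Treating combining marks as word characters gives output shaped like 4), with only `,` and `.` separated.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Letters and digits, plus combining marks (categories Mn, Mc, Me)."""
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


def tokenize(text: str, mark_aware: bool = True) -> str:
    """Pad every non-word character with spaces, then normalize whitespace.

    With mark_aware=False, vowel signs and the pulli fail str.isalnum()
    and are split off their consonants, mimicking the broken output;
    with mark_aware=True they stay attached to the word.
    """
    out = []
    for ch in text:
        keep = is_word_char(ch) if mark_aware else ch.isalnum()
        if keep or ch.isspace():
            out.append(ch)
        else:
            out.append(f" {ch} ")  # isolate punctuation and other symbols
    return " ".join("".join(out).split())


if __name__ == "__main__":
    word1 = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # a Tamil word (tamizh)
    word2 = "\u0ba8\u0bbe\u0b9f\u0bc1"  # a Tamil word (naadu)
    sample = word1 + "," + word2 + "."
    print(tokenize(sample, mark_aware=False))  # vowel signs split off
    print(tokenize(sample))  # only ',' and '.' separated
```

This sketch isolates every non-word symbol, not just `,` and `.`; a real tokenizer would need further rules (hyphens, apostrophes, abbreviations).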

I have also attached the non-breaking prefix file; I want to add more
abbreviations to it.



--
regards,
P.Arththika
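On adding abbreviations: Moses nonbreaking-prefix files conventionally list one prefix per line without its final period, with `#` comment lines and an optional `#NUMERIC_ONLY#` marker for prefixes that are non-breaking only before a number. Assuming that format, the Python sketch below shows how such a list can decide whether a word-final period is split off. `load_prefixes` and `split_final_periods` are hypothetical names, and the sketch omits other rules the real script may apply.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def load_prefixes(lines):
    """Parse nonbreaking-prefix lines into {prefix: kind}.

    kind 1: never split the period after this prefix.
    kind 2 (#NUMERIC_ONLY#): keep the period only before a number.
    Blank lines and lines starting with '#' are ignored.
    """
    prefixes = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_ONLY):
            prefixes[line[: -len(NUMERIC_ONLY)].strip()] = 2
        else:
            prefixes[line] = 1
    return prefixes


def split_final_periods(tokens, prefixes):
    """Detach a trailing '.' from each token unless it is a known prefix."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok.endswith(".") and len(tok) > 1:
            stem = tok[:-1]
            kind = prefixes.get(stem)
            if kind == 1 or (kind == 2 and nxt[:1].isdigit()):
                out.append(tok)  # abbreviation: keep the period attached
                continue
            out.extend([stem, "."])  # ordinary sentence-final period
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    table = load_prefixes(["# comment", "", "Dr", "No #NUMERIC_ONLY#"])
    print(split_final_periods(["Dr.", "Smith", "came."], table))
    print(split_final_periods(["No.", "5"], table))
```

Under this format, adding an abbreviation means appending one line with the prefix (for example a Tamil abbreviation without its period).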
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140103/7b05e553/attachment-0001.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nonbreaking_prefix.ta
Type: application/octet-stream
Size: 2674 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20140103/7b05e553/attachment-0001.obj

------------------------------

Message: 2
Date: Fri, 3 Jan 2014 10:29:42 -0800 (PST)
From: "Asad A.Malik" <asad_12204@yahoo.com>
Subject: [Moses-support] Cannot Run MOSES after reinstalling Ubuntu
12.04
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1388773782.73836.YahooMailNeo@web122201.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi All,


My Ubuntu drive was running out of space, so I tried to extend it. Before doing that, I backed up my whole MOSES directory. While extending the drive, I ended up having to reinstall Ubuntu 12.04. After reinstalling, I copied the MOSES directory back to the Ubuntu drive, but now it no longer works. Is there any way I can start using my previously developed SMT system again?

Regards,

Asad A.Malik

------------------------------

Message: 3
Date: Sat, 4 Jan 2014 06:33:53 +0530
From: Renu Kumar <renu17775@gmail.com>
Subject: Re: [Moses-support] problem in tokenization
To: arthiparamanathan@gmail.com, Hieu.Hoang@ed.ac.uk
Cc: moses-support@mit.edu
Message-ID:
<CAGOzkqSHOT2LE7aGznpOmQEgEG98Gw3n9Z3S54C9SJcW4PQtRg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I faced a similar problem with Hindi. At the time I skipped the
tokenization step and moved on, but I would also like to solve this
problem and add whatever changes are needed for Hindi.

The symbol we see in the output is generally called a "golu" (dotted
circle) character. It appears for vowel signs that combine with a
consonant to form a single character in Hindi (and perhaps Tamil as well;
I do not know Tamil, but I think this is the case for most Indian languages).

Because two, or in some cases more than two, Unicode characters are joined
to represent a single character in Hindi, the tokenizer script breaks them
up individually, and the golu character appears. That dotted circle is in
fact how these combining characters are shown in the Unicode character
chart; they play no role as independent characters.
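To make the point above concrete, the Python sketch below lists the code points inside a short Hindi word and then groups each base character with the combining marks that follow it. `describe` and `clusters` are hypothetical helper names; the grouping is a rough approximation of user-perceived characters, not full Unicode (UAX #29) grapheme segmentation, and it does not join consonant conjuncts.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch  # attach vowel sign, anusvara or virama to its base
        else:
            groups.append(ch)
    return groups


if __name__ == "__main__":
    hindi = "\u0939\u093f\u0902\u0926\u0940"  # the word "hindi" in Devanagari
    for row in describe(hindi):
        print(row)
    print(len(hindi), "code points,", len(clusters(hindi)), "clusters")
```

The word has five code points but only two clusters; splitting between code points, rather than between clusters, leaves the vowel signs standing alone, which is when renderers show them with a dotted circle.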

Any suggestions?
I am also attaching the Unicode character chart for Hindi.

Regards
Renu



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 87, Issue 5
********************************************
