Moses-support Digest, Vol 162, Issue 4

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. phrase-table with ' " and other strange things.
Additional corpus cleaning necessary? (Artem Shevchenko)
2. Re: Moses-support Digest, Vol 162, Issue 3 (nakul sharma)


----------------------------------------------------------------------

Message: 1
Date: Sun, 5 Apr 2020 01:34:10 +0200
From: Artem Shevchenko <shevart@gmail.com>
Subject: [Moses-support] phrase-table with &apos; &quot; and other
strange things. Additional corpus cleaning necessary?
To: moses-support@mit.edu
Message-ID:
<CACmqYH1r_zc0RYHKXuB=D5jOqd8+MLW68XnM-pg4a+OkaakbtQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

Following the manual for building the baseline system, I have trained a model
on the Europarl v9 de-en pair.
I now observe that the resulting phrase table contains a lot of noise.

There are many "&apos;" and "&quot;" tokens, which seem to distort the model
and the decoder.
For example, truecasing did not work properly with these special symbols:

&quot; ( Das sind sehr ||| &apos; ( these are very ||| 0.5 2.47962e-05 0.333333 7.4064e-05 ||| 0-0 1-1 2-2 3-3 4-4 ||| 2 3 1 ||| |||

Did you do any additional cleaning of the corpus before training?
Please share your experience.

Artem Shevchenko
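
A note on the entry above: the strings &apos; and &quot; match the XML-style
escapes that the Moses tokenizer script writes for ' and ", so they may be
expected output rather than corpus noise. The Python sketch below is only an
illustration and not part of Moses. It splits one phrase-table line on "|||"
and maps the entities back to plain characters so the entry can be inspected.
The function names (unescape_moses, parse_phrase_table_line) and the entity
list are assumptions made for this example.

```python
# Hypothetical helper for inspecting Moses phrase-table entries.
# The entity list is an assumption based on the escapes the Moses
# tokenizer is known to produce; adjust it to your own setup.
MOSES_UNESCAPES = [
    ("&#124;", "|"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&apos;", "'"),
    ("&quot;", '"'),
    ("&#91;", "["),
    ("&#93;", "]"),
    ("&amp;", "&"),  # handled last so "&amp;quot;" becomes "&quot;", not '"'
]


def unescape_moses(text):
    """Replace Moses-style entities with the characters they stand for."""
    for entity, char in MOSES_UNESCAPES:
        text = text.replace(entity, char)
    return text


def parse_phrase_table_line(line):
    """Split one phrase-table line into source, target, scores, alignment."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least source, target and scores")
    alignment = []
    if len(fields) > 3 and fields[3]:
        for pair in fields[3].split():
            src_idx, tgt_idx = pair.split("-")
            alignment.append((int(src_idx), int(tgt_idx)))
    return {
        "source": unescape_moses(fields[0]),
        "target": unescape_moses(fields[1]),
        "scores": [float(score) for score in fields[2].split()],
        "alignment": alignment,
    }


if __name__ == "__main__":
    example = (
        "&quot; ( Das sind sehr ||| &apos; ( these are very ||| "
        "0.5 2.47962e-05 0.333333 7.4064e-05 ||| "
        "0-0 1-1 2-2 3-3 4-4 ||| 2 3 1 ||| |||"
    )
    entry = parse_phrase_table_line(example)
    print(entry["source"])
    print(entry["target"])
    print(entry["scores"])
```

Decoding the entry this way shows that the source side starts with a double
quote while the target side starts with an apostrophe, which is easier to see
once the entities are unescaped.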

------------------------------

Message: 2
Date: Sun, 5 Apr 2020 08:27:45 +0530
From: nakul sharma <nakul777@gmail.com>
Subject: Re: [Moses-support] Moses-support Digest, Vol 162, Issue 3
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAHGfYnqbmym+S5s-fc8kYXqcE-OVEPj0OwREHfFVLoC3Xn+AgQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I have not done transliteration personally, but the link below may help:

https://github.com/anoopkunchukuttan/indic_nlp_resources/blob/master/transliterate/README.md

More specifically

https://arxiv.org/abs/2003.08925

Prof. (Dr.) Pushpak Bhattacharyya is a pioneer in the field of NLP. His work
can be found on his official website:


https://www.cse.iitb.ac.in/~pb/

Hope this helps.


On Saturday, April 4, 2020, <moses-support-request@mit.edu> wrote:
> Today's Topics:
>
> 1. Help on transliteration (Nameet Mankar)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 4 Apr 2020 11:39:29 +0530
> From: Nameet Mankar <nameetdmankar@gmail.com>
> Subject: [Moses-support] Help on transliteration
> To: moses-support@mit.edu
> Message-ID:
> <CA+woudpu51MMgfw5BqWqwGTx_NPar43HN2KhOC1=EfBdSgm_KA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello everyone, I would like to ask for help with transliteration. How can
> I do that in Moses using Indic NLP or another tool? If anyone has done this,
> I kindly ask for their guidance.

--
Thanks & Regards,
Nakul Sharma

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 162, Issue 4
*********************************************
