Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. PMIndia - A Collection of Parallel Corpora of Languages of
India (Barry Haddow)
2. Re: PMIndia - A Collection of Parallel Corpora of Languages
of India (Thoudam Doren Singh)
3. First CFP: 5th Workshop on Indian Language Data: Resources
and Evaluation (WILDRE-5) under LREC 2020. (Atul Kr. Ojha)
----------------------------------------------------------------------
Message: 1
Date: Wed, 29 Jan 2020 11:44:18 +0000
From: Barry Haddow <bhaddow@inf.ed.ac.uk>
Subject: [Moses-support] PMIndia - A Collection of Parallel Corpora of
Languages of India
To: <moses-support@mit.edu>
Message-ID: <c1dc7236-f1fe-0c42-e827-663d5e56217f@inf.ed.ac.uk>
Content-Type: text/plain; charset=utf-8; format=flowed
Hi All
We have released a new sentence-aligned corpus pairing English with 13
different languages spoken in India. Up to 56k sentence pairs are
available for each pair. The languages of India contained in the corpus
are Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri,
Marathi, Odia, Punjabi, Tamil, Telugu and Urdu. We also provide a larger
version of the corpus, document-aligned only.
The corpus is available here: http://data.statmt.org/pmindia/
There is an accompanying paper which describes the construction of the
corpus, a comparison of alignment methods, and some initial MT results.
https://arxiv.org/abs/2001.09907
Barry Haddow and Faheem Kirefu
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 2
Date: Wed, 29 Jan 2020 20:06:53 +0530
From: Thoudam Doren Singh <thoudam.doren@gmail.com>
Subject: Re: [Moses-support] PMIndia - A Collection of Parallel
Corpora of Languages of India
To: Barry Haddow <bhaddow@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAC7R0zz-cJrs0yLVzDzadMAkjfU8V_-BZM4bJ8w2MxW4ruQjzw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Barry
Good job. For some language pairs with fewer than 10k sentence pairs, the
reported BLEU scores are quite appealing.
Best Regards
Doren
------------------------------
Message: 3
Date: Thu, 2 Jan 2020 14:17:34 +0100
From: "Atul Kr. Ojha" <shashwatup9k@gmail.com>
Subject: [Moses-support] First CFP: 5th Workshop on Indian Language
Data: Resources and Evaluation (WILDRE-5) under LREC 2020.
To: undisclosed-recipients:;
Message-ID:
<CACvVY2j4D51rPPaeQkWxcBTTMV7+Ut+=tL-WtAd-TvKgoa8V8w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear all,
Apologies for cross-posting. You are requested to please circulate this for
wider publicity.
*5th Workshop on Indian Language Data: Resources and Evaluation (WILDRE-5)*
*Date:* Saturday, 16th May 2020
*Venue:* Le Palais du Pharo, Marseille (France) (organized under the
platform of LREC 2020, 11-16 May 2020)
*Websites:*
- Main website: http://sanskrit.jnu.ac.in/conf/wildre5
- Paper submission: https://www.softconf.com/lrec2020/WILDRE-5/
- LREC website: http://lrec2020.lrec-conf.org/en/
WILDRE, the 5th Workshop on Indian Language Data: Resources and Evaluation,
is being organized in Marseille (France) on 16th May 2020 under the LREC
platform. India has huge linguistic diversity and has seen concerted
efforts from the Indian government and industry towards developing language
resources. The European Language Resources Association (ELRA) and its
associate organizations have been very active and successful in addressing
the challenges and opportunities related to language resource creation and
evaluation. It is, therefore, a great opportunity for resource creators of
Indian languages to showcase their work on this platform and also to
interact with and learn from those involved in similar initiatives all over
the world. In addition to research papers, WILDRE-5 is going to organize a
*shared task on "Universal Dependency based Indian Language Parser"* (the
details will be shared very soon on the workshop website). The broader
objectives of WILDRE will be:
- To map the status of Indian Language Resources
- To investigate challenges related to creating and sharing various levels
  of language resources
- To promote a dialogue between language resource developers and users
- To provide an opportunity for researchers from India to collaborate with
  researchers from other parts of the world
*IMPORTANT DATES*
*Short/Long paper, Poster and Demo Dates*
*February 06, 2020*: Paper submissions due
*March 13, 2020*: Paper acceptance notification
*April 02, 2020*: Camera-ready papers due
*May 16, 2020*: Workshop
*Shared Task Dates*
*January 20, 2020*: Data set Release for Shared Task
*February 14, 2020*: Test Set Release
*February 20, 2020*: System Submission Due
*February 24, 2020*: Results Announcement
*March 08, 2020*: System Description Paper Due
*March 13, 2020*: Paper notification
*April 02, 2020*: Camera-ready papers due
*SUBMISSIONS*
Papers must describe original, completed or in-progress, and unpublished
work. Each submission will be reviewed by three program committee members.
Accepted papers will be given up to 10 pages (for full papers) or 5 pages
(for short papers and posters) in the workshop proceedings, and will be
presented as an oral presentation or a poster.
Papers should be formatted according to the LREC style-sheet, which is
provided on the LREC 2020 website
(https://lrec2020.lrec-conf.org/en/submission2020/authors-kit/). Please
submit papers in PDF format to the LREC website.
We are seeking submissions under the following categories:
- Full papers (10 pages)
- Short papers (work in progress, 5 pages)
- Posters (innovative ideas/proposals, a research proposal of students)
- Demo (of working online/standalone systems)
WILDRE-5 will have a special focus on Demos of Indian Language Technology.
In the past few years, as more resources have been developed and made
available, there has been increased activity in developing usable
technology using these. WILDRE-5 would like to encourage and widen the Demo
track to allow the community to showcase their demos and have mutually
beneficial interactions with each other as well as resource developers.
WILDRE-5 will invite technical, policy and position paper submissions on the
following topics related to Indian Language Resources:
- Digital Humanities, heritage computing
- Corpora - text, speech, multimodal, methodologies, annotation and tools
- Lexicons and Machine-readable dictionaries
- Ontologies, Grammars
- Language resources for basic NLP, IR, Machine Translation and Speech
  Technology tasks, tools and Infrastructure for constructing and sharing
  language resources
- Standards or specifications for language resources applications
- Licensing and copyright issues
Both the submission and review processes will be handled electronically. The
review process will be *double-blind*. The workshop website will provide the
submission guidelines and the link for the electronic submission.
When submitting a paper from the START page
(https://www.softconf.com/lrec2020/WILDRE-5/), authors will be asked to
provide essential information about resources (in a broad sense, i.e. also
technologies, standards, evaluation kits, etc.) that have been used for the
work described in the paper or are a new result of their research. Moreover,
ELRA encourages all LREC authors to share the described LRs (data, tools,
services, etc.) to enable their reuse and the replicability of experiments,
including evaluation ones.
For further information on this initiative, please refer to
http://lrec2020.lrec-conf.org/en/
*Conference Chairs*
- Girish Nath Jha, Jawaharlal Nehru University, India
- Kalika Bali, Microsoft Research India Lab, Bangalore, India
- Sobha L, AU-KBC, Anna University, Chennai, India
- S. S. Agrawal, KIIT, Gurgaon, India
*Program Committee (to be updated)*
1. Adil Amin Kak, Kashmir University
2. Anupam Basu, Director, NIIT, Durgapur
3. Anil Singh, IIT-BHU, Varanasi
4. Atul Kr. Ojha, ÚFAL, Charles University, Prague, Czech Republic &
   Panlingua Language Processing LLP, India
5. Arul Mozhi, University of Hyderabad
6. Asif Iqbal, IIT Patna, Patna
7. Bogdan Babych, University of Leeds, UK
8. Claudia Soria, CNR-ILC, Italy
9. Dan Zeman, ÚFAL, Charles University, Prague, Czech Republic
10. Delyth Prys, Bangor University, UK
11. Dipti Mishra Sharma, IIIT, Hyderabad
12. Diwakr Mishra, Amazon-Bangalore, India
13. Dorothee Beermann, Norwegian University of Science and Technology (NTNU)
14. Elizabeth Sherley, IITM-Kerala, Trivandrum
15. Esha Banerjee, Google, USA
16. Eveline Wandl-Vogt, Austrian Academy of Sciences, Austria
17. Georg Rehm, DFKI, Germany
18. Girish Nath Jha, Jawaharlal Nehru University, New Delhi
19. Jan Odijk, Utrecht University, The Netherlands
20. Jolanta Bachan, Adam Mickiewicz University, Poland
21. Joseph Mariani, LIMSI-CNRS, France
22. Jyoti D. Pawar, Goa University
23. Kalika Bali, MSRI, Bangalore
24. Khalid Choukri, ELRA, France
25. Lars Hellan, NTNU, Norway
26. Malhar Kulkarni, IIT Bombay
27. Manji Bhadra, Bankura University, West Bengal
28. Marko Tadic, Croatian Academy of Sciences and Arts, Croatia
29. Massimo Monaglia, University of Florence, Italy
30. Monojit Choudhary, MSRI Bangalore
31. Narayan Choudhary, CIIL, Mysore
32. Nicoletta Calzolari, ILC-CNR, Pisa, Italy
33. Niladri Shekhar Dash, ISI Kolkata
34. Panchanan Mohanty, GLA, Mathura
35. Pinky Nainwani, Cognizant Technology Solutions, Bangalore
36. Pushpak Bhattacharya, Director, IIT Patna
37. Rajeev R R, ICFOSS, Trivandrum
38. Ritesh Kumar, Agra University
39. S.k. Shrivastava, Head, TDIL, MEITY, Govt of India
40. S.S. Agrawal, KIIT, Gurgaon, India
41. Sachin Kumar, EZDI, Ahmedabad
42. Santanu Chaudhury, Director, IIT Jodhpur
43. Shivaji Bandhopadhyay, Director, NIT, Silchar
44. Sobha L, AU-KBC Research Centre, Anna University
45. Stelios Piperidis, ILSP, Greece
46. Subhash Chandra, Delhi University
47. Swaran Lata, Retired Head, TDIL, MCIT, Govt of India
48. Virach Sornlertlamvanich, Thammasat University, Bangkok, Thailand
49. Vishal Goyal, Punjabi University, Patiala
50. Zygmunt Vetulani, Adam Mickiewicz University, Poland
*Workshop Manager and contact:*
*Atul Kr. Ojha*, ÚFAL, Charles University, Prague, Czech Republic
shashwatup9k@gmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WILDRE-5_CFP.pdf
Type: application/pdf
Size: 143616 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20200102/03bf8603/attachment.pdf
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 159, Issue 10
**********************************************