Moses-support Digest, Vol 167, Issue 2

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: CFP: WAT2020 (The 7th Workshop on Asian Translation)
(Raj Dabre)


----------------------------------------------------------------------

Message: 1
Date: Tue, 15 Sep 2020 21:52:29 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] CFP: WAT2020 (The 7th Workshop on Asian
Translation)
To: moses-support@mit.edu
Message-ID:
<CAB3gfjDizUpY+8BTjsZzZcj+5to=9E_+dsvh29enRjnWemge1Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear all,

Sorry for the spam but I mistyped the shared task deadline.

The correct deadline is the 18th of September.

Sorry for the confusion.


On Mon, 14 Sep 2020, 10:26 Raj Dabre, <prajdabre@gmail.com> wrote:

> Dear WAT participants,
>
> If you are participating in the Software Documentation and Wikinews task
> (NICT-SAP task), the test set has been released. Kindly look here:
> https://github.com/SAP/software-documentation-data-set-for-machine-translation
>
>
> Please submit your translations by the 16th of September.
>
> Thanks and Regards.
>
> On Sat, 1 Aug 2020, 21:00 Raj Dabre, <prajdabre@gmail.com> wrote:
>
>> Dear potential WAT participants,
>>
>> If you are participating in the Indic languages task then kindly note
>> that the English side of the Bengali--English test set was faulty.
>> We have fixed the issue.
>>
>> Kindly check the updated evaluation set on the Indic task page:
>> http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/
>>
>> Sorry if this caused any inconvenience.
>>
>> Thanks and Regards.
>>
>> On Wed, Jul 8, 2020 at 8:53 AM Toshiaki Nakazawa <
>> nakazawa@logos.t.u-tokyo.ac.jp> wrote:
>>
>>> Dear all MT researchers/users,
>>>
>>> I'm Toshiaki Nakazawa from The University of Tokyo, Japan. This is the
>>> call for participation for the MT shared tasks and research papers to
>>> the 7th Workshop on Asian Translation (WAT2020), a workshop of
>>> AACL-IJCNLP 2020. Those who are working on machine translation, please
>>> join us.
>>>
>>> IMPORTANT DATES
>>> ---------------
>>>
>>> August 28, 2020      Translation Task Submission Deadline
>>> September 18, 2020   Research Paper Submission Deadline
>>> October 23, 2020     Notification of Acceptance for Research Papers
>>> October 23, 2020     System Description Paper Submission Deadline
>>> October 30, 2020     Review Feedback of System Description Papers
>>> November 6, 2020     Camera-ready Deadline
>>> December 4-7, 2020   Workshop Dates (one of these days)
>>>
>>> * All deadlines are calculated at 11:59PM UTC-12
>>>
>>> Best regards,
>>>
>>>
>>> ---------------------------------------------------------------------------
>>> WAT2020
>>> (The 7th Workshop on Asian Translation)
>>> in conjunction with AACL-IJCNLP2020
>>> http://lotus.kuee.kyoto-u.ac.jp/WAT/
>>> December 4-7, 2020, Suzhou, China (ONLINE)
>>>
>>> Following the success of the previous WAT workshops (WAT2014 --
>>> WAT2019), WAT2020 will bring together machine translation researchers
>>> and users to try, evaluate, share and discuss brand-new ideas about
>>> machine translation. For the 7th WAT, we will include the following
>>> new translation tasks:
>>>
>>> * Japanese <--> English multimodal task
>>> * Document-level test set for Japanese <--> English newswire task
>>> * Hindi/Thai/Malay/Indonesian <--> English IT-domain and Wikinews task
>>> * Odia <--> English mixed-domain task
>>>
>>> together with the following continuing tasks:
>>>
>>> * English/Chinese <--> Japanese scientific paper task
>>> * English/Chinese/Korean <--> Japanese patent task
>>> * English <--> Japanese newswire task
>>> * Russian <--> Japanese news commentary task
>>> * Myanmar <--> English mixed-domain task
>>> * Khmer <--> English mixed-domain task
>>> * Indian language <--> English mixed-domain multilingual translation
>>>   task
>>> * English --> Hindi multimodal task
>>>
>>> In addition to the shared tasks, the workshop will also feature
>>> scientific papers on topics related to machine translation,
>>> especially for Asian languages. Topics of interest include, but are
>>> not limited to:
>>>
>>> - analysis of the automatic/human evaluation results in the past WAT
>>> workshops
>>> - word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
>>> machine translation
>>> - Asian language processing
>>> - incorporating linguistic information into machine translation
>>> - decoding algorithms
>>> - system combination
>>> - error analysis
>>> - manual and automatic machine translation evaluation
>>> - machine translation applications
>>> - quality estimation
>>> - domain adaptation
>>> - machine translation for low resource languages
>>> - language resources
>>>
>>> ************************* IMPORTANT NOTICE *************************
>>> Participants of the previous workshop are also required to sign up to
>>> WAT2020
>>> ********************************************************************
>>>
>>> TRANSLATION TASKS
>>> -----------------
>>>
>>> The task is to improve the text translation quality for scientific
>>> papers and patent documents. Participants choose any of the subtasks
>>> in which they would like to participate and translate the test data
>>> using their machine translation systems. The WAT organizers will
>>> evaluate the results submitted using automatic evaluation and human
>>> evaluation. We will also provide a baseline machine translation.
>>>
>>> Tasks:
>>> Scientific Paper: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
>>> English/Chinese <--> Japanese
>>> Patent: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
>>> English/Chinese/Korean <--> Japanese
>>> Newswire: [JIJI Corpus] (document-level testset is newly added)
>>> Japanese <--> English
>>> News Commentary:
>>> Japanese <--> Russian (Japanese <--> English and English <-->
>>> Russian included)
>>> IT Documentation and Wikinews: [SAP-NICT Corpus]
>>> Hindi/Thai/Malay/Indonesian <--> English [ALT and other mixed
>>> corpora] NEW!!
>>> Mixed domain:
>>> Myanmar <--> English [UCSY and ALT corpora]
>>> Khmer <--> English [ECCC and ALT corpora]
>>> Indic:
>>> Indian Language <--> English multilingual [Assorted Corpus from
>>> various sources]
>>> Odia <--> English [UFAL (EnOdia) corpus] NEW!!
>>> Multimodal:
>>> English --> Hindi Multimodal [Hindi Visual Genome corpus]
>>> Japanese <--> English Multimodal [Flickr30kEnt-JP corpus] NEW!!
>>>
>>> Dataset:
>>>
>>> * Scientific paper
>>>
>>> WAT uses ASPEC for the dataset including training, development,
>>> development test and test data. Participants of the scientific papers
>>> subtask must get a copy of ASPEC by themselves. ASPEC consists of
>>> approximately 3 million Japanese-English parallel sentences from paper
>>> abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
>>> paper excerpts (ASPEC-JC).
>>>
>>> * Patent
>>>
>>> WAT uses JPO Patent Corpus, which is constructed by Japan Patent
>>> Office (JPO). This corpus consists of 1 million English-Japanese
>>> parallel sentences, 1 million Chinese-Japanese parallel sentences, and
>>> 1 million Korean-Japanese parallel sentences from patent description
>>> with four categories. Participants of patent tasks are required to
>>> obtain it from the WAT2019 site of the JPO Patent Corpus.
>>>
>>> - English/Chinese/Korean <--> Japanese:
>>> These tasks evaluate performance of a translation model similarly to
>>> the other translation tasks. Differing from the previous tasks at
>>> WAT2015, WAT2016 and WAT2017, new test sets of these tasks consist
>>> of (a) patent documents published between 2011 and 2013, which were
>>> used in the past years' WAT, and (b) ones published between 2016 and
>>> 2017 for each language pair. We will also evaluate performance of the
>>> section (a) so as to compare systems submitted in the past years'
>>> WAT.
>>>
>>> - Chinese -> Japanese expression pattern task:
>>> This task evaluates performance of a translation model for each
>>> predefined category of expression patterns, which corresponds to
>>> title of invention (TIT), abstract (ABS), scope of claim (CLM) or
>>> description (DES). Test set of this task consists of sentences each
>>> of which is annotated with a corresponding category of expression
>>> patterns.
>>>
>>> * Newswire
>>>
>>> WAT uses JIJI Corpus, which is constructed by Jiji Press Ltd. in
>>> collaboration with the National Institute of Information and
>>> Communications Technology (NICT). This corpus consists of a
>>> Japanese-English news corpus of 200K parallel sentences, from Jiji
>>> Press news with various categories. At WAT2020, the organizers have
>>> added a new document-level translation testset, which consists of
>>> manually filtered test and reference sentences and document-level
>>> context of the test sentences. Participants of the newswire subtask
>>> are required to obtain it from the WAT2020 site of the JIJI Corpus.
>>>
>>> * News Commentary
>>>
>>> WAT uses a manually aligned and cleaned Japanese <--> Russian corpus
>>> from the News Commentary domain to study extremely low resource
>>> situations for distant language pairs. The parallel corpus contains
>>> around 12,000 lines. This year, we invite participants to utilize any
>>> existing monolingual or parallel corpora from WMT 2020 in addition to
>>> those listed on the WAT website. In particular, solutions focusing on
>>> monolingual pretraining and multilingualism are encouraged.
>>>
>>> * IT and Wikinews
>>>
>>> - Hindi/Thai/Malay/Indonesian <--> English
>>>
>>> In collaboration with SAP and NICT, WAT is organising a pilot
>>> translation task to/from English to/from Hindi, Thai, Malay and
>>> Indonesian. The evaluation data belongs to the IT domain (Software
>>> Documentation) and Wikinews domain (Asian Language Treebank).
>>> Participants will be expected to train systems and submit translations
>>> for all language pairs (to and from English) and both domains using
>>> any existing monolingual or parallel data. Given the growing focus on
>>> a universal translation model for multiple languages and domains, WAT
>>> encourages a single multilingual and multi-domain model for all
>>> language pairs and both domains (IT as well as Wikinews). Additional
>>> details will be given on the WAT 2020 website.
>>>
>>> * Mixed domain
>>>
>>> - Myanmar (Burmese) <--> English
>>> WAT uses UCSY Corpus and ALT Corpus. The UCSY corpus and a portion of
>>> the ALT corpus are used as training data, which are around 220,000
>>> lines of sentences and phrases. The development and test data are
>>> from the ALT corpus.
>>>
>>> - Khmer <--> English
>>> WAT uses ECCC Corpus and ALT Corpus. The ECCC corpus and a portion of
>>> the ALT corpus are used as training data, which are around 120,000
>>> lines of sentences and phrases. The development and test data are
>>> from the ALT corpus.
>>>
>>> * Indic
>>>
>>> - Odia <--> English
>>> For the first time, WAT is organizing a translation task for the low
>>> resource language Odia. WAT will use the OdiEnCorp2.0 corpus collected
>>> by researchers at Idiap Research Institute and UFAL. The training data
>>> contains around 98K parallel sentences covering different domains.
>>>
>>> - Indian language <--> English multilingual translation task. This
>>> task is being revived after 2018 with major revisions. There has been
>>> an increase in the available datasets for Indian languages in the last
>>> couple of years along with major advances in multilingual learning.
>>> The task will involve training a single model for multiple Indian
>>> languages to English (and vice-versa) translation. The goal is to
>>> encourage exploration of methods which utilize language relatedness to
>>> improve translation quality for low-resource languages while having a
>>> single, compact translation model. The training set would be compiled
>>> from many publicly available datasets spanning 7-8 Indian languages.
>>>
>>> * Multimodal
>>>
>>> - English --> Hindi Multimodal (Visual Genome)
>>> After a warm response from the participants for the "WAT 2019
>>> Multimodal Translation Task", WAT will continue organizing a
>>> multimodal English --> Hindi translation task where the input will be
>>> text and an image and the output will be a caption (text). The
>>> training set contains around 30,000 segments. Additional details will
>>> be given on the task website.
>>>
>>> - Japanese <--> English Multimodal (Flickr30kEnt-JP)
>>> Details of this task will be announced later. We will use the
>>> Flickr30kEnt-JP corpus for this task.
>>> https://github.com/nlab-mpg/Flickr30kEnt-JP
>>>
>>> EVALUATION
>>> ----------
>>>
>>> Automatic evaluation:
>>> We are providing an automatic evaluation server. It is free for
>>> everyone, but you need to create an account for evaluation. Just
>>> showing the list of evaluation results does not require an account.
>>>
>>> Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2020/
>>> Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/
>>>
>>> Human evaluation:
>>> Both crowdsourcing evaluation and JPO adequacy evaluation will be
>>> carried out for selected subtasks and selected submitted systems (the
>>> details will be announced later).
>>>
>>> INVITED TALK
>>> ------------
>>>
>>> TBA
>>>
>>> ORGANIZERS
>>> ----------
>>>
>>> Toshiaki Nakazawa, The University of Tokyo, Japan
>>> Hideki Nakayama, The University of Tokyo, Japan
>>> Chenchen Ding, National Institute of Information and Communications
>>> Technology (NICT), Japan
>>> Raj Dabre, National Institute of Information and Communications
>>> Technology (NICT), Japan
>>> Hiroshi Manabe, National Institute of Information and Communications
>>> Technology (NICT), Japan
>>> Anoop Kunchukuttan, Microsoft, India
>>> Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
>>> Ondřej Bojar, Charles University, Prague, Czech Republic
>>> Shantipriya Parida, Idiap Research Institute, Martigny, Switzerland
>>> Isao Goto, Japan Broadcasting Corporation (NHK), Japan
>>> Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
>>> Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
>>> Sadao Kurohashi, Kyoto University, Japan
>>> Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IITB),
>>> India
>>>
>>> CONTACT
>>> -------
>>>
>>> wat-organizer@googlegroups.com
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>> --
>> Raj Dabre.
>> Researcher at NICT, Japan.
>> Ph.D., Graduate School of Informatics, Kyoto University.
>> M.Tech., IIT Bombay.
>>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 167, Issue 2
*********************************************
