Moses-support Digest, Vol 167, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: CFP: WAT2020 (The 7th Workshop on Asian Translation)
(Raj Dabre)

----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Sep 2020 10:26:49 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] CFP: WAT2020 (The 7th Workshop on Asian
Translation)
To: moses-support@mit.edu
Message-ID:
<CAB3gfjAOtqqE7HV0vgRfx6ScegMFiced58zYAn5m-55pv9OraQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear WAT participants,

If you are participating in the Software Documentation and Wikinews task
(NICT-SAP task), the test set has been released. Kindly look here:
https://github.com/SAP/software-documentation-data-set-for-machine-translation

Please submit your translations by the 16th of September.

Thanks and Regards.

On Sat, 1 Aug 2020, 21:00 Raj Dabre, <prajdabre@gmail.com> wrote:

> Dear potential WAT participants,
>
> If you are participating in the Indic languages task then kindly note that
> the English side of the Bengali--English test set was faulty.
> We have fixed the issue.
>
> Kindly check the updated evaluation set on the Indic task page:
> http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/
>
> Sorry if this caused any inconvenience.
>
> Thanks and Regards.
>
> On Wed, Jul 8, 2020 at 8:53 AM Toshiaki Nakazawa <
> nakazawa@logos.t.u-tokyo.ac.jp> wrote:
>
>> Dear all MT researchers/users,
>>
>> I'm Toshiaki Nakazawa from The University of Tokyo, Japan. This is the
>> call for participation in the MT shared tasks and for research papers at
>> the 7th Workshop on Asian Translation (WAT2020), a workshop of
>> AACL-IJCNLP 2020. Those who are working on machine translation, please
>> join us.
>>
>> IMPORTANT DATES
>> ---------------
>>
>> August 28, 2020 - Translation Task Submission Deadline
>> September 18, 2020 - Research Paper Submission Deadline
>> October 23, 2020 - Notification of Acceptance for Research Papers
>> October 23, 2020 - System Description Paper Submission Deadline
>> October 30, 2020 - Review Feedback of System Description Papers
>> November 6, 2020 - Camera-ready Deadline
>> December 4-7, 2020 - Workshop Dates (one of these days)
>>
>> * All deadlines are calculated at 11:59PM UTC-12
>>
>> Best regards,
>>
>>
>> ---------------------------------------------------------------------------
>> WAT2020
>> (The 7th Workshop on Asian Translation)
>> in conjunction with AACL-IJCNLP2020
>> http://lotus.kuee.kyoto-u.ac.jp/WAT/
>> December 4-7, 2020, Suzhou, China (ONLINE)
>>
>> Following the success of the previous WAT workshops (WAT2014 --
>> WAT2019), WAT2020 will bring together machine translation researchers
>> and users to try, evaluate, share and discuss brand-new ideas about
>> machine translation. For the 7th WAT, we will include the following
>> new translation tasks:
>>
>> * Japanese <--> English multimodal task
>> * Document-level test set for Japanese <--> English newswire task
>> * Hindi/Thai/Malay/Indonesian <--> English IT-domain and Wikinews task
>> * Odia <--> English mixed-domain task
>>
>> together with the following continuing tasks:
>>
>> * English/Chinese <--> Japanese scientific paper task
>> * English/Chinese/Korean <--> Japanese patent task
>> * English <--> Japanese newswire task
>> * Russian <--> Japanese news commentary task
>> * Myanmar <--> English mixed-domain task
>> * Khmer <--> English mixed-domain task
>> * Indian language <--> English mixed-domain multilingual translation task
>>
>> * English --> Hindi multimodal task
>>
>> In addition to the shared tasks, the workshop will also feature
>> scientific papers on topics related to machine translation,
>> especially for Asian languages. Topics of interest include, but are
>> not limited to:
>>
>> - analysis of the automatic/human evaluation results in the past WAT
>> workshops
>> - word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
>> machine translation
>> - Asian language processing
>> - incorporating linguistic information into machine translation
>> - decoding algorithms
>> - system combination
>> - error analysis
>> - manual and automatic machine translation evaluation
>> - machine translation applications
>> - quality estimation
>> - domain adaptation
>> - machine translation for low resource languages
>> - language resources
>>
>> ************************* IMPORTANT NOTICE *************************
>> Participants of the previous workshop are also required to sign up to
>> WAT2020
>> ********************************************************************
>>
>> TRANSLATION TASKS
>> -----------------
>>
>> The task is to improve the text translation quality for scientific
>> papers and patent documents. Participants choose any of the subtasks
>> in which they would like to participate and translate the test data
>> using their machine translation systems. The WAT organizers will
>> evaluate the results submitted using automatic evaluation and human
>> evaluation. We will also provide a baseline machine translation.
>>
>> Tasks:
>> Scientific Paper: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
>> English/Chinese <--> Japanese
>> Patent: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
>> English/Chinese/Korean <--> Japanese
>> Newswire: [JIJI Corpus] (document-level testset is newly added)
>> Japanese <--> English
>> News Commentary:
>> Japanese <--> Russian (Japanese <--> English and English <-->
>> Russian included)
>> IT Documentation and Wikinews: [SAP-NICT Corpus]
>> Hindi/Thai/Malay/Indonesian <--> English [ALT and other mixed
>> corpora] NEW!!
>> Mixed domain:
>> Myanmar <--> English [UCSY and ALT corpora]
>> Khmer <--> English [ECCC and ALT corpora]
>> Indic:
>> Indian Language <--> English multilingual [Assorted Corpus from
>> various sources]
>> Odia <--> English [UFAL (EnOdia) corpus] NEW!!
>> Multimodal:
>> English --> Hindi Multimodal [Hindi Visual Genome corpus]
>> Japanese <--> English Multimodal [Flickr30kEnt-JP corpus] NEW!!
>>
>> Dataset:
>>
>> * Scientific paper
>>
>> WAT uses ASPEC for the dataset including training, development,
>> development test and test data. Participants of the scientific papers
>> subtask must get a copy of ASPEC by themselves. ASPEC consists of
>> approximately 3 million Japanese-English parallel sentences from paper
>> abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
>> paper excerpts (ASPEC-JC).
>>
>> * Patent
>>
>> WAT uses the JPO Patent Corpus, which is constructed by the Japan Patent
>> Office (JPO). This corpus consists of 1 million English-Japanese
>> parallel sentences, 1 million Chinese-Japanese parallel sentences, and
>> 1 million Korean-Japanese parallel sentences from patent descriptions
>> in four categories. Participants in the patent tasks are required to
>> obtain it from the WAT2019 site of the JPO Patent Corpus.
>>
>> - English/Chinese/Korean <--> Japanese:
>> These tasks evaluate the performance of a translation model in the same
>> way as the other translation tasks. Unlike the previous tasks at
>> WAT2015, WAT2016 and WAT2017, the new test sets of these tasks consist
>> of (a) patent documents published between 2011 and 2013, which were
>> used in the past years' WAT, and (b) ones published between 2016 and
>> 2017 for each language pair. We will also evaluate performance on
>> section (a) so that results can be compared with systems submitted in
>> the past years' WAT.
>>
>> - Chinese --> Japanese expression pattern task:
>> This task evaluates the performance of a translation model for each
>> predefined category of expression patterns, which corresponds to
>> title of invention (TIT), abstract (ABS), scope of claim (CLM) or
>> description (DES). The test set of this task consists of sentences, each
>> of which is annotated with a corresponding category of expression
>> patterns.
>>
>> * Newswire
>>
>> WAT uses the JIJI Corpus, which is constructed by Jiji Press Ltd. in
>> collaboration with the National Institute of Information and
>> Communications Technology (NICT). This corpus consists of a
>> Japanese-English news corpus of 200K parallel sentences from Jiji
>> Press news in various categories. At WAT2020, the organizers have
>> added a new document-level translation testset, which consists of
>> manually filtered test and reference sentences and document-level
>> context of the test sentences. Participants in the newswire subtask
>> are required to obtain it from the WAT2020 site of the JIJI Corpus.
>>
>> * News Commentary
>>
>> WAT uses a manually aligned and cleaned Japanese <--> Russian corpus
>> from the News Commentary domain to study extremely low resource
>> situations for distant language pairs. The parallel corpus contains
>> around 12,000 lines. This year, we invite participants to utilize any
>> existing monolingual or parallel corpora from WMT 2020 in addition to
>> those listed on the WAT website. In particular, solutions focusing on
>> monolingual pretraining and multilingualism are encouraged.
>>
>> * IT and Wikinews
>>
>> - Hindi/Thai/Malay/Indonesian <--> English
>>
>> In collaboration with SAP and NICT, WAT is organising a pilot
>> translation task to/from English to/from Hindi, Thai, Malay and
>> Indonesian. The evaluation data belongs to the IT domain (Software
>> Documentation) and Wikinews domain (Asian Language Treebank).
>> Participants will be expected to train systems and submit translations
>> for all language pairs (to and from English) and both domains using
>> any existing monolingual or parallel data. Given the growing focus on
>> a universal translation model for multiple languages and domains, WAT
>> encourages a single multilingual and multi-domain model for all
>> language pairs and both domains (IT as well as Wikinews). Additional
>> details will be given on the WAT 2020 website.
>>
>> * Mixed domain
>>
>> - Myanmar (Burmese) <--> English
>> WAT uses UCSY Corpus and ALT Corpus. The UCSY corpus and a portion of
>> the ALT corpus are used as training data, which are around 220,000
>> lines of sentences and phrases. The development and test data are
>> from the ALT corpus.
>>
>> - Khmer <--> English
>> WAT uses ECCC Corpus and ALT Corpus. The ECCC corpus and a portion of
>> the ALT corpus are used as training data, which are around 120,000
>> lines of sentences and phrases. The development and test data are
>> from the ALT corpus.
>>
>> * Indic
>>
>> - Odia <--> English
>> For the first time, WAT is organizing a translation task for the low
>> resource language Odia. WAT will use the OdiEnCorp2.0 corpus collected
>> by researchers at Idiap Research Institute and UFAL. The training data
>> contains around 98K parallel sentences covering different domains.
>>
>> - Indian language <--> English multilingual translation task. This
>> task is being revived after 2018 with major revisions. There has been
>> an increase in the available datasets for Indian languages in the last
>> couple of years along with major advances in multilingual learning.
>> The task will involve training a single model for multiple Indian
>> languages to English (and vice-versa) translation. The goal is to
>> encourage exploration of methods which utilize language relatedness to
>> improve translation quality for low-resource languages while having a
>> single, compact translation model. The training set would be compiled
>> from many publicly available datasets spanning 7-8 Indian languages.
>>
>> * Multimodal
>>
>> - English --> Hindi Multimodal (Visual Genome)
>> After a warm response from the participants for the "WAT 2019
>> Multimodal Translation Task", WAT will continue organizing a multimodal
>> English --> Hindi translation task where the input will be text and an
>> image and the output will be a caption (text). The training set contains
>> around 30,000 segments. Additional details will be given on the task
>> website.
>>
>> - Japanese <--> English Multimodal (Flickr30kEnt-JP)
>> Details of this task will be announced later. We will use the
>> Flickr30kEnt-JP corpus for this task.
>> https://github.com/nlab-mpg/Flickr30kEnt-JP
>>
>> EVALUATION
>> ----------
>>
>> Automatic evaluation:
>> We are providing an automatic evaluation server. It is free for
>> everyone, but you need to create an account for evaluation. Just
>> showing the list of evaluation results does not require an account.
>>
>> Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2020/
>> Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/
>>
>> Human evaluation:
>> Both crowdsourcing evaluation and JPO adequacy evaluation will be
>> carried out for selected subtasks and selected submitted systems (the
>> details will be announced later).
>>
>> INVITED TALK
>> ------------
>>
>> TBA
>>
>> ORGANIZERS
>> ----------
>>
>> Toshiaki Nakazawa, The University of Tokyo, Japan
>> Hideki Nakayama, The University of Tokyo, Japan
>> Chenchen Ding, National Institute of Information and Communications
>> Technology (NICT), Japan
>> Raj Dabre, National Institute of Information and Communications
>> Technology (NICT), Japan
>> Hiroshi Manabe, National Institute of Information and Communications
>> Technology (NICT), Japan
>> Anoop Kunchukuttan, Microsoft, India
>> Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
>> Ondřej Bojar, Charles University, Prague, Czech Republic
>> Shantipriya Parida, Idiap Research Institute, Martigny, Switzerland
>> Isao Goto, Japan Broadcasting Corporation (NHK), Japan
>> Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
>> Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
>> Sadao Kurohashi, Kyoto University, Japan
>> Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IITB), India
>>
>> CONTACT
>> -------
>>
>> wat-organizer@googlegroups.com
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
> --
> Raj Dabre.
> Researcher at NICT, Japan.
> Ph.D., Graduate School of Informatics, Kyoto University.
> M.Tech., IIT Bombay.
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 167, Issue 1
*********************************************