Moses-support Digest, Vol 166, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: CFP: WAT2020 (The 7th Workshop on Asian Translation)
(Raj Dabre)

----------------------------------------------------------------------

Message: 1
Date: Sat, 1 Aug 2020 21:00:54 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] CFP: WAT2020 (The 7th Workshop on Asian
Translation)
To: moses-support@mit.edu
Message-ID:
<CAB3gfjCGv8RqWTa5jk2cNgjDkqGk44-=byPfN-ETaMbE0deYkw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear potential WAT participants,

If you are participating in the Indic languages task then kindly note that
the English side of the Bengali--English test set was faulty.
We have fixed the issue.

Kindly check the updated evaluation set on the Indic task page:
http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/

Sorry if this caused any inconvenience.

Thanks and Regards.

On Wed, Jul 8, 2020 at 8:53 AM Toshiaki Nakazawa <
nakazawa@logos.t.u-tokyo.ac.jp> wrote:

> Dear all MT researchers/users,
>
> I'm Toshiaki Nakazawa from The University of Tokyo, Japan. This is the
> call for participation in the MT shared tasks and research papers at
> the 7th Workshop on Asian Translation (WAT2020), a workshop of
> AACL-IJCNLP 2020. Those who are working on machine translation, please
> join us.
>
> IMPORTANT DATES
> ---------------
>
> August 28, 2020      Translation Task Submission Deadline
> September 18, 2020   Research Paper Submission Deadline
> October 23, 2020     Notification of Acceptance for Research Papers
> October 23, 2020     System Description Paper Submission Deadline
> October 30, 2020     Review Feedback of System Description Papers
> November 6, 2020     Camera-ready Deadline
> December 4-7, 2020   Workshop Dates (one of these days)
>
> * All deadlines are calculated at 11:59PM UTC-12
>
> Best regards,
>
> ---------------------------------------------------------------------------
> WAT2020
> (The 7th Workshop on Asian Translation)
> in conjunction with AACL-IJCNLP2020
> http://lotus.kuee.kyoto-u.ac.jp/WAT/
> December 4-7, 2020, Suzhou, China (ONLINE)
>
> Following the success of the previous WAT workshops (WAT2014 --
> WAT2019), WAT2020 will bring together machine translation researchers
> and users to try, evaluate, share and discuss brand-new ideas about
> machine translation. For the 7th WAT, we will include the following
> new translation tasks:
>
> * Japanese <--> English multimodal task
> * Document-level test set for Japanese <--> English newswire task
> * Hindi/Thai/Malay/Indonesian <--> English IT-domain and Wikinews task
> * Odia <--> English mixed-domain task
>
> together with the following continuing tasks:
>
> * English/Chinese <--> Japanese scientific paper task
> * English/Chinese/Korean <--> Japanese patent task
> * English <--> Japanese newswire task
> * Russian <--> Japanese news commentary task
> * Myanmar <--> English mixed-domain task
> * Khmer <--> English mixed-domain task
> * Indian language <--> English mixed-domain multilingual translation task
>
> * English --> Hindi multimodal task
>
> In addition to the shared tasks, the workshop will also feature
> scientific papers on topics related to machine translation,
> especially for Asian languages. Topics of interest include, but are
> not limited to:
>
> - analysis of the automatic/human evaluation results in the past WAT
> workshops
> - word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
> machine translation
> - Asian language processing
> - incorporating linguistic information into machine translation
> - decoding algorithms
> - system combination
> - error analysis
> - manual and automatic machine translation evaluation
> - machine translation applications
> - quality estimation
> - domain adaptation
> - machine translation for low resource languages
> - language resources
>
> ************************* IMPORTANT NOTICE *************************
> Participants of the previous workshop are also required to sign up to
> WAT2020
> ********************************************************************
>
> TRANSLATION TASKS
> -----------------
>
> The task is to improve the text translation quality for scientific
> papers and patent documents. Participants choose any of the subtasks
> in which they would like to participate and translate the test data
> using their machine translation systems. The WAT organizers will
> evaluate the results submitted using automatic evaluation and human
> evaluation. We will also provide a baseline machine translation.
>
> Tasks:
> Scientific Paper: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
> English/Chinese <--> Japanese
> Patent: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
> English/Chinese/Korean <--> Japanese
> Newswire: [JIJI Corpus] (document-level testset is newly added)
> Japanese <--> English
> News Commentary:
> Japanese <--> Russian (Japanese <--> English and English <-->
> Russian included)
> IT Documentation and Wikinews: [SAP-NICT Corpus]
> Hindi/Thai/Malay/Indonesian <--> English [ALT and other mixed corpora]
> NEW!!
> Mixed domain:
> Myanmar <--> English [UCSY and ALT corpora]
> Khmer <--> English [ECCC and ALT corpora]
> Indic:
> Indian Language <--> English multilingual [Assorted Corpus from
> various sources]
> Odia <--> English [UFAL (EnOdia) corpus] NEW!!
> Multimodal:
> English --> Hindi Multimodal [Hindi Visual Genome corpus]
> Japanese <--> English Multimodal [Flickr30kEnt-JP corpus] NEW!!
>
> Dataset:
>
> * Scientific paper
>
> WAT uses ASPEC for the dataset including training, development,
> development test and test data. Participants of the scientific papers
> subtask must get a copy of ASPEC by themselves. ASPEC consists of
> approximately 3 million Japanese-English parallel sentences from paper
> abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
> paper excerpts (ASPEC-JC).
>
> * Patent
>
> WAT uses the JPO Patent Corpus, which is constructed by the Japan Patent
> Office (JPO). This corpus consists of 1 million English-Japanese
> parallel sentences, 1 million Chinese-Japanese parallel sentences, and
> 1 million Korean-Japanese parallel sentences from patent descriptions
> in four categories. Participants of patent tasks are required to get
> it from the WAT2019 site of the JPO Patent Corpus.
>
> - English/Chinese/Korean <--> Japanese:
> These tasks evaluate the performance of a translation model in the
> same way as the other translation tasks. Unlike the previous tasks at
> WAT2015, WAT2016 and WAT2017, the new test sets of these tasks consist
> of (a) patent documents published between 2011 and 2013, which were
> used in past years' WAT, and (b) ones published between 2016 and
> 2017 for each language pair. We will also evaluate performance on
> section (a) so as to compare with systems submitted in past years'
> WAT.
>
> - Chinese -> Japanese expression pattern task:
> This task evaluates the performance of a translation model for each
> predefined category of expression patterns, which corresponds to
> title of invention (TIT), abstract (ABS), scope of claim (CLM) or
> description (DES). The test set of this task consists of sentences, each
> of which is annotated with a corresponding category of expression
> patterns.
>
> * Newswire
>
> WAT uses the JIJI Corpus, which is constructed by Jiji Press Ltd. in
> collaboration with the National Institute of Information and
> Communications Technology (NICT). This corpus consists of a
> Japanese-English news corpus of 200K parallel sentences from Jiji
> Press news in various categories. At WAT2020, the organizers have
> added a new document-level translation testset, which consists of
> manually filtered test and reference sentences and document-level
> context of the test sentences. Participants of the newswire subtask
> are required to get it from the WAT2020 site of the JIJI Corpus.
>
> * News Commentary
>
> WAT uses a manually aligned and cleaned Japanese <--> Russian corpus
> from the News Commentary domain to study extremely low resource
> situations for distant language pairs. The parallel corpus contains
> around 12,000 lines. This year, we invite participants to utilize any
> existing monolingual or parallel corpora from WMT 2020 in addition to
> those listed on the WAT website. In particular, solutions focusing on
> monolingual pretraining and multilingualism are encouraged.
>
> * IT and Wikinews
>
> - Hindi/Thai/Malay/Indonesian <--> English
>
> In collaboration with SAP and NICT, WAT is organising a pilot
> translation task between English and Hindi, Thai, Malay and
> Indonesian. The evaluation data belongs to the IT domain (Software
> Documentation) and Wikinews domain (Asian Language Treebank).
> Participants will be expected to train systems and submit translations
> for all language pairs (to and from English) and both domains using
> any existing monolingual or parallel data. Given the growing focus on
> a universal translation model for multiple languages and domains, WAT
> encourages a single multilingual and multi-domain model for all
> language pairs and both domains (IT as well as Wikinews). Additional
> details will be given on the WAT 2020 website.
>
> * Mixed domain
>
> - Myanmar (Burmese) <--> English
> WAT uses the UCSY Corpus and the ALT Corpus. The UCSY corpus and a
> portion of the ALT corpus are used as training data, which are around
> 220,000 lines of sentences and phrases. The development and test data
> are from the ALT corpus.
>
> - Khmer <--> English
> WAT uses the ECCC Corpus and the ALT Corpus. The ECCC corpus and a
> portion of the ALT corpus are used as training data, which are around
> 120,000 lines of sentences and phrases. The development and test data
> are from the ALT corpus.
>
> * Indic
>
> - Odia <--> English
> For the first time, WAT is organizing a translation task for the
> low-resource language Odia. WAT will use the OdiEnCorp2.0 corpus collected
> by researchers at Idiap Research Institute and UFAL. The training data
> contains around 98K parallel sentences covering different domains.
>
> - Indian language <--> English multilingual translation task. This
> task is being revived after 2018 with major revisions. There has been
> an increase in the available datasets for Indian languages in the last
> couple of years along with major advances in multilingual learning.
> The task will involve training a single model for multiple Indian
> languages to English (and vice-versa) translation. The goal is to
> encourage exploration of methods which utilize language relatedness to
> improve translation quality for low-resource languages while having a
> single, compact translation model. The training set would be compiled
> from many publicly available datasets spanning 7-8 Indian languages.
>
> * Multimodal
>
> - English --> Hindi Multimodal (Visual Genome)
> After a warm response from the participants to the "WAT 2019
> Multimodal Translation Task", WAT will continue organizing a
> multimodal English --> Hindi translation task where the input will be
> text and an image and the output will be a caption (text). The
> training set contains around 30,000 segments. Additional details will
> be given on the task website.
>
> - Japanese <--> English Multimodal (Flickr30kEnt-JP)
> Details of this task will be announced later. We will use the
> Flickr30kEnt-JP corpus for this task.
> https://github.com/nlab-mpg/Flickr30kEnt-JP
>
> EVALUATION
> ----------
>
> Automatic evaluation:
> We are providing an automatic evaluation server. It is free for
> everyone, but you need to create an account for evaluation. Viewing
> the list of evaluation results does not require an account.
>
> Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2020/
> Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/
>
> Human evaluation:
> Both crowdsourcing evaluation and JPO adequacy evaluation will be
> carried out for selected subtasks and selected submitted systems (the
> details will be announced later).
>
> INVITED TALK
> ------------
>
> TBA
>
> ORGANIZERS
> ----------
>
> Toshiaki Nakazawa, The University of Tokyo, Japan
> Hideki Nakayama, The University of Tokyo, Japan
> Chenchen Ding, National Institute of Information and Communications
> Technology (NICT), Japan
> Raj Dabre, National Institute of Information and Communications
> Technology (NICT), Japan
> Hiroshi Manabe, National Institute of Information and Communications
> Technology (NICT), Japan
> Anoop Kunchukuttan, Microsoft, India
> Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
> Ondřej Bojar, Charles University, Prague, Czech Republic
> Shantipriya Parida, Idiap Research Institute, Martigny, Switzerland
> Isao Goto, Japan Broadcasting Corporation (NHK), Japan
> Hidaya Mino, Japan Broadcasting Corporation (NHK), Japan
> Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
> Sadao Kurohashi, Kyoto University, Japan
> Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IITB), India
>
> CONTACT
> -------
>
> wat-organizer@googlegroups.com
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Raj Dabre.
Researcher at NICT, Japan.
Ph.D., Graduate School of Informatics, Kyoto University.
M.Tech., IIT Bombay.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 166, Issue 1
*********************************************
