Moses-support Digest, Vol 152, Issue 7

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. CFP: WAT2019 (The 6th Workshop on Asian Translation)
(Toshiaki Nakazawa)
2. WeCNLP 2019 Last Call for Abstracts + Deadline Extension
(Juan Miguel Pino)

----------------------------------------------------------------------

Message: 1
Date: Sat, 15 Jun 2019 11:43:46 +0900
From: Toshiaki Nakazawa <nakazawa@logos.t.u-tokyo.ac.jp>
Subject: [Moses-support] CFP: WAT2019 (The 6th Workshop on Asian
Translation)
To: <moses-support@mit.edu>
Message-ID:
<CAMMh7mpodsjGY6GLADNXe-LuZtVATFB+GjyzSv8XcfRKSeQh9w@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Dear all MT researchers/users,

I'm Toshiaki Nakazawa from The University of Tokyo, Japan. This is the
call for participation in the MT shared tasks and for research papers at
the 6th Workshop on Asian Translation (WAT2019), a workshop of
EMNLP-IJCNLP2019. If you are working on machine translation, please
join us.

IMPORTANT DATES
---------------

Jul. 26, 2019 Translation Task Submission Deadline
Aug. 19, 2019 Research Paper Submission Deadline
Sep. 13, 2019 System Description Paper Submission Deadline
Sep. 17, 2019 Notification of Acceptance for Research Papers
Sep. 20, 2019 Review Feedback of System Description
Sep. 30, 2019 Camera-ready Deadline
Nov. 3-4, 2019 Workshop Dates

* All deadlines are calculated at 11:59PM UTC-7

Best regards,

---------------------------------------------------------------------------
WAT2019
(The 6th Workshop on Asian Translation)
in conjunction with EMNLP-IJCNLP2019
http://lotus.kuee.kyoto-u.ac.jp/WAT/
November 3 or 4, 2019, Hong Kong, China

Following the success of the previous WAT workshops (WAT2014 --
WAT2018), WAT2019 will bring together machine translation researchers
and users to try, evaluate, share and discuss brand-new ideas about
machine translation. For the 6th WAT, we will include the following
new translation tasks:

* Japanese <--> English timely disclosure documents task
* Khmer <--> English Mixed-domain task
* Tamil <--> English Mixed-domain task
* Russian <--> Japanese News Commentary task
* English --> Hindi multimodal task

In addition to the shared tasks, the workshop will also feature
scientific papers on topics related to machine translation,
especially for Asian languages. Topics of interest include, but are
not limited to:

- analysis of the automatic/human evaluation results in the past WAT workshops
- word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
machine translation
- Asian language processing
- incorporating linguistic information into machine translation
- decoding algorithms
- system combination
- error analysis
- manual and automatic machine translation evaluation
- machine translation applications
- quality estimation
- domain adaptation
- machine translation for low resource languages
- language resources

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to
WAT2019
********************************************************************

TRANSLATION TASKS
-----------------

The task is to improve the text translation quality for scientific
papers and patent documents. Participants choose any of the subtasks
in which they would like to participate and translate the test data
using their machine translation systems. The WAT organizers will
evaluate the results submitted using automatic evaluation and human
evaluation. We will also provide a baseline machine translation.

Tasks:
Scientific Paper: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
English/Chinese <--> Japanese
Patent: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
English/Chinese/Korean <--> Japanese
Timely Disclosure: [Timely Disclosure Documents Corpus] NEW!!
Japanese <--> English
Newswire: [JIJI Corpus]
Japanese <--> English
News Commentary: NEW!!
Japanese <--> Russian (Japanese <--> English and English <-->
Russian included)
Mixed domain:
Myanmar <--> English [UCSY and ALT corpora]
Khmer <--> English [ECCC and ALT corpora] NEW!!
Indic:
Hindi <--> English [IIT Bombay (IITB) corpus]
Tamil <--> English [UFAL (EnTam) corpus] NEW!!
English --> Hindi Multimodal NEW!!

Dataset:

* Scientific paper

WAT uses ASPEC for the dataset including training, development,
development test and test data. Participants of the scientific papers
subtask must get a copy of ASPEC by themselves. ASPEC consists of
approximately 3 million Japanese-English parallel sentences from paper
abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
paper excerpts (ASPEC-JC).

* Patent

WAT uses the JPO Patent Corpus, which was constructed by the Japan Patent
Office (JPO). This corpus consists of 1 million English-Japanese
parallel sentences, 1 million Chinese-Japanese parallel sentences, and
1 million Korean-Japanese parallel sentences from patent descriptions
in four categories. Participants in the patent tasks are required to
obtain it from the JPO Patent Corpus page of the WAT2019 site.

- English/Chinese/Korean <--> Japanese:
These tasks evaluate the performance of a translation model in the same
way as the other translation tasks. Unlike the previous tasks at
WAT2015, WAT2016 and WAT2017, the new test sets for these tasks consist
of (a) patent documents published between 2011 and 2013, which were
used in past years' WAT, and (b) ones published between 2016 and
2017 for each language pair. We will also evaluate performance on
section (a) so that results can be compared with systems submitted in
past years' WAT.

- Chinese --> Japanese expression pattern task:
This task evaluates the performance of a translation model for each
predefined category of expression patterns, which corresponds to the
title of invention (TIT), abstract (ABS), scope of claim (CLM) or
description (DES). The test set for this task consists of sentences,
each of which is annotated with its corresponding category of
expression patterns.

* Timely Disclosure

WAT uses the Timely Disclosure Documents Corpus, which was constructed
by Japan Exchange Group (JPX). This corpus consists of
a Japanese-English timely disclosure corpus of 1.4M parallel
sentences. Participants in the Timely Disclosure tasks are required to
obtain it from the Timely Disclosure Documents Corpus page of the
WAT2019 site.

* Newswire

WAT uses the JIJI Corpus, which was constructed by Jiji Press Ltd. in
collaboration with the National Institute of Information and
Communications Technology (NICT). This corpus consists of a
Japanese-English news corpus of 200K parallel sentences from Jiji
Press news in various categories. Participants in the newswire subtask
are required to obtain it from the JIJI Corpus page of the WAT2019 site.

* News Commentary

WAT uses a manually aligned and cleaned Japanese <--> Russian corpus
from the News Commentary domain to study extremely low resource
situations for distant language pairs. The parallel corpus contains
around 12,000 lines and additionally we will provide Japanese <-->
English and Russian <--> English in-domain and out-of-domain corpora
along with monolingual corpora. The corpus will be available after
18th May, 2019.

* Mixed domain

- Myanmar (Burmese) <--> English
WAT uses UCSY Corpus and ALT Corpus. The UCSY corpus and a portion of
the ALT corpus are used as training data, which are around 220,000
lines of sentences and phrases. The development and test data are
from the ALT corpus.

- Khmer <--> English
WAT uses ECCC Corpus and ALT Corpus. The ECCC corpus and a portion of
the ALT corpus are used as training data, which are around 120,000
lines of sentences and phrases. The development and test data are
from the ALT corpus.

* Indic

- Hindi <--> English
WAT uses the IITB Corpus as the dataset for training, development,
development test and test data. The training corpus is mixed domain
and contains around 1 million lines of sentences and phrases. In
order to access the corpus participants should sign the following
agreement, scan and send it to the address mentioned in it. The
development and test sets are from the News domain and are exactly
the same as the ones in WMT 2014.

-- Vanilla subtask
Develop Hindi-English and English-Hindi MT systems using only the
provided IITB English-Hindi Parallel and Monolingual corpora.

-- Multilingual NMT subtask
Multilingual NMT using an additional XX-En corpus to improve Hi-En
translation. Multilingual NMT can be done using Transfer
Learning (Zoph et al. 2016) or using Joint Learning (Johnson et
al. 2016). The choice of the additional corpus is up to the
participant. One possible choice is Arabic-English UN corpus of
approximately 11 million lines.

- Tamil <--> English
WAT will use the EnTam corpus collected by researchers at
UFAL. The training data contains around 160,000 lines of parallel
corpora. The data belongs to three domains: Cinema, News and Bible.

- English --> Hindi Multimodal (Visual Genome)
For the first time WAT will be organizing a multimodal English -->
Hindi translation task where the input will be text and an image and
the output will be a caption (text). The training set contains around
30,000 segments. Additional details will be given on the task
website.

EVALUATION
----------

Automatic evaluation:
We are providing an automatic evaluation server. It is free for
everyone, but you need to create an account for evaluation. Viewing
the list of evaluation results does not require an account.

Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2019/index.html
Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/index.html

Human evaluation:
Both crowdsourcing evaluation and JPO adequacy evaluation will be
carried out for selected subtasks and selected submitted systems (the
details will be announced later).

INVITED TALK
------------

TBA

ORGANIZERS
----------

Toshiaki Nakazawa, The University of Tokyo, Japan
Chenchen Ding, National Institute of Information and Communications
Technology (NICT), Japan
Raj Dabre, National Institute of Information and Communications
Technology (NICT), Japan
Anoop Kunchukuttan, Microsoft AI and Research, India
Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
Nobushige Doi, Japan Exchange Group (JPX), Japan
Yusuke Oda, Google, Japan
Ondřej Bojar, Charles University, Prague, Czech Republic
Shantipriya Parida, Idiap Research Institute, Martigny, Switzerland
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Hidaya Mino, Japan Broadcasting Corporation (NHK), Japan
Hiroshi Manabe, National Institute of Information and Communications
Technology (NICT), Japan
Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
Sadao Kurohashi, Kyoto University, Japan
Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP), India

CONTACT
-------

wat-organizer@googlegroups.com

------------------------------

Message: 2
Date: Sat, 15 Jun 2019 15:27:05 +0000
From: Juan Miguel Pino <juancarabina@fb.com>
Subject: [Moses-support] WeCNLP 2019 Last Call for Abstracts +
Deadline Extension
To: "mt-list@eamt.org" <mt-list@eamt.org>, "corpora@lists.uib.no"
<corpora@lists.uib.no>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <B73EC8EE-5CB0-4EE1-B3C7-07B34EAC2A72@fb.com>
Content-Type: text/plain; charset="utf-8"

[Apologies for cross posting]

* Please note the deadline extension to June 30 (UTC-12).
* Please note the multiple submission policy: WeCNLP 2019 is non-archival and welcomes submissions made concurrently to venues such as EMNLP or CoNLL.

The second annual WeCNLP (West Coast NLP) Summit is an opportunity to foster discussions and collaborations between NLP researchers in academia and industry. The event will include talks and a panel from research leaders on the latest advances in NLP technologies. The day will conclude with a poster session.

We invite submissions in areas of NLP related to, but not limited to, the following topics:

* bias and ethics in NLP.
* multimodal NLP.
* dialog and conversational AI.
* low resource scenarios in NLP.

We encourage NLP researchers to submit an abstract describing new, previously published, or concurrently published research. We welcome abstract submissions on theory, methodology, and applications. Abstracts may describe completed research or work-in-progress. We also welcome abstract submissions on negative results as well as challenges faced in industrial or academic applications. Authors of accepted abstracts will be asked to present their work in a poster session. Selected abstracts will be presented both as a lightning talk and at the poster session. Submissions will be peer-reviewed in a double-blind setting.

Important Dates
* June 30, 2019 - Poster abstract submission deadline
* July 22, 2019 - Notification of abstract acceptance
* September 6, 2019 - WeCNLP Summit

Please refer to the site https://www.wecnlp.ai/ for more information, and please send email to juancarabina@fb.com or cmoghbel@fb.com with further questions.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 152, Issue 7
*********************************************
