Moses-support Digest, Vol 185, Issue 6

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. CFP: WAT2022 (The 9th Workshop on Asian Translation)
(Toshiaki Nakazawa)
2. EAMT 2022: Bursaries for Translators (Carol Scarton)


----------------------------------------------------------------------

Message: 1
Date: Thu, 28 Apr 2022 11:28:47 +0900
From: Toshiaki Nakazawa <zawa13@gmail.com>
To: moses-support@mit.edu
Subject: [Moses-support] CFP: WAT2022 (The 9th Workshop on Asian
Translation)
Message-ID:
<CAMMh7mqRWcRscK5oiyUfdevBqeH05ppQDgo7S+2qTttZRbGFng@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

This is the call for participation in the MT shared tasks and for
research papers at the 9th Workshop on Asian Translation (WAT2022), a
workshop of COLING 2022. If you are working on machine translation,
please join us.

IMPORTANT DATES
---------------

July 11 - Shared Task Submission Deadline
July 11 - Research Paper Submission Deadline
August 1 - System Description Paper for Shared Tasks Submission Deadline
August 22 - Notification of Acceptance for Research Papers
August 29 - Review Feedback of System Description Papers
September 5 - Camera-ready Deadline (both Research and System
Description Papers)
September 19 - Workshop Proceedings Deadline
October 12-17 - Workshop Date

* All deadlines are at 11:59 PM UTC-12.
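As a worked example of the time-zone rule above, the short Python sketch below converts a deadline at 11:59 PM UTC-12 (commonly called "Anywhere on Earth") into UTC. The helper name deadline_in_utc is hypothetical and only illustrates the arithmetic; the official dates are the ones listed above.

```python
from datetime import datetime, timedelta, timezone

# UTC-12, the time zone in which all WAT2022 deadlines are stated.
UTC_MINUS_12 = timezone(timedelta(hours=-12))


def deadline_in_utc(year, month, day):
    """Hypothetical helper: express an 11:59 PM UTC-12 deadline in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_12)
    return local.astimezone(timezone.utc)


# Shared task and research paper submission deadline (July 11, 2022).
print(deadline_in_utc(2022, 7, 11).isoformat())  # 2022-07-12T11:59:00+00:00
```

In other words, a deadline of 11:59 PM UTC-12 on a given day falls at 11:59 AM UTC on the following day.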

Best regards,

---------------------------------------------------------------------------
WAT2022
(The 9th Workshop on Asian Translation)
in conjunction with COLING2022
http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2022
OCTOBER 12-17, 2022 / GYEONGJU, REPUBLIC OF KOREA

Following the success of the previous WAT workshops (WAT2014 --
WAT2021), WAT2022 will bring together machine translation researchers
and users to try, evaluate, share and discuss brand-new ideas about
machine translation. For the 9th WAT, we will include the following
new tasks:

* English <--> Japanese Parallel Corpus Filtering Task
* Khmer --> English/French Speech Translation Task
* Chinese <--> Japanese Restricted Translation Task
* Japanese --> English Video Guided Translation Task
* English --> Bengali Multi-Modal Translation Task
* Sinhala, Nepali, Assamese, Sindhi, Urdu <--> English (5 new
languages in the MultiIndicMT task)
* English <--> Japanese/Korean/Chinese NICT-SAP structured document
translation Task
* English <--> Vietnamese (new pair added to the NICT-SAP multilingual
multi-domain Task)

together with the following continuing tasks:

* Document-level Translation Tasks
  - English/Chinese <--> Japanese scientific paper
  - English <--> Japanese newswire
  - English <--> Japanese business scene dialogue
* English/Chinese/Korean <--> Japanese Patent Task
* English --> Hindi/Malayalam Multi-Modal Translation Task
* English <--> Japanese Restricted Translation Task

In addition to the shared tasks, the workshop will also feature
scientific papers on topics related to machine translation, especially
for Asian languages. Topics of interest include, but are not limited
to:

- analysis of the automatic/human evaluation results from past WAT workshops
- word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
machine translation
- Asian language processing
- incorporating linguistic information into machine translation
- decoding algorithms
- system combination
- error analysis
- manual and automatic machine translation evaluation
- machine translation applications
- quality estimation
- domain adaptation
- machine translation for low resource languages
- language resources

************************* IMPORTANT NOTICE *************************
Participants in previous workshops must also sign up for WAT2022
********************************************************************

TRANSLATION TASKS
-----------------

The task is to improve the text translation quality for scientific
papers and patent documents. Participants choose any of the subtasks
in which they would like to participate and translate the test data
using their machine translation systems. The WAT organizers will
evaluate the submitted results using both automatic and human
evaluation. We will also provide a baseline machine translation
system.

Tasks:
* Document-level translation tasks:
  - ASPEC+ParaNatCom: English --> Japanese Scientific Paper
  - BSD Corpus: English <--> Japanese Business Scene Dialogue
  - JIJI Corpus: English <--> Japanese Newswire
  - NICT-SAP: Hindi/Thai/Malay/Indonesian/Vietnamese <--> English
  - NICT-SAP: Japanese/Korean/Chinese <--> English (structured
    document translation)
* Multimodal translation tasks:
  - Visual Genome: English --> Hindi/Malayalam/Bengali
  - Ambiguous MS COCO: English <--> Japanese
* Video Guided Translation task:
  - VISA: Japanese --> English
* Indic tasks:
  - MultiIndicMT:
    Assamese/Bengali/Gujarati/Hindi/Kannada/Malayalam/Marathi/Nepali/
    Odia/Punjabi/Tamil/Telugu/Urdu/Sindhi/Sinhala <--> English
* Patent task:
  - JPC3: English/Chinese/Korean <--> Japanese
* Restricted Translation tasks:
  - English/Chinese <--> Japanese
* Parallel Corpus Filtering task:
  - English <--> Japanese

Dataset:

* Scientific paper

WAT uses ASPEC as the dataset, including training, development,
development-test and test data. Participants in the scientific paper
subtask must obtain a copy of ASPEC themselves. ASPEC consists of
approximately 3 million Japanese-English parallel sentences from paper
abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
parallel sentences from paper excerpts (ASPEC-JC).

* Patent

WAT uses the JPO Patent Corpus, which was constructed by the Japan
Patent Office (JPO). This corpus consists of 1 million English-Japanese
parallel sentences, 1 million Chinese-Japanese parallel sentences, and
1 million Korean-Japanese parallel sentences from patent descriptions
in four categories. Participants in the patent tasks are required to
obtain it from the JPO Patent Corpus page on the WAT site. Unlike the
previous tasks at WAT2018-2021, new test-N4 sets will additionally be
used, and the previous test-N sets will be replaced by new test-2022
sets.


* IT and Wikinews

- Hindi/Thai/Malay/Indonesian/Vietnamese <--> English

In collaboration with SAP and NICT, WAT will continue the translation
task for English to/from Hindi, Thai, Malay and Indonesian. In
addition, English to/from Vietnamese evaluation data is available this
year. The evaluation data belongs to the IT domain (Software
Documentation) and the Wikinews domain (Asian Language Treebank).
Participants will be expected to train systems and submit translations
for all language pairs (to and from English) and both domains, using
any existing monolingual or parallel data. Given the growing focus on
universal translation models for multiple languages and domains, WAT
encourages participants to build a single multilingual, multi-domain
model covering all language pairs and both domains (IT as well as
Wikinews). Additional details will be given on the WAT 2022 website.

- Japanese/Chinese/Korean <--> English

In addition to the task above, WAT will offer a new task for English
to/from Japanese, Chinese and Korean structured document translation.
Structured pages/documents contain sentences annotated with rich meta
information. For example, "This is a <b>sentence</b>." is a sentence
from a structured document. Its Spanish translation should be "Esta es
una <b>frase</b>.", where the <b> tag correctly encloses the
translation of the word "sentence". Structured document translation is
challenging because the translation system has to align the content
enclosed in tags, especially when training data with structure
information is unavailable. Additional details will be given on the
WAT 2022 website.
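The tag requirement described above can be checked mechanically, at least in part. The Python sketch below is a hypothetical sanity check, not WAT's evaluation procedure: it only verifies that a translation contains the same tags as its source (the same number of each tag, in any order, since word order often changes between languages). It does not verify that each tag encloses the correct words, and the helper names tags and tags_preserved are illustrative assumptions.

```python
import re
from collections import Counter

# Matches opening and closing markup tags such as <b> and </b>.
# Hypothetical helper; a real pipeline would likely use a proper XML/HTML parser.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def tags(segment):
    """Return the markup tags found in a segment, in order of appearance."""
    return TAG_RE.findall(segment)


def tags_preserved(source, translation):
    """True if the translation contains exactly the same tags as the source.

    Tags are compared as a multiset, so reordering is allowed,
    but dropped or extra tags are not.
    """
    return Counter(tags(source)) == Counter(tags(translation))


source = "This is a <b>sentence</b>."
print(tags_preserved(source, "Esta es una <b>frase</b>."))  # True
print(tags_preserved(source, "Esta es una frase."))         # False
```

A check like this catches translations that drop or duplicate markup; judging whether the tags enclose the right content still requires alignment or human evaluation.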

* Newswire

WAT uses the JIJI Corpus, which was constructed by Jiji Press Ltd. in
collaboration with the National Institute of Information and
Communications Technology (NICT). This corpus is a Japanese-English
news corpus of 200K parallel sentences from Jiji Press news in various
categories. Participants in the newswire subtask are required to
obtain it from the JIJI Corpus page on the WAT2022 site.

* Indic

- Indian language <--> English multilingual translation task. This
task is a successor to the 2018, 2020, and 2021 tasks, with major
improvements. The datasets available for Indian languages have grown
in the past few years, alongside major advances in multilingual
learning. The task will involve training a multilingual model for
translation from 15 Indian languages to English (and vice versa).
Five new languages, Urdu, Nepali, Sindhi, Sinhala and Assamese, have
been added this year. The goal is to encourage exploration of methods
that use multilingualism and language relatedness to improve
translation quality for low-resource languages while keeping a single,
compact translation model. The evaluation set is 15-way parallel,
enabling the potential evaluation of non-English-centric language
pairs, some of which we will evaluate.

* Multimodal

Given the growing interest in multimodal NLP and the warm response
from participants in the WAT 2019 and 2020 Multimodal Translation
Tasks, WAT will evaluate the following multimodal tasks:

- English --> Hindi Multimodal (Visual Genome): WAT will continue
organizing the multimodal English --> Hindi translation task, where
the input will be text and an image and the output will be a caption
(text). The training set contains around 30,000 segments. Additional
details will be given on the task website.

- English --> Malayalam Multimodal (Visual Genome): WAT will
continue organizing the multimodal English --> Malayalam
translation task, where the input will be text and an image and the
output will be a caption (text). The training set contains around
30,000 segments. Additional details will be given on the task website.

- English --> Bengali Multimodal (Visual Genome): WAT will organize
a new multimodal English --> Bengali translation task, where the
input will be text and an image and the output will be a caption
(text). The training set contains around 30,000 segments.
Additional details will be given on the task website.

- Japanese <--> English Multimodal (Ambiguous MS COCO): WAT will
organize an additional multimodal Japanese <--> English translation
task, where the evaluation set, Ambiguous MS COCO, will focus on the
translation of ambiguous words and sentences. Along with the
Flickr30kEnt-JP dataset, the MS COCO English data may also be used.
Additional details will be given on the task website.

* Parallel Corpus Filtering

We also plan to add a parallel corpus filtering task, which asks
participants to clean a noisy parallel corpus; models are then trained
with a fixed setting and their accuracy is evaluated. Participants
must improve translation accuracy solely by removing training data
that may hurt the model. This year, we will provide a noisy
Japanese-English parallel corpus, a language pair that other shared
tasks have not yet focused on.

EVALUATION
----------

Automatic evaluation:
We are providing an automatic evaluation server. It is free for
everyone, but you need to create an account to submit results for
evaluation. Viewing the list of evaluation results does not require
an account.

Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2022/
Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/

Human evaluation:
Both crowdsourcing evaluation and JPO adequacy evaluation will be
carried out for selected subtasks and selected submitted systems (the
details will be announced later).

ORGANIZERS
----------

- Toshiaki Nakazawa, The University of Tokyo, Japan [GENERAL,
ASPEC+ParaNatCom, BSD]
- Isao Goto, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
- Hidaya Mino, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
- Chenchen Ding, National Institute of Information and Communications
Technology (NICT), Japan [GENERAL]
- Raj Dabre, National Institute of Information and Communications
Technology (NICT), Japan [MultiIndicMT, NICT-SAP]
- Anoop Kunchookuttan, Microsoft AI and Research, India [MultiIndicMT]
- Shohei Higashiyama, National Institute of Information and
Communications Technology (NICT), Japan [JPC]
- Hiroshi Manabe, National Institute of Information and Communications
Technology (NICT), Japan [GENERAL]
- Shantipriya Parida, Silo AI, Finland [Hindi Visual Genome, Malayalam
Visual Genome, Bengali Visual Genome]
- Ondřej Bojar, Charles University, Prague, Czech Republic [Hindi
Visual Genome, Malayalam Visual Genome, Bengali Visual Genome]
- Chenhui Chu, Kyoto University, Japan [Ambiguous MS COCO]
- Akiko Eriguchi, Microsoft, USA [Restricted Translation]
- Kaori Abe, Tohoku University, Japan [Restricted Translation]
- Yusuke Oda, LegalForce, Japan [Restricted Translation, Parallel
Corpus Filtering]
- Makoto Morishita, NTT, Japan [Parallel Corpus Filtering]
- Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST),
Japan [GENERAL]
- Sadao Kurohashi, Kyoto University, Japan [GENERAL]
- Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP),
India [GENERAL]

CONTACT
-------

wat-organizer@googlegroups.com



------------------------------

Message: 2
Date: Thu, 28 Apr 2022 16:30:49 +0100
From: Carol Scarton <carol.scarton@gmail.com>
To: wmt-tasks@googlegroups.com, mt-list@eamt.org,
moses-support@mit.edu
Subject: [Moses-support] EAMT 2022: Bursaries for Translators
Message-ID:
<CACDJ-d2A_74HXHZeRXfG5tRbK06oMQ-8H54oXCrFvj_Fr9hAsg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

************************************************
EAMT 2022: Bursaries for Translators
************************************************

== Call for Participation ==

The European Association for Machine Translation (EAMT) is an organisation
that serves the growing community of people interested in MT and
translation tools, including translators, users, developers, and
researchers of this increasingly viable technology.

As part of its commitment to promoting research, development and awareness
of translation technologies, the EAMT opens a call for a small number of
bursaries to support translators and Translation Studies students in
attending the 23rd Annual Conference of the European Association for
Machine Translation (EAMT 2022), to be held in Ghent, Belgium, from
June 1st to June 3rd.

== Purpose of the Call ==

This call is dedicated to supporting translators and Translation Studies
students working or studying in European, Middle Eastern or African
countries who do not have funding to attend the conference.

The EAMT particularly encourages applications from early-career translators.

All applications will be screened by the EAMT President (Dr Helena Moniz),
the EAMT Secretary (Dr Carolina Scarton) and, if necessary, a few appointed
Executive Committee members.

== Application information ==

* Eligibility requirements

To qualify for this call, applicants must be translators or be enrolled
in a Master's or PhD course in Translation Studies. The support is
only available to individuals working or studying in European,
Middle Eastern or African countries. Freelance translators and students
will have priority.

* Selection criteria

- The selection will be based on the information submitted via the
Google Form (link below).
- One of the fields in the form is a "motivation letter", where you should
describe your motivation for attending the EAMT 2022 conference and explain
why you do not have other funds to sponsor your attendance.
- You should also submit a CV, highlighting your years of experience in the
translation area and your experience working with MT.
- For students: you should also submit an official proof of student status,
signed by your University.

== Bursaries ==

EAMT anticipates funding several applications. Selected participants will
be announced on the 5th of May 2022 and will receive complimentary
membership in the EAMT for 2022 and 2023, free registration at the EAMT
2022 conference and paid accommodation in Ghent.

== Contact for enquiries ==

Carolina Scarton
EAMT Secretary
e-mail: c.scarton@sheffield.ac.uk

== Applications ==

Candidates should submit their applications via a Google Form:
https://forms.gle/DiKwNEp9Ed1PtiDw7

== Important Dates ==

Circulation of the Call: April 28, 2022
Submission deadline for applications: May 3rd, 2022, 23:59 CEST
Notification: May 5th, 2022

== Additional provisions ==

- Only complete applications will be reviewed.
- All information submitted with applications will be regarded as
confidential and will only be used in the context of this call.
- You may be asked to share your accommodation room with other awardees.
However, we are committed to respecting any requirements or concerns that
you inform us of (e.g. religion, gender).

== No obligation to award the bursaries ==

The EAMT shall be under no obligation to fund the applications pursuant to
this call for participation. EAMT shall not be liable for any compensation
with respect to candidates whose applications have not been approved. Nor
shall it be liable in the event of it deciding not to award the bursaries.

--
*Carolina Scarton*
Lecturer in Natural Language Processing
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/

------------------------------

Subject: Digest Footer

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
https://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

End of Moses-support Digest, Vol 185, Issue 6
*********************************************
