Moses-support Digest, Vol 199, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Call for Participation: WMT 2023 Shared Task on Parallel Data
Curation (Philipp Koehn)


----------------------------------------------------------------------

Message: 1
Date: Fri, 2 Jun 2023 17:03:06 -0400
From: Philipp Koehn <phkoehn@gmail.com>
To: moses-support <moses-support@mit.edu>, wmt-tasks@googlegroups.com,
"corpora@uib.no" <CORPORA@uib.no>, mt-list@eamt.org, Priscilla
Rasmussen <acl@aclweb.org>
Subject: [Moses-support] Call for Participation: WMT 2023 Shared Task
on Parallel Data Curation
Message-ID:
<CAAFADDBFpkhGz-ZwrAPnDj6UrieiRXPBXvRoUO1JTd8P-28BvQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Call for Participation: WMT 2023 Shared Task on Parallel Data Curation

We introduce a new shared task that aims to evaluate parallel data
curation methods. The goal of the task is to find the best MT training data
within a provided pile of web-crawled data. We encourage submissions that
address any aspect of document alignment, sentence alignment, comparable
corpora, bitext filtering, language ID, or related fields.

Important dates

Organizers release data             June 15, 2023
Submissions deadline                September 1
System paper submission deadline    September 22
Organizers release final results    September 25
Camera-ready paper deadline         October 9
WMT Conference                      December 6-7

All deadlines and release dates are Anywhere on Earth.

Overview/Motivation

A machine translation system is only as good as its training data. The web
provides vast amounts of translations that can be used as training data.
The challenge is to find pairs of sentences or documents that are
translations of each other and use them to train the best possible MT
system.

For this shared task, the organizers will provide:

1. Web-crawled data
2. Intermediate outputs from the baseline, so participants can focus on
   specific aspects of the task
3. MT training and evaluation scripts

The participants' task is to find the best possible set of training data
within the provided web-crawled data to train a downstream MT model, using
the provided model training scripts. Downstream MT performance will be
judged using automatic MT metrics.

This shared task builds on prior shared tasks on document alignment (WMT
2016) and sentence filtering (WMT 2018-2020).

Participants may use only pre-trained models and datasets publicly released
with a research-friendly license on or before May 1, 2023. All participants
are required to submit a system description paper.

Data

We provide web crawl data in Estonian-Lithuanian from a single recent
snapshot of CommonCrawl. We chose this language pair to balance two goals:
the data should be large enough to train a reasonable Estonian->Lithuanian
MT model, but small enough to keep the task accessible to participants with
limited compute. For the same reason, we also release pre-computed
intermediate steps from a baseline (e.g., LASER embeddings, sentence pairs
from FAISS search, etc.), so participants can choose to focus on one aspect
of the task (e.g., sentence filtering).

Organizers

- Tobias Domhan (Amazon)
- Thamme Gowda (Microsoft)
- Huda Khayrallah (Microsoft)
- Philipp Koehn (Johns Hopkins University)
- Steve Sloto (Microsoft)
- Brian Thompson (Amazon)

To reach the organizers, please email:
wmt-data-task-organizers@googlegroups.com
To get updates about the shared task, please join this mailing list:
https://groups.google.com/g/wmt-data-task/


More information and data will be posted to this website:
statmt.org/wmt23/data-task.html

------------------------------

Subject: Digest Footer

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
https://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

End of Moses-support Digest, Vol 199, Issue 1
*********************************************
