Moses-support Digest, Vol 207, Issue 3

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Call for shared task participation: Non-Repetitive
Translation Task in WMT 2024 (Kazutaka Kinugawa)


----------------------------------------------------------------------

Message: 1
Date: Fri, 14 Jun 2024 10:53:36 +0000
From: Kazutaka Kinugawa <kinugawa.k-jg@nhk.or.jp>
To: "moses-support@mit.edu" <moses-support@mit.edu>
Subject: [Moses-support] Call for shared task participation:
Non-Repetitive Translation Task in WMT 2024
Message-ID:
<TYWPR01MB10917CDB23016D154DE5E6330FCC22@TYWPR01MB10917.jpnprd01.prod.outlook.com>

Content-Type: text/plain; charset="gb2312"

Apologies for cross-posting

We cordially invite all researchers and practitioners to participate in the Non-Repetitive Translation task in WMT 2024.



This task focuses on lexical choice in machine translation. If you are interested, please see the details at this link: https://www2.statmt.org/wmt24/non-repetitive-translation-task.html



*Important Dates*

Submission deadline for the task: July 21st
Paper submission deadline: August 20th
Notification of acceptance: September 20th
Camera-ready deadline: October 3rd
Conference: November 15-16

*Task Description*

This task focuses on lexical choice in machine translation, especially the choice made for repeated words in a source sentence. In general, repeating the same word can create a monotonous or awkward impression in English and should be avoided where appropriate. Typical workarounds in monolingual writing are to (1) remove redundant terms where possible (reduction) or (2) use alternative words such as synonyms as substitutes (substitution). These techniques are also observed in human translations.
The goal of this task is to study how these techniques can be incorporated into machine translation systems to enrich their lexical choice capabilities. From a practical standpoint, such a capability is important, for example, in news production, where high-quality text that goes beyond robotic word-by-word translation is required. Specifically, participants are required to control a machine translation system using reduction or substitution so that it does not output the same word for certain repeated words in a source sentence. The translation direction is Japanese to English.
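As a rough illustration of what the task asks systems to avoid (this is not part of the official task or its tooling), the Python sketch below flags repeated content words in a pre-tokenized Japanese source sentence and checks whether an English hypothesis uses the same translation more than once for any of them. The particle stop list and the bilingual lexicon "lexicon" are hypothetical placeholders; a real system would rely on a proper tokenizer and word alignment or model-internal information.

from collections import Counter

# Hypothetical stop list of Japanese particles to ignore when counting repeats.
PARTICLES = {"の", "を", "に", "は", "が", "と", "で", "も"}

def repeated_source_words(src_tokens):
    """Return source words that occur more than once, skipping particles."""
    counts = Counter(t for t in src_tokens if t not in PARTICLES)
    return {w for w, c in counts.items() if c > 1}

def repeats_same_translation(repeated_words, hyp_tokens, lexicon):
    """Report repeated source words whose (hypothetical) lexicon translation
    appears more than once in the English hypothesis, i.e. cases where
    neither reduction nor substitution was applied."""
    hyp_counts = Counter(t.lower() for t in hyp_tokens)
    flagged = {}
    for word in repeated_words:
        for candidate in lexicon.get(word, []):
            if hyp_counts[candidate.lower()] > 1:
                flagged[word] = candidate
    return flagged

The sketch only makes the reduction/substitution criterion concrete; participating systems would of course implement the detection and control inside the translation model itself.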



*Challenges*



The challenges underlying this task include the following:
- Maintaining the balance between translation quality and output control: Translation quality can degrade when the non-repetitive style is enforced inappropriately.
- Avoiding bias toward high-frequency bilingual word pairs: In general, for a given source word, the high-frequency target words associated with it are more likely to be output. This can make it difficult to find appropriate substitutions for some words.
- Predicting which words can be reduced or substituted: Deciding which source words can appropriately be reduced or substituted is not easy because it depends on the context within the sentence.
- Mining training instances: Translations with reduction can be especially difficult to identify in noisy corpora because of the challenge of discriminating them from undertranslations.



*Data Set*



We provide development and test sets for this task. In both data sets, all Japanese sentences contain some repeated words that are translated into English with reduction or substitution. We collected these data from Jiji Japanese-English news articles. Specifically, we first automatically created sentence pairs based on lexical similarities, and then manually selected instances suited to this task. These sentence pairs include not only one-to-one pairs but also two-to-two pairs. Both the development and test sets contain raw and tagged parallel data. In the tagged data, repeated words in the source sentence and their counterparts in the target sentence are marked with tags, indicating that these words are evaluation targets. Note that not all words repeated in the source sentence are evaluation targets, because some words, such as proper nouns and technical terms, should be translated consistently even if they are repeated in the sentence.
Tagged development data are provided to help tune the model during training. However, participants cannot use the tagged test data and must use the raw test data when submitting their system results. In this task, systems must detect on their own which repeated words can be reduced or substituted.
To reduce the negative effects of imbalanced content in the source and target sentences, the Japanese sentences in the development and test data were manually translated from the English sentences while preserving as much of the vocabulary of the original Japanese sentences as possible.
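For intuition only, the short sketch below separates the raw sentence from the tagged spans in a tagged line. The inline notation <r1>...</r1> used here is a made-up placeholder for illustration; the actual tag format is specified in the data release on the task page.

import re

# Hypothetical inline tag notation; the real format is defined by the task data.
TAG_RE = re.compile(r"</?r\d+>")

def strip_tags(line):
    """Remove evaluation-target tags to recover the raw sentence."""
    return TAG_RE.sub("", line)

def tagged_spans(line):
    """Map each tag id (e.g. 'r1') to the span it wraps."""
    return {m.group(1): m.group(2) for m in re.finditer(r"<(r\d+)>(.*?)</\1>", line)}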



As for the training data, we also provide all the data from the WAT 2020 Newswire tasks, which were also constructed from Jiji news articles and have been used in WAT continuously since 2020. These data form a regular parallel corpus and have not been annotated specifically for this task, but they are in exactly the same domain. Although the development and test data from the WAT 2020 Newswire tasks are not directly related to the evaluation of this task, they can be used to measure basic translation performance during training. In addition, participants may also use any other publicly available corpora, such as the data from the general MT task in WMT, for training. When using external data, please be sure to describe it in your paper.



*Organizers*
Kazutaka Kinugawa (kinugawa.k-jg@nhk.or.jp<mailto:kinugawa.k-jg@nhk.or.jp>), NHK
Hideya Mino, NHK
Naoto Shirai, NHK
Isao Goto, Ehime University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.mit.edu/mailman/private/moses-support/attachments/20240614/56cd1e82/attachment.htm>

------------------------------

Subject: Digest Footer

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
https://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

End of Moses-support Digest, Vol 207, Issue 3
*********************************************
