Moses-support Digest, Vol 112, Issue 21

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Deadline Extension to February 24: 9th Workshop on Building and
Using Comparable Corpora (BUCC) at LREC 2016 (Reinhard Rapp)
2. Call for participation: 1st Translation Memory Cleaning
Shared Task (Carla Parra)

----------------------------------------------------------------------

Message: 1
Date: Thu, 11 Feb 2016 17:19:44 +0100
From: "Reinhard Rapp" <reinhardrapp@gmx.de>
Subject: [Moses-support] Deadline Extension to February 24:
9th Workshop on Building and Using Comparable Corpora (BUCC) at LREC
2016
To: <IRList@lists.shef.ac.uk>, <listmaster@loria.fr>,
<lr_egroup@mail.iiit.ac.in>, <moses-support@mit.edu>,
<news@multilingual.com>
Message-ID: <D77D97D0541E4A7ABB084BF97082E6EC@ASUSPC>
Content-Type: text/plain; charset="windows-1252"

***** EXTENSION OF SUBMISSION DEADLINE: February 24, 2016 *****

============================================================

Call for Papers

9th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA

Special Topic: Continuous Vector Space Models and Comparable Corpora

Shared Task: Identifying Parallel Sentences in Comparable Corpora

https://comparable.limsi.fr/bucc2016/

Monday, May 23, 2016

Co-located with LREC 2016, Portorož, Slovenia

============================================================

MOTIVATION

In the language engineering and linguistics communities, research on
comparable corpora has had two main motivations. In language
engineering, on the one hand, it is chiefly driven by the need to use
comparable corpora as training data for statistical Natural Language
Processing applications such as statistical machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest in themselves because they make
inter-linguistic discoveries and comparisons possible. It is generally
accepted in both communities that comparable corpora are documents in
one or several languages that are comparable in content and form to
various degrees and along various dimensions. We believe that the
linguistic definitions and observations related to comparable corpora
can improve methods to mine such corpora for applications of
statistical NLP. It is therefore of great interest to bring together
builders and users of such corpora.

SHARED TASK

There will be a shared task on "Identifying Parallel Sentences in
Comparable Corpora", whose details will be described on the
workshop website (see URL above).

TOPICS

Beyond this year's special topic "Continuous Vector Space Models and
Comparable Corpora" and the shared task on "Identifying Parallel
Sentences in Comparable Corpora", we solicit contributions including
but not limited to the following topics:

Building comparable corpora:

* Human translations
* Automatic and semi-automatic methods
* Methods to mine parallel and non-parallel corpora from the Web
* Tools and criteria to evaluate the comparability of corpora
* Parallel vs non-parallel corpora, monolingual corpora
* Rare and minority languages, across language families
* Multi-media/multi-modal comparable corpora

Applications of comparable corpora:

* Human translations
* Language learning
* Cross-language information retrieval & document categorization
* Bilingual projections
* Machine translation
* Writing assistance

Mining from comparable corpora:

* Cross-language distributional semantics
* Extraction of parallel segments or paraphrases from comparable corpora
* Extraction of translations of single words and multi-word expressions,
proper names, named entities, etc.

IMPORTANT DATES

February 24, 2016 Deadline for submission of full papers (extended)
March 10, 2016 Notification of acceptance
March 25, 2016 Camera-ready papers due
May 23, 2016 Workshop date

SUBMISSION INFORMATION

Papers should follow the LREC main conference formatting details (to be
announced on the conference website http://lrec2016.lrec-conf.org/en/ )
and should be submitted as a PDF file via the START workshop manager at

https://www.softconf.com/lrec2016/BUCC2016/

Contributions can be short or long papers. Short paper submissions must
describe original and unpublished work without exceeding six (6)
pages. Characteristics of short papers include: a small, focused
contribution; work in progress; a negative result; an opinion piece;
an interesting application nugget. Long paper submissions must
describe substantial, original, completed and unpublished work without
exceeding ten (10) pages.

Reviewing will be double blind, so the papers should not reveal the
authors' identity. Accepted papers will be published in the workshop
proceedings.

Double submission policy: Parallel submission to other meetings or
publications is possible, but the workshop organizers must be notified
immediately.

Please also observe the following two paragraphs which are applicable
to all LREC workshops as well as to the main conference:

Describing your language resources (LRs) in the LRE Map is now a normal
practice in the submission procedure of LREC (introduced in 2010 and
adopted by other conferences). To continue the efforts initiated at
LREC 2014 on "Sharing LRs" (data, tools, web-services, etc.), authors
will have the possibility, when submitting a paper, to upload LRs to a
special LREC repository. This effort of sharing LRs, linked to the LRE
Map for their description, may become a new "regular" feature for
conferences in our field, thus contributing to the creation of a common
repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so
as to allow the community to understand the whole context and also
replicate the experiments conducted by other researchers, LREC 2016
endorses the need to uniquely identify LRs through the use of the
International Standard Language Resource Number (ISLRN, www.islrn.org),
a Persistent Unique Identifier to be assigned to each Language Resource.
The assignment of ISLRNs to LRs cited in LREC papers will be offered at
submission time.

ORGANISERS

Reinhard Rapp, University of Mainz (Germany)
Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)
Serge Sharoff, University of Leeds (UK)

FURTHER INFORMATION

Reinhard Rapp: reinhardrapp (at) gmx (dot) de

SCIENTIFIC COMMITTEE

* Ahmet Aker (University of Sheffield, UK)
* Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
* Éric Gaussier (Université Joseph Fourier, Grenoble, France)
* Vishal Goyal (Punjabi University, Patiala, India)
* Gregory Grefenstette (INRIA, Saclay, France)
* Silvia Hansen-Schirra (University of Mainz, Germany)
* Hitoshi Isahara (Toyohashi University of Technology, Japan)
* Kyo Kageura (University of Tokyo, Japan)
* Philippe Langlais (Université de Montréal, Canada)
* Shervin Malmasi (Harvard Medical School, Boston, MA, USA)
* Michael Mohler (Language Computer Corp., USA)
* Emmanuel Morin (Université de Nantes, France)
* Lene Offersgaard (University of Copenhagen, Denmark)
* Dragos Stefan Munteanu (Language Weaver, Inc., US)
* Ted Pedersen (University of Minnesota, Duluth, US)
* Reinhard Rapp (University of Mainz, Germany)
* Serge Sharoff (University of Leeds, UK)
* Michel Simard (National Research Council Canada)
* Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160211/3854a500/attachment-0001.html

------------------------------

Message: 2
Date: Mon, 08 Feb 2016 13:32:40 +0100
From: Carla Parra <carla.parra@hermestrans.com>
Subject: [Moses-support] Call for participation: 1st Translation
Memory Cleaning Shared Task
To: Nlp4tm2016 <nlp4tm2016@gmail.com>
Message-ID: <8e2a5ed2bc9b1910dc3c6e5e699ce688@hermestrans.com>
Content-Type: text/plain; charset="utf-8"

(apologies for cross-posting)

CALL FOR PARTICIPATION IN THE 1ST TRANSLATION MEMORY CLEANING SHARED
TASK
organised at the 2nd Workshop on Natural Language Processing for
Translation Memories (NLP4TM 2016)
to be held at LREC 2016 (Portorož, Slovenia), May 28, 2016

http://rgcl.wlv.ac.uk/nlp4tm2016/shared-task/ [1]

The NLP4TM 2016 workshop proposes a shared task on cleaning translation
memories. Participants in this task will be required to take pairs of
source and target segments from translation memories and decide whether
they are correct translations. For this first edition of the task,
three language pairs have been prepared: EN-ES, EN-IT and EN-DE.

The data was annotated with information on whether the source and target
content of each TM segment represent a valid translation. In particular,
the following 3-point scale has been applied:
(1) The translation is correct.
(2) The translation is correct, but there are a few orthotypographic
mistakes, so some minor post-editing is required.
(3) The translation is not correct (content missing/added, wrong
meaning, etc.).

The annotation guidelines are available on the task's website.
For each language pair, 2/3 of the annotated segments are provided for
training and 1/3 will be provided for testing during the evaluation
phase.

1. TASKS PROPOSED
The participating teams can choose to participate in any or all of
the following three tasks:

* Binary Classification (I)

In this task, participants only need to determine whether a segment is
right or wrong. For this first binary classification option, only tag
(1) is considered correct, because translators do not need to make any
modification, whilst tags (2) and (3) are considered wrong translations.

* Binary Classification (II)

As in the first task, in this task it is only required to determine
whether the segment is right or wrong. However, in contrast to the first
task, a segment is considered correct if it was labelled by annotators
as (1) or (2). Segments labelled (3) are considered wrong because they
require major post-editing.

* Fine-grained Classification

In this task, the participating teams have to classify the segments
according to the annotation provided in the training data: correct
translations (1), correct translations with a few orthotypographic
errors (2), and wrong translations (3).
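How the three annotation labels collapse into the two binary tasks can be sketched in Python as follows. This is a minimal illustration, assuming the labels are available as the integers 1, 2 and 3; the function names are hypothetical, and the actual data format will be communicated by the organisers.

```python
# Annotation scale from the call (assumed to be stored as integers):
# 1 = correct, 2 = correct but needs minor orthotypographic post-editing,
# 3 = wrong translation.
VALID_LABELS = (1, 2, 3)


def binary_label_task1(label: int) -> bool:
    """Binary Classification (I): only label 1 counts as correct."""
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label}")
    return label == 1


def binary_label_task2(label: int) -> bool:
    """Binary Classification (II): labels 1 and 2 count as correct, 3 is wrong."""
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label}")
    return label in (1, 2)


if __name__ == "__main__":
    gold = [1, 2, 3, 1, 3]
    print([binary_label_task1(x) for x in gold])  # [True, False, False, True, False]
    print([binary_label_task2(x) for x in gold])  # [True, True, False, True, False]
```

The only difference between the two binary tasks is where label (2) falls; the fine-grained task keeps all three labels unchanged.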

2. SUBMISSION AND EVALUATION INFORMATION
Participants are required to register their intention to participate by
filling in the following form before 1st April 2016:
http://goo.gl/forms/ELStRtrw9J [2]

The organisers will provide the training and test sets to the
participating teams, who will be asked to submit the output of their
systems in a format similar to that of the training set. The exact
modality and formatting of submissions will be communicated to
participants at a later stage.

For evaluation, standard measures such as precision, recall and
F-measure will be used. In addition, the organisers may perform some
manual error analysis, the extent of which will depend on the number of
systems submitted. For this reason, even though we do not plan to limit
the number of runs submitted by participants, they will be required to
indicate their primary (and secondary, if relevant) runs.
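Per-class precision, recall and F-measure can be computed from gold and predicted labels roughly as sketched below. This is an illustrative sketch only: the call does not say which class is treated as positive in the binary tasks or how scores are averaged across classes, so the function simply reports scores for every label. The function name is hypothetical and not the organisers' scorer.

```python
from collections import Counter


def per_class_prf(gold, predicted):
    """Return {label: (precision, recall, f1)} for every label in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1  # correct prediction for label g
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true label g
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    gold = [1, 2, 3, 1, 3]
    predicted = [1, 3, 3, 1, 2]
    for label, (p, r, f) in per_class_prf(gold, predicted).items():
        print(label, p, r, f)
```

With these toy labels, class 1 scores 1.0 on all three measures, class 2 scores 0.0, and class 3 scores 0.5. A macro-average would just be the mean of the per-class F1 values.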

The participants are encouraged to release their systems and make them
publicly available for future use. They are also encouraged not to use
machine translation as one of the factors used to determine the class of
a segment, because we want to encourage the development of methods that
can be run on large datasets without requiring a lot of computational
resources.

In addition to submitting the output of their systems, participants
will be asked to submit short contributions in the form of working notes
describing their systems. These will be published on the workshop's
website. Submissions that are not accompanied by a description will
not be considered.

All systems will be presented in a demo session during the workshop.

3. IMPORTANT DATES

* Release of training data: second week of February 2016
* End of registration: 1st April 2016
* Evaluation phase: 14th - 27th April 2016
* Ranking of systems and release of the test set annotations: 4th May
2016
* Submission of working notes: 16th May 2016
* Workshop date: 28th May 2016

4. ORGANISING COMMITTEE

Eduard Barbu, Translated, Italy
Carla Parra, Hermes, Spain
Luca Mastrostefano, Translated, Italy
Matteo Negri, FBK, Italy
Marco Turchi, FBK, Italy
Luisa Bentivogli, FBK, Italy
Constantin Orasan, University of Wolverhampton, UK

The organisers can be contacted by sending an email to
nlp4tm2016@gmail.com.

--

DR. CARLA PARRA ESCARTÍN

Applied Technology Engineer - Marie Curie Experienced Researcher -
EXPERT ITN [3]

www.hermestrans.com [4]

(+34) 91 640 7640 (Madrid)

(+34) 95 202 0525 (Málaga)

LEGAL NOTICE: This message is only intended for the addressee. It
contains CONFIDENTIAL information protected by professional secrecy.
Dissemination of such information is prohibited by law. If you have
received this message by mistake, please be aware that you are not
authorised to read, copy or use it. Please notify us immediately via
this means and destroy it. E-mail over the Internet does not ensure
the confidentiality, integrity or correct reception of the messages
that are sent. Hermes Traducciones y Servicios Lingüísticos, SL does
not accept liability for these circumstances and reserves the right
to take the legal measures to which it is entitled against any third
party that unlawfully accesses the content of this message and the
files attached hereto. If the addressee of this message does not
consent to the use of e-mail via the Internet and to messages being
saved, please notify us on an immediate basis.

Links:
------
[1] http://rgcl.wlv.ac.uk/nlp4tm2016/shared-task/
[2] http://goo.gl/forms/ELStRtrw9J
[3] http://expert-itn.eu/
[4] http://www.hermestrans.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160208/0e588506/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 112, Issue 21
**********************************************