Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. TSD 2014 - First Call for Papers (TSD 2014)
2. Re: Repositioning of non-translatable lexemes (Sorin Slavescu)
3. Final CfP: LREC 2014 Workshop on Building and Using
Comparable Corpora (7th BUCC) (Reinhard Rapp)
----------------------------------------------------------------------
Message: 1
Date: Tue, 04 Feb 2014 22:37:47 +0100
From: TSD 2014 <xrambous@aurora.fi.muni.cz>
Subject: [Moses-support] TSD 2014 - First Call for Papers
To: tsd2014@tsdconference.org
Message-ID: <E1WAngh-0007Lp-0B@aurora.fi.muni.cz>
*********************************************************
TSD 2014 - FIRST CALL FOR PAPERS
*********************************************************
Seventeenth International Conference on TEXT, SPEECH and DIALOGUE (TSD 2014)
Brno, Czech Republic, 8-12 September 2014
http://www.tsdconference.org/
The conference is organized by the Faculty of Informatics, Masaryk
University, Brno, and the Faculty of Applied Sciences, University of
West Bohemia, Pilsen. The conference is supported by the International
Speech Communication Association.
Venue: Brno, Czech Republic
THE SUBMISSION DEADLINES:
March 15 2014 ............ Submission of abstracts
March 22 2014 ............ Submission of full papers
Abstract submission serves only to help organize the review
process; a full paper submission is required for the actual
review.
TSD SERIES
The TSD series has evolved into a prime forum for interaction between
researchers in both spoken and written language processing from all over
the world. Proceedings of TSD form a book published by Springer-Verlag in
their Lecture Notes in Artificial Intelligence (LNAI) series. TSD
Proceedings are regularly indexed by the Thomson Reuters Conference
Proceedings Citation Index. Moreover, the LNAI series is listed in all
major citation databases such as DBLP, SCOPUS, EI, INSPEC or COMPENDEX.
TOPICS
Topics of the conference will include (but are not limited to):
Corpora and Language Resources (monolingual, multilingual,
text and spoken corpora, large web corpora, disambiguation,
specialized lexicons, dictionaries)
Speech Recognition (multilingual, continuous, emotional
speech, handicapped speaker, out-of-vocabulary words,
alternative ways of feature extraction, new models for
acoustic and language modelling)
Tagging, Classification and Parsing of Text and Speech
(morphological and syntactic analysis, synthesis and
disambiguation, multilingual processing, sentiment analysis,
credibility analysis, automatic text labeling, summarization,
authorship attribution)
Speech and Spoken Language Generation (multilingual, high
fidelity speech synthesis, computer singing)
Semantic Processing of Text and Speech (information
extraction, information retrieval, data mining, semantic web,
knowledge representation, inference, ontologies, sense
disambiguation, plagiarism detection)
Integrating Applications of Text and Speech Processing
(machine translation, natural language understanding,
question-answering strategies, assistive technologies)
Automatic Dialogue Systems (self-learning, multilingual,
question-answering systems, dialogue strategies, prosody in
dialogues)
Multimodal Techniques and Modelling (video processing, facial
animation, visual speech synthesis, user modelling, emotions
and personality modelling)
Papers on processing of languages other than English are strongly
encouraged.
PROGRAM COMMITTEE
Hynek Hermansky, USA (general chair)
Eneko Agirre, Spain
Genevieve Baudoin, France
Paul Cook, Australia
Jan Cernocky, Czech Republic
Simon Dobrisek, Slovenia
Karina Evgrafova, Russia
Darja Fiser, Slovenia
Radovan Garabik, Slovakia
Alexander Gelbukh, Mexico
Louise Guthrie, GB
Jan Hajic, Czech Republic
Eva Hajicova, Czech Republic
Yannis Haralambous, France
Ludwig Hitzenberger, Germany
Jaroslava Hlavacova, Czech Republic
Ales Horak, Czech Republic
Eduard Hovy, USA
Maria Khokhlova, Russia
Daniil Kocharov, Russia
Ivan Kopecek, Czech Republic
Valia Kordoni, Germany
Steven Krauwer, The Netherlands
Siegfried Kunzmann, Germany
Natalija Loukachevitch, Russia
Vaclav Matousek, Czech Republic
Diana McCarthy, United Kingdom
France Mihelic, Slovenia
Hermann Ney, Germany
Elmar Noeth, Germany
Karel Oliva, Czech Republic
Karel Pala, Czech Republic
Nikola Pavesic, Slovenia
Fabio Pianesi, Italy
Maciej Piasecki, Poland
Adam Przepiorkowski, Poland
Josef Psutka, Czech Republic
James Pustejovsky, USA
German Rigau, Spain
Leon Rothkrantz, The Netherlands
Anna Rumshisky, USA
Milan Rusko, Slovakia
Mykola Sazhok, Ukraine
Pavel Skrelin, Russia
Pavel Smrz, Czech Republic
Petr Sojka, Czech Republic
Stefan Steidl, Germany
Georg Stemmer, Germany
Marko Tadic, Croatia
Tamas Varadi, Hungary
Zygmunt Vetulani, Poland
Pascal Wiggers, The Netherlands
Yorick Wilks, GB
Marcin Wolinski, Poland
Victor Zakharov, Russia
KEYNOTE SPEAKERS
Ralph Grishman, New York University, USA
Bernardo Magnini, FBK - Fondazione Bruno Kessler, Italy
Salim Roukos, IBM, USA
FORMAT OF THE CONFERENCE
The conference program will include presentation of invited papers,
oral presentations, and poster/demonstration sessions. Papers will
be presented in plenary or topic oriented sessions.
Social events including a trip in the vicinity of Brno will allow
for additional informal interactions.
SUBMISSION OF PAPERS
Authors are invited to submit a full paper not exceeding 8 pages
formatted in the LNCS style (see below). Those accepted will be
presented either orally or as posters. The decision about the
presentation format will be based on the recommendation of the
reviewers. The authors are asked to submit their papers using the
on-line form accessible from the conference website.
Papers submitted to TSD 2014 must not be under review by any other
conference or publication during the TSD review cycle, and must not be
previously published or accepted for publication elsewhere.
As reviewing will be blind, the paper should not include the authors'
names and affiliations. Furthermore, self-references that reveal the
author's identity, e.g., "We previously showed (Smith, 1991) ...",
should be avoided. Instead, use citations such as "Smith previously
showed (Smith, 1991) ...". Papers that do not conform to the
requirements above are subject to rejection without review.
The authors are strongly encouraged to write their papers in TeX or
LaTeX formats. These formats are necessary for the final versions of
the papers that will be published in the Springer Lecture Notes.
Authors using Word-compatible software for the final version must
use the LNCS template for Word and, during the submission process, ask
the Proceedings Editors to convert the paper to LaTeX format. A
service-and-license fee of CZK 1500 will be levied automatically for
this service.
The paper submitted for review must be either a PDF or a PostScript file
with all required fonts included. Upon notification of acceptance,
presenters will receive further information on submitting their
camera-ready and electronic sources (for detailed instructions on the
final paper format see
http://www.springer.de/comp/lncs/authors.html#Proceedings, Sample File
typeinst.zip).
Authors are also invited to present actual projects, developed
software or interesting material relevant to the topics of the
conference. The presenters of demonstrations should provide an
abstract not exceeding one page. The demonstration abstracts will not
appear in the conference proceedings.
IMPORTANT DATES
March 15 2014 ............ Submission of abstracts
March 22 2014 ............ Submission of full papers
May 15 2014 .............. Notification of acceptance
May 31 2014 .............. Final papers (camera ready) and registration
August 3 2014 ............ Submission of demonstration abstracts
August 10 2014 ........... Notification of acceptance for
demonstrations sent to the authors
September 8-12 2014 ...... Conference date
Abstract submission serves only to help organize the review
process; a full paper submission is required for the actual
review.
The accepted conference contributions will be published in proceedings
that will be made available to participants at the time of the
conference.
OFFICIAL LANGUAGE
The official language of the conference is English.
ACCOMMODATION
The organizing committee will arrange discounts on accommodation in
the 4-star hotel at the conference venue. The current prices of the
accommodation will be available at the conference website.
ADDRESS
All correspondence regarding the conference should be
addressed to
Ales Horak, TSD 2014
Faculty of Informatics, Masaryk University
Botanicka 68a, 602 00 Brno, Czech Republic
phone: +420-5-49 49 18 63
fax: +420-5-49 49 18 20
email: tsd2014@tsdconference.org
The official TSD 2014 homepage is: http://www.tsdconference.org/
LOCATION
Brno is the second largest city in the Czech Republic, with a
population of almost 400,000, and is the country's judicial and
trade-fair center. Brno is the capital of South Moravia, which is
located in the south-east part of the Czech Republic and is known
for a wide range of cultural, natural, and technical sights.
South Moravia is a traditional wine region. Brno has been a Royal
City since 1347 and, with its six universities, forms a cultural
center of the region.
Brno can be reached easily by direct flights from London, Moscow,
and Eindhoven, and by trains or buses from Prague (200 km) or Vienna
(130 km).
For participants with some extra time, nearby places may
also be of interest. Local ones include: Brno Castle, now called
Spilberk, Veveri Castle, the Old and New City Halls, the
Augustine Monastery with St. Thomas Church and the crypt of the
Moravian Margraves, the Church of St. James, the Cathedral of
St. Peter & Paul, the Carthusian Monastery in Kralovo Pole, and the
famous Villa Tugendhat designed by Mies van der Rohe, along with
other important buildings of interwar Czech architecture.
For those willing to venture out of Brno, the Moravian Karst with
the Macocha Chasm and Punkva caves, the battlefield of the Battle of
the Three Emperors (Napoleon, Alexander of Russia, and Franz of
Austria; the Battle of Austerlitz), the Chateau of Slavkov (Austerlitz),
Pernstejn Castle, Buchlov Castle, Lednice Chateau, Buchlovice
Chateau, Letovice Chateau, Mikulov with one of the largest Jewish
cemeteries in Central Europe, Telc (a town on the UNESCO
heritage list), and many others are all within easy reach.
------------------------------
Message: 2
Date: Wed, 05 Feb 2014 11:39:50 +0000
From: Sorin Slavescu <sorin.slavescu@oracle.com>
Subject: Re: [Moses-support] Repositioning of non-translatable lexemes
To: moses-support@mit.edu
Message-ID: <52F22306.3070301@oracle.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Thanks a lot folks. The placeholders functionality for numbers and dates
is something very interesting and it crossed my mind to use it for tags
too but unfortunately it doesn't link most formatting tags with the
words it should.
I'll have a look at that paper Hieu mentioned
(http://www.mtsummit2013.info/files/proceedings/wptp2-joanis.pdf) that
seems to be more along the lines of what I've been trying to achieve.
Thanks,
Sorin
On 04/02/14 19:24, Achim Ruopp wrote:
> The new placeholder feature in Moses v2.1 is documented here:
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc61
> This is for placeholders that have semantic meaning, e.g. numbers and other
> named entities rather than tags.
>
> Yes, this needs to be combined with tag handling and evaluated. Working on
> this now.
>
> Achim
>
> -----Original Message-----
> From: moses-support-bounces@mit.edu [mailto:moses-support-bounces@mit.edu]
> On Behalf Of Tom Hoar
> Sent: Tuesday, February 04, 2014 11:45 AM
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Repositioning of non-translatable lexemes
>
> Hieu, is the new XML-markup feature you and Achim developed a better match?
> I can't find the reference.
>
> Tom.
>
>
>
> On 02/04/2014 11:25 PM, Barry Haddow wrote:
>> Hi Sorin
>>
>> You should check out m4loc (https://code.google.com/p/m4loc/), whose
>> features include "Word-alignment based tag reinsertion".
>>
>> There is also a web translation tool in Moses
>> (http://www.statmt.org/moses/?n=Moses.WebTranslation) that handles the
>> re-insertion of html markup, but this has not been updated for a while
>> and may or may not work with the current version of Moses,
>>
>> cheers - Barry
>>
>> On 04/02/14 15:06, Sorin Slavescu wrote:
>>> Hi all,
>>>
>>> Is there any research, tools or libraries to address the issue of
>>> repositioning non-translatable content like tags, placeholders,
>>> entities into the translated sentence?
>>> For example if the source sentence is "This is a <b>test</b>" and
>>> the translation is "C'est un test" to reposition the <b> tag into the
>>> translation in the right spot to become "C'est un <b>test</b>"
>>>
>>> Thanks,
>>> Sorin
>>> --
>>>
>>>
>>> ORACLE <http://www.oracle.com>
>>> Sorin Slavescu | Principal Software Engineer
>>> Phone: +35318031937 | E-mail: sorin.slavescu@oracle.com Oracle
>>> Worldwide Product Translation (WPTG) - Tools Block P5, East Point
>>> Business Park | Dublin 3, Ireland Oracle is committed to developing
>>> practices and products that help protect the environment
>>> <http://www.oracle.com/commitment>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
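The "word-alignment based tag reinsertion" idea mentioned in this thread can be illustrated with a minimal Python sketch. It is not the m4loc or Moses implementation. It assumes simple, properly nested tags without attributes, an already tokenized translation, and a word alignment supplied as (source index, target index) pairs, for instance as reported by the decoder. The function names parse_tagged and reinsert_tags are hypothetical.

```python
import re

# Matches simple tags such as <b> or </b>; attributes are not handled.
TAG_RE = re.compile(r"<(/?)(\w+)>")


def parse_tagged(source):
    """Split a tagged source string into plain tokens and tag spans.

    Returns (tokens, spans); each span is (tag_name, start, end) over
    token indices, with end exclusive.
    """
    tokens, spans, open_tags = [], [], []
    # Pad tags with spaces so that they split off as separate pieces.
    for piece in TAG_RE.sub(r" \g<0> ", source).split():
        m = TAG_RE.fullmatch(piece)
        if m is None:
            tokens.append(piece)
        elif m.group(1) == "":
            open_tags.append((m.group(2), len(tokens)))
        else:
            name, start = open_tags.pop()
            spans.append((name, start, len(tokens)))
    return tokens, spans


def reinsert_tags(source, target_tokens, alignment):
    """Wrap the target tokens aligned to each tagged source span."""
    _, spans = parse_tagged(source)
    opens, closes = {}, {}
    for name, start, end in spans:
        # Target positions aligned to any source token inside the span.
        tgt = [t for s, t in alignment if start <= s < end]
        if not tgt:
            continue  # unaligned span: this sketch simply drops the tag
        # Inner spans are visited first, so outer opening tags go in front
        # and outer closing tags go last.
        opens.setdefault(min(tgt), []).insert(0, f"<{name}>")
        closes.setdefault(max(tgt), []).append(f"</{name}>")
    out = []
    for i, tok in enumerate(target_tokens):
        out.append("".join(opens.get(i, [])) + tok + "".join(closes.get(i, [])))
    return " ".join(out)


# The example from the original question; the alignment is made up by hand.
print(reinsert_tags(
    "This is a <b>test</b>",
    ["C'est", "un", "test"],
    [(0, 0), (1, 0), (2, 1), (3, 2)],
))
```

Wrapping the span from the leftmost to the rightmost aligned target word keeps each tag pair balanced, but it can over-extend a tag when the translation reorders words, and overlapping spans may produce crossing tags. Handling those cases is where a real tool has to do more work.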
------------------------------
Message: 3
Date: Wed, 5 Feb 2014 14:06:23 +0100
From: "Reinhard Rapp" <reinhardrapp@gmx.de>
Subject: [Moses-support] Final CfP: LREC 2014 Workshop on Building and
Using Comparable Corpora (7th BUCC)
To: <IRList@lists.shef.ac.uk>, <listmaster@loria.fr>, <ln@cines.fr>,
<lr_egroup@mail.iiit.ac.in>, <moses-support@MIT.EDU>,
<news@multilingual.com>
Message-ID: <231BF614593D4F2A923714DCB28EF473@ASUSPC>
Content-Type: text/plain; charset="windows-1252"
We apologize for multiple postings
Please distribute to interested colleagues
============================================================
Final Call for Papers
7th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
Building Resources for Machine Translation Research
http://comparable.limsi.fr/bucc2014/
May 27, 2014
Co-located with LREC 2014
Harpa Conference Centre, Reykjavik (Iceland)
DEADLINE FOR PAPERS: February 10, 2014
https://www.softconf.com/lrec2014/BUCC2014/
*** INVITED SPEAKER ***
Chris Callison-Burch (University of Pennsylvania)
============================================================
MOTIVATION
In the language engineering and the linguistics communities, research
in comparable corpora has been motivated by two main reasons. In
language engineering, on the one hand, it is chiefly motivated by the
need to use comparable corpora as training data for statistical
Natural Language Processing applications such as statistical machine
translation or cross-lingual retrieval. In linguistics, on the other
hand, comparable corpora are of interest in themselves by making
possible inter-linguistic discoveries and comparisons. It is generally
accepted in both communities that comparable corpora are documents in
one or several languages that are comparable in content and form in
various degrees and dimensions. We believe that the linguistic
definitions and observations related to comparable corpora can improve
methods to mine such corpora for applications of statistical NLP. As
such, it is of great interest to bring together builders and users of
such corpora.
The scarcity of parallel corpora has motivated research concerning
the use of comparable corpora: pairs of monolingual corpora selected
according to the same set of criteria, but in different languages
or language varieties. Non-parallel yet comparable corpora overcome
the two limitations of parallel corpora, since sources for original,
monolingual texts are much more abundant than translated texts.
However, because of their nature, mining translations in comparable
corpora is much more challenging than in parallel corpora. What
constitutes a good comparable corpus, for a given task or per se,
also requires specific attention: while the definition of a parallel
corpus is fairly straightforward, building a non-parallel corpus
requires control over the selection of source texts in both languages.
Parallel corpora are a key resource as training data for statistical
machine translation, and for building or extending bilingual lexicons
and terminologies. However, beyond a few language pairs such as
English-French or English-Chinese and a few contexts such as
parliamentary debates or legal texts, they remain a scarce resource,
despite the creation of automated methods to collect parallel corpora
from the Web. To exemplify such issues in a practical setting, this
year's special focus will be on
Building Resources for Machine Translation Research
This special topic aims to address the need for:
(1) Machine Translation training and testing data such as spoken or
written monolingual, comparable or parallel data collections, and
(2) methods and tools used for collecting, annotating, and verifying
MT data such as Web crawling, crowdsourcing, tools for language
experts and for finding MT data in comparable corpora.
TOPICS
We solicit contributions including but not limited to the following topics:
Topics related to the special theme:
* Methods and tools for collecting and processing MT data,
including crowdsourcing
* Methods and tools for quality control
* Tools for efficient annotation
* Bilingual term and named entity collections
* Multilingual treebanks, wordnets, propbanks, etc.
* Comparable corpora with parallel units annotated
* Comparable corpora for under-resourced languages and specific domains
* Multilingual corpora with rich annotations:
POS tags, NEs, dependencies, semantic roles, etc.
* Data for special applications: patent translation, movie
subtitles, MOOCs, meetings, chat-rooms, social media, etc.
* Legal issues with collecting and redistributing data
and generating derivatives
Building comparable corpora:
* Human translations
* Automatic and semi-automatic methods
* Methods to mine parallel and non-parallel corpora from the Web
* Tools and criteria to evaluate the comparability of corpora
* Parallel vs non-parallel corpora, monolingual corpora
* Rare and minority languages, across language families
* Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
* Human translations
* Language learning
* Cross-language information retrieval & document categorization
* Bilingual projections
* Machine translation
* Writing assistance
Mining from comparable corpora:
* Extraction of parallel segments or paraphrases from comparable corpora
* Extraction of bilingual and multilingual translations of single words
and multi-word expressions; proper names, named entities, etc.
IMPORTANT DATES
February 10, 2014 Deadline for submission of full papers
March 10, 2014 Notification of acceptance
March 27, 2014 Camera-ready papers due
May 27, 2014 Workshop date
SUBMISSION INFORMATION
Papers should follow the LREC main conference formatting details (to be
announced on the conference website http://lrec2014.lrec-conf.org/en/ )
and should be submitted as a PDF-file via the START workshop manager at
https://www.softconf.com/lrec2014/BUCC2014/
Contributions can be short or long papers. Short paper submissions must
describe original and unpublished work without exceeding six (6)
pages. Characteristics of short papers include: a small, focused
contribution; work in progress; a negative result; an opinion piece;
an interesting application nugget. Long paper submissions must
describe substantial, original, completed and unpublished work without
exceeding ten (10) pages.
Reviewing will be double blind, so the papers should not reveal the
authors' identity. Accepted papers will be published in the workshop
proceedings.
Double submission policy: Parallel submission to other meetings or
publications is possible, but the workshop organizers must be notified
immediately.
When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense,
i.e. also technologies, standards, evaluation kits, etc.) that have
been used for the work described in the paper or are a new result of
your research. Moreover, ELRA encourages all LREC authors to share
the described LRs (data, tools, services, etc.), to enable their
reuse, replicability of experiments, including evaluation ones, etc.
JOURNAL SPECIAL ISSUE
Authors of selected papers will be encouraged to submit substantially
extended versions of their manuscripts to an upcoming special issue
on "Machine Translation Using Comparable Corpora" of the Journal
of Natural Language Engineering.
ORGANISERS
Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)
Ahmet Aker, University of Sheffield (UK)
Serge Sharoff, University of Leeds (UK)
Stephan Vogel, QCRI (Qatar)
Reinhard Rapp, Universities of Mainz (Germany) and Aix-Marseille (France)
FURTHER INFORMATION
Pierre Zweigenbaum: pz (at) limsi (dot) fr
SCIENTIFIC COMMITTEE
* Ahmet Aker, University of Sheffield (UK)
* Srinivas Bangalore (AT&T Labs, US)
* Caroline Barrière (CRIM, Montréal, Canada)
* Chris Biemann (TU Darmstadt, Germany)
* Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
* Kurt Eberle (Lingenio, Heidelberg, Germany)
* Andreas Eisele (European Commission, Luxembourg)
* Éric Gaussier (Université Joseph Fourier, Grenoble, France)
* Gregory Grefenstette (INRIA, Saclay, France)
* Silvia Hansen-Schirra (University of Mainz, Germany)
* Hitoshi Isahara (Toyohashi University of Technology)
* Kyo Kageura (University of Tokyo, Japan)
* Adam Kilgarriff (Lexical Computing Ltd, UK)
* Natalie Kübler (Université Paris Diderot, France)
* Philippe Langlais (Université de Montréal, Canada)
* Michael Mohler (Language Computer Corp., US)
* Emmanuel Morin (Université de Nantes, France)
* Dragos Stefan Munteanu (Language Weaver, Inc., US)
* Lene Offersgaard (University of Copenhagen, Denmark)
* Ted Pedersen (University of Minnesota, Duluth, US)
* Reinhard Rapp (Université Aix-Marseille, France)
* Sujith Ravi (Google, Mountain View, US)
* Serge Sharoff (University of Leeds, UK)
* Michel Simard (National Research Council Canada)
* Richard Sproat (OGI School of Science & Technology, US)
* Tim Van de Cruys (IRIT-CNRS, Toulouse, France)
* Stephan Vogel (QCRI, Qatar)
* Guillaume Wisniewski (Université Paris Sud & LIMSI-CNRS, Orsay, France)
* Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 88, Issue 9
********************************************