Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. PhD thesis offer in France/Learning from Post-Edition in
Machine Translation / LIFL (Lille) and LIG (Grenoble)
(Laurent Besacier)
2. Predetermined translations in training data (Roee Aharoni)
----------------------------------------------------------------------
Message: 1
Date: Sat, 26 Jul 2014 00:03:24 +0200
From: Laurent Besacier <laurent.besacier@imag.fr>
Subject: [Moses-support] PhD thesis offer in France/Learning from
Post-Edition in Machine Translation / LIFL (Lille) and LIG (Grenoble)
To: "'moses-support'" <moses-support@mit.edu>
Message-ID: <8C781306-4AF0-449E-870D-491B9AFD59B7@imag.fr>
Content-Type: text/plain; charset="utf-8"
PhD thesis offer in France / Learning from Post-Edition in Machine
Translation / LIFL (Lille) and LIG (Grenoble)

Contacts: Olivier Pietquin : olivier.pietquin@univ-lille1.fr
          Laurent Besacier : laurent.besacier@imag.fr

Problem

Statistical Machine Translation (SMT) is the process by which texts
are automatically translated from a source language to a target
language by a machine that has been trained on corpora in both
languages. Thanks to progress in the training of SMT engines, machine
translation has become good enough that it is now advantageous for
translators to post-edit machine output rather than translate from
scratch. However, current enhancements of SMT systems from human
post-edition (PE) are rather basic: the post-edited output is added to
the training corpus and the translation model and language model are
re-trained, with no clear view of how much has been improved and how
much is left to improve. Moreover, the final PE result is the only
feedback used: available technologies do not take advantage of logged
sequences of post-edition actions, which shed light on the cognitive
processes of the post-editor.

The proposed thesis aims at using the post-edition process as a
demonstration of how an expert translator modifies the SMT result to
produce a perfect translation. Learning from demonstration is an
emerging field in machine learning, mostly applied to robotics [1],
which will thus be explored further in the particular framework of
SMT.

Topic of research

A novel approach to SMT training will be adopted in this thesis:
considering the post-edition process as a sequential decision-making
process performed by human experts who should be imitated. This
thesis' first fundamental contribution to SMT will be to reformulate
the problem of post-edition in SMT as a sequential decision-making
problem [4]. Indeed, the hypothesis selection and ranking process
occurring in an SMT system can be seen as an action selection
strategy, choosing after each post-edition step amongst a large number
of actions (all possible hypotheses and rankings). This strategy has
to be modified according to post-edition results, which arrive
sequentially and are influenced by the previous actions (hypothesis
selections) of the system.
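
To make the reformulation concrete, one possible way to write the
post-edition loop as a Markov decision process in the sense of [4] is
sketched below; this is purely illustrative, and the exact state and
action spaces would be design choices of the thesis, not something
fixed by this announcement:

    % Illustrative sketch, not a definition from the announcement.
    % The state s_t pairs the source sentence f with the current
    % hypothesis e_t; an action a_t selects/reranks the next
    % hypothesis; the post-editor's edits drive the transitions.
    \mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R), \qquad
    s_t = (f, e_t), \qquad
    s_{t+1} \sim T(\cdot \mid s_t, a_t), \qquad
    a_t \sim \pi(\cdot \mid s_t)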
From this, SMT will be cast into an imitation learning problem, that
is, learning from demonstrations made by an expert: post-edition
results can be seen as examples of what the system should do, again in
a sequential decision-making setting rather than a static one such as
supervised learning. Indeed, SMT decoding, whether it is based on
phrases or chunks, can be seen as a sequential decision-making
process. The sequences of decisions taken by an expert during the
post-edition process can be seen as a target for the system, which
will try to imitate them in similar situations. To do so, we will
extend the work described in [2], which modelled semantic parsing as
an Inverse Reinforcement Learning (IRL) problem [3].
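
For reference, the core characterization from [3] for finite MDPs,
quoted here only to illustrate the kind of objective this machinery
involves: if the expert takes action a_1 in every state, a reward
vector R makes the expert's policy optimal if and only if

    (P_{a_1} - P_a)(I - \gamma P_{a_1})^{-1} R \succeq 0
    \qquad \text{for all } a \in \mathcal{A} \setminus \{a_1\},

where P_a is the transition matrix under action a and \gamma is the
discount factor.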

In addition, the question of automatically selecting the sentences
that should be used for post-edition and further learning will be
addressed, in particular under the active learning paradigm. Large and
diversified amounts of post-edited data, collected in an industrial
setting, will be made available for the research project.

Profile

Applicants must hold an Engineering or Master's degree in
Computational Linguistics or Computer Science, preferably with
experience in statistical machine learning and/or natural language
processing. A good background in programming is also required. The
candidate will be involved in a research project, funded by the French
National Agency for Research, involving two research labs (LIFL in
Lille and LIG in Grenoble) and a company (Lingua & Machina). For this
reason a good level of English is required (a good command of French
being a plus). Finally, effective communication skills in English,
both written and verbal, are mandatory.

Context

The candidate will be hired by University Lille 1 in the framework of
a national research project. S/he will mainly be hosted in the SequeL
(Sequential Learning) team of the Laboratoire d'Informatique
Fondamentale de Lille (LIFL). SequeL is also a joint team-project with
INRIA (the national institute for research in computer science and
mathematics), in particular the INRIA Lille - Nord Europe Center. The
group involves around 25 researchers working on sequential learning
and is internationally recognized. Lille is the largest city in the
north of France, a metropolis with 1 million inhabitants, with
excellent train connections to Brussels (30 min), Paris (1h), and
London (1h30).

This thesis will be supervised in strong collaboration with the GETALP
team of the Laboratoire d'Informatique de Grenoble (LIG), widely
renowned for its research on natural language and speech processing.
Grenoble is a high-tech city with 4 universities. It is located at the
heart of the Alps, in outstanding scientific and natural surroundings.
It is 3h by train from Paris, 2h from Geneva, 1h from Lyon, 2h from
Torino, and less than 1h from Lyon international airport.

The PhD thesis will be co-supervised by Olivier Pietquin in Lille and
Laurent Besacier in Grenoble.

Contacts

Interviews will be held in September 2014. Meetings during Interspeech
2014 in Singapore can also be organized. For further information,
please contact:

Olivier Pietquin : olivier.pietquin@univ-lille1.fr
Laurent Besacier : laurent.besacier@imag.fr

References

[1] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett
Browning. A survey of robot learning from demonstration. Robotics and
Autonomous Systems, 57(5):469-483, May 2009.

[2] Gergely Neu and Csaba Szepesvári. Training parsers by inverse
reinforcement learning. Machine Learning, 77(2-3):303-337, 2009.

[3] Andrew Y. Ng and Stuart J. Russell. Algorithms for inverse
reinforcement learning. In Proceedings of the Seventeenth
International Conference on Machine Learning, ICML '00, pages 663-670,
San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.

[4] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An
Introduction. The MIT Press, 1998.

------------------------
Laurent Besacier
Professor at Université Joseph Fourier (Grenoble 1)
Laboratoire d'Informatique de Grenoble (LIG)
Junior Member of the Institut Universitaire de France (IUF 2012-2017)
laurent.besacier@imag.fr
-------------------------
------------------------------
Message: 2
Date: Sat, 26 Jul 2014 02:05:57 -0700 (PDT)
From: "Roee Aharoni" <roee.aharoni@gmail.com>
Subject: [Moses-support] Predetermined translations in training data
To: moses-support@mit.edu
Message-ID: <1406365552829.a43234ca@Nodemailer>
Content-Type: text/plain; charset="utf-8"
Hi all,
We use the predetermined translations feature in our system, via the
<n translation="..."> XML tags. My question is whether the training
script train-moses.perl knows how to handle or ignore these tags, and
what the consequences are of having them in the training data.
Thanks in advance,
Roee
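
One common precaution, sketched below, is to strip such inline XML
markup from the parallel data before training, since the training
pipeline treats every whitespace-separated token as an ordinary word
and tags left in place could otherwise be learned as if they were
words. This is an illustrative, hypothetical helper script, not part
of the Moses distribution and not an official answer from the list;
the tag name <n ...> follows the question above, but any tag name is
matched.

    #!/usr/bin/env python3
    # Sketch: remove Moses-style inline XML markup, e.g.
    #   das ist <n translation="a">ein</n> kleines haus
    # keeping the enclosed source words. Hypothetical helper, not
    # part of the Moses distribution.
    import re
    import sys

    # Opening tag (with optional attributes), enclosed text, and the
    # matching closing tag: <n translation="x">ein</n> -> ein
    TAG = re.compile(r'<(\w+)(?:\s[^>]*)?>(.*?)</\1>')

    def strip_markup(line: str) -> str:
        prev = None
        while prev != line:  # repeat in case of nested tags
            prev = line
            line = TAG.sub(r'\2', line)
        return line

    if __name__ == "__main__":
        for line in sys.stdin:
            sys.stdout.write(strip_markup(line))

Run as, e.g., "python3 strip_markup.py < corpus.tagged.fr > corpus.fr"
on each side of the parallel corpus; whether the official training
script itself handles or ignores the tags is exactly the open question
above.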
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 93, Issue 34
*********************************************