Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: How to tell EMS to concatenate training corpora
(Rico Sennrich)
2. Unexpected behaviour of placeables (Carla Parra)
3. Re: How to tell EMS to concatenate training corpora
(Lane Schwartz)
4. CFP Deadline extension: HyTra-4, in conjunction with
ACL-2015, Beijing: 31 July 2015 (Marta Ruiz)
----------------------------------------------------------------------
Message: 1
Date: Mon, 18 May 2015 11:26:27 +0000 (UTC)
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] How to tell EMS to concatenate training
corpora
To: moses-support@mit.edu
Message-ID: <loom.20150518T132052-43@post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Lane Schwartz <dowobeha@...> writes:
>
> I have a number of distinct monolingual corpora. I've been training them
as separate LMs. I now want to run a variant where they are all concatenated
together, and then trained as a single LM. The EMS walkthrough says this
should be possible
(http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc19), but doesn't
give the requisite syntax. What is the EMS syntax to do this?
>
> Thanks,
> Lane
Hi Lane,
check commit 27fd45d - it implements basic support for concatenation of LM
corpora in EMS. Feel free to tinker with it to make it more configurable -
being able to override which corpora to concatenate would be nice, for example.
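For anyone who wants to do the concatenation by hand before pointing a single LM at the result, a minimal sketch follows. It is not the EMS mechanism from the commit above and uses no EMS configuration syntax. The function name `concatenate_corpora` and any file names are hypothetical, and it assumes one sentence per line in each corpus.

```python
def concatenate_corpora(corpus_paths, output_path):
    """Append every line of each corpus to output_path, in the order given.

    Hypothetical helper, not part of Moses or EMS. Files are copied as raw
    bytes, so the encoding of the input corpora is left untouched.
    Returns the total number of lines written.
    """
    total_lines = 0
    with open(output_path, "wb") as out:
        for path in corpus_paths:
            with open(path, "rb") as src:
                for line in src:
                    # A corpus whose last line lacks a newline would otherwise
                    # be glued to the first sentence of the next corpus.
                    if not line.endswith(b"\n"):
                        line += b"\n"
                    out.write(line)
                    total_lines += 1
    return total_lines
```

Calling `concatenate_corpora(["news.en", "europarl.en"], "all.en")` (hypothetical file names) would produce one file that can then be used as the raw corpus of a single LM.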
best wishes,
Rico
------------------------------
Message: 2
Date: Mon, 18 May 2015 13:35:46 +0200
From: Carla Parra <carla.parra@hermestrans.com>
Subject: [Moses-support] Unexpected behaviour of placeables
To: Moses Support <moses-support@mit.edu>
Message-ID: <82015590f5b036d68fdeee44112205fd@hermestrans.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Dear all,
we just finished some experiments using placeables, and we have observed
several issues that may be worth sharing. I don't know if someone has
experienced the same, or you were already aware of this, but just in
case:
(1) Special characters must be escaped in the "entity" value field.
Otherwise, they cause XML parsing errors at tuning (not at training,
though!), and wrong values are retrieved from the tags (e.g. we had text
with additional quotation marks, which caused the translation to stop
at the first quotation mark, not yielding the complete "entity"
value we had encoded).
(2) <ne> tags are added to sentences as if they had been treated as
tokens during training (i.e. they are not ignored, even though they just
contain the placeables).
As an example, the English sentence "Allow simple password" is
translated as "Permitir simple contraseña <ne translation="@tag@"
entity="</1>">@tag@</ne> ."
While the first issue is our fault, we do not know what causes the
second one. We have followed the instructions at the MOSES advanced
features site and thus specified "extract-settings = "--Placeholder
@tag@"" in training and "-placeholder-factor 1 -xml-input exclusive" in
the decoder and evaluation. Has anyone experienced the same thing and/or
know how to solve this issue?
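To illustrate issue (1), here is a minimal sketch of escaping the "entity" value before building the <ne> tag. It uses only the Python standard library. The function names `make_placeholder` and `read_entity` are hypothetical, and it is an assumption that the Moses XML-input parser un-escapes these standard XML entities the same way a generic XML parser does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; double quotes are added here
# because the value is placed inside a double-quoted attribute.
_ATTR_ESCAPES = {'"': "&quot;"}


def make_placeholder(entity, placeholder="@tag@"):
    """Build an <ne> placeholder tag with an XML-escaped entity value.

    Hypothetical helper: the tag layout mirrors the example in this message.
    """
    safe_entity = escape(entity, _ATTR_ESCAPES)
    safe_placeholder = escape(placeholder, _ATTR_ESCAPES)
    return (
        f'<ne translation="{safe_placeholder}" '
        f'entity="{safe_entity}">{safe_placeholder}</ne>'
    )


def read_entity(tag):
    """Parse the tag with a generic XML parser and return the entity value."""
    return ET.fromstring(tag).get("entity")
```

With this, a value containing quotation marks or markup, such as `say "hi" & </1>`, survives a round trip through `make_placeholder` and `read_entity` instead of being cut off at the first quotation mark.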
Thank you very much. Best regards,
Carla
--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
------------------------------
Message: 3
Date: Mon, 18 May 2015 07:28:41 -0500
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] How to tell EMS to concatenate training
corpora
To: Rico Sennrich <rico.sennrich@gmx.ch>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZm2NaFO9X8sceTSxC63qUhHkcQRy32pQKJ-5xq5KTxRZA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Thanks!
On Monday, May 18, 2015, Rico Sennrich <rico.sennrich@gmx.ch> wrote:
> Lane Schwartz <dowobeha@...> writes:
>
> >
> > I have a number of distinct monolingual corpora. I've been training them
> as separate LMs. I now want to run a variant where they are all
> concatenated
> together, and then trained as a single LM. The EMS walkthrough says this
> should be possible
> (http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc19), but doesn't
> give the requisite syntax. What is the EMS syntax to do this?
> >
> > Thanks,
> > Lane
>
> Hi Lane,
>
> check commit 27fd45d - it implements basic support for concatenation of LM
> corpora in EMS. Feel free to tinker with it to make it more configurable -
> being able to override which corpora to concatenate would be nice, for
> example.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"
------------------------------
Message: 4
Date: Mon, 18 May 2015 10:25:38 -0500
From: Marta Ruiz <martaruizcostajussa@gmail.com>
Subject: [Moses-support] CFP Deadline extension: HyTra-4, in
conjunction with ACL-2015, Beijing: 31 July 2015
To: moses-support@mit.edu
Message-ID:
<CABEBqH+NA-68ofmG5OG7OAPk0OCwsDXSa_aRv7jNOxg+s8b1YQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
**************************************************************
Deadline Extension:
Workshop 'Hybrid Approaches to Translation', 4th Edition
HyTra-4
in conjunction with ACL-2015, Beijing: 31 July 2015
Project website: http://glicom.upf.edu/hytra2015/
**************************************************************
Link to Paper Submissions: https://www.softconf.com/acl2015/HyTra/
===============
Important Dates
===============
**Paper submission: Sun, 24 May 2015 [EXTENDED]**
Notification of acceptance: Mon, 8 June 2015
Camera-ready papers due: Sun, 21 June 2015
Workshop: 31 July 2015
[[4th call for papers]]
HyTra-4, in conjunction with ACL-2015, Beijing: 31 July 2015
The Workshop HyTra-4 (Hybrid Approaches to Translation, 4th Edition) aims
to provide a communication platform and to inform the research agenda around
theoretical and practical issues of Hybrid MT, and specifically the
problems, methodologies, resources and theoretical ideas which originate
outside the mainstream MT paradigm but have the potential to enhance the
quality of state-of-the-art MT systems by bringing a diverse range
of technologies, methods and tools into the MT domain.
We solicit contributions including but not limited to the following topics:
- ways and techniques of hybridization
- architectures for the rapid development of hybrid MT systems
- applications of hybrid systems
- hybrid systems dealing with under-resourced languages
- hybrid systems dealing with morphologically rich languages
- using linguistic information (morphology, syntax, semantics) to
enhance statistical MT (e.g. with hierarchical or factored models)
- bootstrapping rule-based systems from corpora
- hybrid methods in spoken language translation
- extraction of dictionaries from parallel and comparable corpora
- induction of morphological, grammatical, and translation rules from
corpora
- machine learning techniques for hybrid MT
- describing structural mappings between languages (e.g. tree-structures
using synchronous/transduction grammars)
- heuristics for limiting the search space in hybrid MT
- alternative methods for the fair evaluation of the output of different
types of MT systems (e.g. relying on linguistic criteria)
- system combination approaches such as multi-engine MT (parallel) or
automatic post-editing (sequential)
- open source tools and free language resources for hybrid MT, or
developed on the basis of hybrid approaches
- use of Hybrid MT techniques for other computational linguistics tasks,
such as translation lexicography, contrastive morphology, parallel grammar
induction
Contributions can be short or long papers. Short paper submission must
describe original and unpublished work without exceeding five pages of
content plus one extra page for references. Characteristics of short papers
include: a small, focused contribution; work in progress; a negative
result; an opinion piece; an interesting application nugget. Long paper
submissions must describe substantial, original, completed and unpublished
work without exceeding eight pages of content plus two extra pages for
references. Submissions will be judged according to the criteria of the
main conference (ACL 2015).
Submission Instructions
Authors are invited to submit papers on original and previously unpublished
work. Formatting should be according to ACL 2015 specifications using LaTeX
or MS-Word style files. Reviewing of papers will be double-blind, so the
submissions should not reveal the authors' identity.
Submission is electronic in PDF format using the START submission system at
the following url:
https://www.softconf.com/acl2015/HyTra/
Double submission policy: Parallel submissions to other meetings or
publications are possible, but the workshop contact person (see below)
must be notified immediately. If a paper is accepted, it can only be
withdrawn within two days after notification.
For an accepted paper to appear in the proceedings, at least one author
must register for the workshop and actually present the paper. The papers
will be published in the workshop proceedings which will be made available
via the ACL Anthology.
Important dates:
**24 May 2015, Sunday: Workshop Paper Due Date [EXTENDED]**
8 June 2015, Monday: Notification of Acceptance
21 June 2015: Camera-ready papers due
31 July 2015: 1-day Workshop HyTra-4
Website (follow the news and updates)
http://glicom.upf.edu/hytra2015
Organizers
Bogdan Babych (University of Leeds)
Kurt Eberle (Lingenio GmbH, Heidelberg)
Marta R. Costa-jussà (Instituto Politécnico Nacional, Mexico)
Rafael E. Banchs (Institute for Infocomm Research, Singapore)
Patrik Lambert (Pompeu Fabra University, Barcelona)
Reinhard Rapp (University of Mainz)
Programme Committee
Ahmet Aker, Sheffield, UK
Bogdan Babych, Leeds, UK
Rafael E. Banchs, Singapore
Alexey Baytin, Yandex, Moscow, Russia
Núria Bel, Universitat Pompeu Fabra, Barcelona, Spain
Anja Belz, Brighton, UK
Pierrette Bouillon, ISSCO/TIM/ETI, University of Geneva, Switzerland
Michael Carl, Copenhagen Business School, Denmark
Marta R. Costa-jussà, Mexico City, Mexico
Oliver Čulo, University of Mainz, Germany
Kurt Eberle, Heidelberg, Germany
Christian Federmann, Microsoft Research, Seattle, USA
Maxim Khalilov, Berlin, Germany
Udo Kruschwitz, University of Essex, UK
Patrik Lambert, Barcelona, Spain
Yannick Parmentier, Orleans, France
Reinhard Rapp, Mainz, Germany
Serge Sharoff, University of Leeds, UK
George Tambouratzis, Institute for Language and Speech Processing, Athens,
Greece.
Jörg Tiedemann, University of Uppsala, Sweden
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 103, Issue 42
**********************************************