Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Detokenizer (Barry Haddow)
2. Call for Posters Translating and the Computer 36 London, 27
and 28 November 2014 (Rohit Gupta)
----------------------------------------------------------------------
Message: 1
Date: Wed, 16 Jul 2014 09:29:24 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Detokenizer
To: Judah Schvimer <judah.schvimer@mongodb.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <53C637E4.9080900@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="utf-8"
Hi Judah
The tokeniser also escapes some characters which have special meaning
for Moses, and at decoding time the most important of these is the pipe
(|). A stray pipe probably caused Moses to fail for you, but URLs
shouldn't contain pipes.
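
To make the escaping concrete, here is a minimal Python sketch. It is not the tokeniser's own code: the function names are hypothetical, and the replacement table is an assumption modelled on the escaping step in tokenizer.perl, so please check it against the script in your Moses checkout.

```python
# Assumed escape table, modelled on tokenizer.perl; verify against your copy.
MOSES_ESCAPES = [
    ("&", "&amp;"),   # must run first so later entities are not re-escaped
    ("|", "&#124;"),  # the pipe is the factor separator in Moses input
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_for_moses(text: str) -> str:
    """Replace characters that are special to the decoder (hypothetical helper)."""
    for raw, escaped in MOSES_ESCAPES:
        text = text.replace(raw, escaped)
    return text


def unescape_from_moses(text: str) -> str:
    """Undo escape_for_moses; reverse order so that &amp; is decoded last."""
    for raw, escaped in reversed(MOSES_ESCAPES):
        text = text.replace(escaped, raw)
    return text
```

If you skip tokenisation for some of your input, running only an escaping step like this over it before decoding should keep stray pipes away from the decoder.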
cheers - Barry
On 15/07/14 13:59, Judah Schvimer wrote:
> Hi,
>
> Thank you very much! That's incredibly helpful. My one concern is that
> before I tokenized the input to the decoder it was crashing. Do you
> know what tokens would cause that behavior if left in? Would you
> recommend just not tokenizing path names and urls and leaving
> everything else?
>
> Judah
>
>
> On Tue, Jul 15, 2014 at 4:02 AM, Barry Haddow
> <bhaddow@staffmail.ed.ac.uk> wrote:
>
> Hi Judah
>
> The actual problem here is that you do not want path names split
> by the tokeniser. It's only really set up to deal with regular
> text, but what you can do is ask it to "protect" certain patterns
> by using the
>
> -protected <filename>
>
> argument. The file <filename> should contain a list of regular
> expressions (one per line), and the tokeniser will not split apart
> any tokens which match these REs. I'm guessing that in the example
> below you don't want "tutorial" translated into the target
> language, and if the tokeniser doesn't split the path then the
> whole thing will pass through as an OOV,
>
> cheers - Barry
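
The idea behind protected patterns can be sketched in Python as follows. This is a toy model under stated assumptions, not the Moses implementation: the `tokenize` function, the `PROTECTED` placeholder and the example regular expressions are all hypothetical, and the "tokenisation" here simply pads every punctuation character with spaces (which is more aggressive than the real tokeniser).

```python
import re

PLACEHOLDER = "PROTECTED"  # hypothetical marker, assumed absent from real input


def tokenize(text, protected_patterns=()):
    """Split punctuation from words, leaving spans that match a protected RE intact."""
    stashed = []

    def stash(match):
        # Remember the protected span and leave a numbered placeholder behind.
        stashed.append(match.group(0))
        return f" {PLACEHOLDER}{len(stashed) - 1} "

    if protected_patterns:
        # One combined pass, so a later pattern never matches inside a placeholder.
        combined = re.compile("|".join(f"(?:{p})" for p in protected_patterns))
        text = combined.sub(stash, text)

    # Crude stand-in for real tokenisation: pad every punctuation character.
    text = re.sub(r"([^\w\s])", r" \1 ", text)

    tokens = []
    for token in text.split():
        m = re.fullmatch(PLACEHOLDER + r"(\d+)", token)
        tokens.append(stashed[int(m.group(1))] if m else token)
    return tokens
```

With the source line from the original question, `tokenize(":doc:`/tutorial/aggregation-zip-code-data-set`", [r":doc:`[^`]+`"])` returns the whole string as a single token, whereas calling it with no patterns splits it at every colon, slash and hyphen. A regular expression like the one above would go on its own line in the file passed to -protected; whether it suits your data is something to test.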
>
>
> On 14/07/14 16:53, Judah Schvimer wrote:
>
> Hi,
>
> When I'm using the decoder I have to tokenize my target
> sentences before I translate them. However, when I detokenize
> them it leaves awkward spaces around what was tokenized. Is
> there any way to fix this? It seems to be mainly around
> slashes and colons.
>
> Source: :doc:`/tutorial/aggregation-zip-code-data-set`
> Target: : Doc: '/ tutorial / aggregation-zip-code-data-set'
>
> Thanks,
> Judah
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
------------------------------
Message: 2
Date: Wed, 16 Jul 2014 11:21:37 +0100
From: Rohit Gupta <enggrohitgupta@gmail.com>
Subject: [Moses-support] Call for Posters Translating and the Computer
36 London, 27 and 28 November 2014
To: moses-support@mit.edu, mt-list@eamt.org
Message-ID:
<CAB-CSF89jQucMTjtUEwxMOuU_EVtXPcMjwBLW2c-T6LGK3zZPA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
(Apologies if you received multiple copies. Please, distribute it among
potentially interested colleagues.)
Call for Posters
*Translating and the Computer 36*
*London, 27 and 28 November 2014*
The Translating and the Computer conference (
http://www.translatingandthecomputer.com/) encourages submissions for
poster presentations to supplement the regular presentations of the
conference. Posters are expected to present ongoing and not necessarily
completed research, teaching or training activity, practical work, software
programs, projects or developments in general related to translation,
interpretation and terminology, and to the related industries.
The Translating and the Computer conference is a unique forum for
researchers, developers and users. It brings together academics involved in
language technology research and in teaching translation and terminology
with those who develop and market tools for language transformation and
both of these groups with users: translators, terminologists, interpreters,
and voice-over specialists, whether freelancers or working in translation
departments of large organisations such as those of the European
Parliament, European courts and the European Patent Office, the United
Nations family, international companies and other organisations, and
Language Services Providers (LSPs), large and small.
In its 36th session *Translating and the Computer* has moved from ASLIB to
*ASLING*. The conference often referred to as the "ASLIB Conference" is now
the *ASLING Translating and the Computer Conference*. One of the new
developments is also the launch of a poster session in addition to the
regular presentation slots.
Poster proposals in the form of poster abstracts not exceeding 500 words
(the final versions of the accepted posters can be up to 1,500 words) must
be submitted using the START system at the following address:
https://www.softconf.com/e/tc2014, adding the text "Poster:" at the
start of the "Title of Submission:" field in the online submission form.
Accepted poster papers will be included (and will have the same
status as regular papers) in the conference proceedings only after the
registration fee for at least one presenter of the paper has been paid.
*Important dates*
Deadline for poster submissions: 8 August 2014
Notification of acceptance or rejection: 22 August 2014
Camera-ready poster papers due: 3 October 2014
Conference: 27 and 28 November 2014
*Chairs*
- Juliet Macan, Arancho Doc srl. (Lead Chair 2014)
- João Esteves-Ferreira, Tradulex, International Association for Quality
Translation
- Ruslan Mitkov, University of Wolverhampton
- Olaf-Michael Stefanov, United Nations (ret), JIAMCATT
*Programme Committee*
- David Chambers, World Intellectual Property Organisation (ret)
- Gloria Corpas Pastor, University of Malaga
- Estelle Delpeche, Nomao
- Alain Désilets, National Research Council of Canada (NRC)
- David Filip, LRC, CNGL, LT-Web, University of Limerick
- Pamela Mayorcas, FITI
- Paola Valli, University of Trieste
*Conference Manager:*
- Nicole Adamides
*AsLing.org*
*Association internationale pour la promotion des technologies
linguistiques*
*International Association for Advancement in Language Technology*
*Bologna, Genève, London, Wien, Wolverhampton*
------------------------
Regards,
*Rohit Gupta*
*Marie Curie Early Stage Researcher, EXPERT Project*
Research Group in Computational Linguistics
Research Institute of Information and Language Processing
University of Wolverhampton
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 93, Issue 20
*********************************************