Moses-support Digest, Vol 87, Issue 54

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: word alignment-words' indexes and sentences' length
(amir haghighi)
2. Re: word alignment-words' indexes and sentences' length
(Barry Haddow)
3. Re: word alignment-words' indexes and sentences' length
(Amin Farajian)
4. Short-term contract in SMT in WIPO (Pouliquen, Bruno)


----------------------------------------------------------------------

Message: 1
Date: Fri, 24 Jan 2014 12:28:52 +0330
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: moses-support@mit.edu
Message-ID:
<CA+UVbEjXmkkVcb2ETeaGDR_fs+NExK-kKH4qyNkdVS6VezZ-8A@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I use the built-in tokenizer in Moses.
How can I change this tokenizer? Should I change the source code?

Regards
Amir

------------------------------

Message: 2
Date: Fri, 24 Jan 2014 09:21:11 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: amir haghighi <amir.haghighi.64@gmail.com>, moses-support@mit.edu
Message-ID: <52E23087.6000607@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Amir

You can use this tokeniser as a basis for creating your own tokeniser,
or you can swap in your own tokeniser. For EMS a tokeniser should read
from stdin and write to stdout, so you can run it like this

tokeniser [options] < input > output
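
As an illustration of that stdin/stdout contract, here is a minimal sketch of a replacement tokeniser in Python. It is not the Moses tokeniser and is not taken from this thread: the file name my_tokeniser.py, the function names and the splitting rule (runs of word characters, or single punctuation symbols) are all assumptions made for the example.

```python
#!/usr/bin/env python3
"""Minimal stdin-to-stdout tokeniser sketch (hypothetical, not the Moses tokeniser)."""
import re
import sys

# Assumed splitting rule: a run of word characters, or one non-space symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenise(line: str) -> str:
    """Return the tokens of one line joined by single spaces."""
    return " ".join(TOKEN_RE.findall(line))


def run(instream, outstream) -> None:
    """Write one tokenised line per input line, so line counts stay aligned."""
    for line in instream:
        outstream.write(tokenise(line) + "\n")


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

Saved under the assumed name my_tokeniser.py, it would be run in the same way as above:

python3 my_tokeniser.py < input > output

Emitting exactly one output line per input line matters for parallel data, because the source and target files must stay aligned sentence by sentence.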

cheers - Barry

On 24/01/14 08:58, amir haghighi wrote:
> I use the built-in tokenizer in the Moses.
> how can I change this tokenizer? should I change the source code?
>
> Regards
> Amir


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 3
Date: Fri, 24 Jan 2014 10:36:18 +0100
From: Amin Farajian <ma.farajian@gmail.com>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: moses-support@mit.edu
Message-ID: <52E23412.3000401@gmail.com>
Content-Type: text/plain; charset="us-ascii"

An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140124/2f6da6ab/attachment-0001.htm

------------------------------

Message: 4
Date: Fri, 24 Jan 2014 10:49:57 +0000
From: "Pouliquen, Bruno" <bruno.pouliquen@wipo.int>
Subject: [Moses-support] Short-term contract in SMT in WIPO
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<0E4F6F34CE741B4BA0AF6BEA0925EABB4A869958@WICHM02.WIECSP.UNICC.ORG>
Content-Type: text/plain; charset="us-ascii"

Title: Short-term contract(s) in the field of Statistical Machine Translation (2 months full time or 12 months part-time)



WIPO is looking for candidates who could work for a short period of time on specific tasks related to SMT.



We are looking for one (or more) skilled computer scientist(s) to assist our work in Statistical Machine Translation at the World Intellectual Property Organization in Geneva (Switzerland).



Project:

The WIPO Global Databases Service works on automatically translating patent documents using the open-source Moses toolkit (http://www.statmt.org/moses/) and is especially looking for candidates who could contribute in one or more of the following fields:

- Core Moses components: improving the speed and/or memory usage for big models (good C++ knowledge required)

- Pre- and post-processing of texts in different languages to improve translation quality (especially for Chinese, Korean, Japanese and German)

- Work on the Lucene framework to improve the tokenization of various languages (CJK languages but also Russian, Arabic and other European languages)

- Experimenting with different models of Moses (e.g. tools to combine various models)

- Graphical User Interface (good knowledge of Java/Swing and/or Web/JSF 2.0 required) to provide an improved means for accessing various proposals for translations



Our Organization:

The World Intellectual Property Organization (WIPO, see http://www.wipo.int/) is a specialized agency of the United Nations. It is dedicated to developing a balanced and accessible international Intellectual Property (IP) system. As part of its mandate, WIPO translates patent applications and disseminates information about published patent applications using the PATENTSCOPE search engine: http://www.wipo.int/patentscope. To make this information available worldwide, WIPO is looking for techniques to help translate patents into various languages. A first prototype for patent translation is available at: http://www.wipo.int/patentscope/translate. Various related publications are listed at the end of this document: http://patentscope.wipo.int/translate/wtapta-user-manual-en.pdf







Required skills:

- The candidate must have a strong background in computer science (minimum master's level), ideally with a specialization in computational linguistics, and should preferably be familiar with Moses (or other SMT tools).

- Depending on the field, a strong knowledge of C++ and/or Java is required.

- Applicants should have excellent written and spoken English or French. A working knowledge of other official languages of WIPO (German, Spanish, Portuguese, Russian, Arabic, Chinese, Japanese or Korean) would be an advantage. We particularly welcome written knowledge of Chinese, Japanese or Korean.

- Good knowledge of web technology, databases, statistics and search engines (Lucene / Solr), scripting languages (Perl, bash) and Unix would also be an advantage.







Additional information:



Application deadline: 3/2/2014 (may be extended)



Expected starting date: as soon as possible (hence the short notice)



Duration: 2 to 11 months. Please note that this is a short term position (the candidate may work remotely and/or part-time depending on the project).



Please send your questions and/or detailed CV by email to: patentscope@wipo.int



---

Bruno Pouliquen

Global Databases Service

WIPO




World Intellectual Property Organization Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 87, Issue 54
*********************************************
