Moses-support Digest, Vol 123, Issue 23

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. RELEASE-3.0 data (Juraj Pancik)
2. First Call - EAMT 2017 Workshop on Social Media and User
Generated Content Machine Translation (Social MT 2017) (Haithem AFLI)
3. Re: German compound splitter (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Tue, 31 Jan 2017 20:51:39 +0100
From: Juraj Pancik <juraj@pancik.com>
Subject: [Moses-support] RELEASE-3.0 data
To: moses-support@mit.edu
Message-ID:
<CAGKxtGqCAO0h0YfCnR-=QGgfRQse2Jh2uhYoe8-YqRMRv-=+Jw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I'm using Moses for my Bachelor's thesis at Masaryk University, Czech
Republic to create a smart autocomplete web application for professional
translators.

I would like to ask what happened to
http://www.statmt.org/moses/RELEASE-3.0/, which contained trained models
and was very helpful to me, since I didn't have to train the models
myself. I was able to access the URL a few weeks ago, but now I'm getting a
404 response.

Has there been a newer release?

Thank you and with best regards,

Juraj Pancik

------------------------------

Message: 2
Date: Tue, 31 Jan 2017 20:32:07 +0000
From: Haithem AFLI <aflihaithem@gmail.com>
Subject: [Moses-support] First Call - EAMT 2017 Workshop on Social
Media and User Generated Content Machine Translation (Social MT 2017)
To: moses-support@mit.edu
Message-ID:
<CALsfB69ur+rUEUA3VTrg17X5M2oSjbDGPpZh01GEq=np-=m80g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

FIRST CALL FOR PAPERS
Apologies for multiple postings.

The first Workshop on Social Media and User Generated Content Machine
Translation (Social MT 2017), co-located with EAMT 2017, Prague, Czech
Republic
For more information please visit:
https://sites.google.com/view/socialmt

CALL FOR PAPERS

With the widespread adoption of social media and online forums, individual
users have been able to actively participate in the generation of online
content in different languages and dialects. As a result, user-generated
content (UGC) has seen enormous growth in recent years. The nature
of UGC means that it can be generated at any time and in non-standard
language or formats. Compared to professionally edited text, it is often
noisier and likely to take some liberties with commonly established
grammar, punctuation and spelling norms. All this can make it difficult to
translate, but UGC can also be incredibly valuable. This workshop will
explore the multifarious aspects of effective MT of data extracted from
social media.
The workshop aims to provide a research platform dedicated to new methods
and techniques for translating user-generated content and exploring the use
of such translation in social media analytics. The workshop will solicit
original research contributions related to the theme, which includes (but
is not limited to):

- Models and Tools Development for Social MT
- Machine translation on Microblogs
- Multi-lingual social analytics
- Neural MT for UGC translation
- Multilingual crowdsourcing
- Building resources for UGC translation
- Sentiment translation of UGC
- Analyzing the diffusion of multilingual information
- Using MT for monitoring emergency responses among social crowds
- Multilingual Social-based web platform for disaster management
- Multilingual and language-specific Information Retrieval on Social Web
- Crosslingual document alignment using UGC data
- Named entity transliteration on social media content
- Code-mixed UGC translation
- MT for Big social data analysis

Submissions may include work in progress as well as finished work.
Submissions must have a clear focus on specific issues pertaining to UGC
and its translation. Descriptions of commercial systems are welcome, but
authors should be willing to discuss the details of their work.

IMPORTANT DATES

January 30, 2017: First Call for Workshop Papers
March 3, 2017: Second Call for Workshop Papers
March 24, 2017: Workshop Paper Due Date
April 14, 2017: Notification of Acceptance
May 12, 2017: Camera-ready papers due
May 31, 2017: Workshop Date (half-day workshop)

SUBMISSION FORMAT

Submissions must conform to the official style guidelines for EAMT 2017 (
https://ufal.mff.cuni.cz/pbml/instructions-authors).
Contributions can be short or long papers. Short paper submissions must
describe original and unpublished work without exceeding eight (8) pages plus
any number of pages for references. Characteristics of short papers
include: a small, focused contribution; work in progress; a negative
result; an opinion piece; an interesting application nugget. Long paper
submissions must describe substantial, original, completed and unpublished
work without exceeding twelve (12) pages plus any number of pages for
references.
Reviewing will be double-blind, so the papers should not reveal the
authors' identity. Accepted papers will be published in the workshop
proceedings.
Double submission policy: Parallel submission to other meetings or
publications is possible, but the workshop organizers must be notified
immediately.
Submission Website: https://easychair.org/conferences/?conf=socialmt2017

Extended versions of the best papers will be published in an upcoming
special issue on "Translating User Generated Content" of the Machine
Translation journal.


WORKSHOP ORGANIZERS

General Chair: Andy Way (ADAPT Centre, Dublin City University)

Program Chair: Haithem Afli (ADAPT Centre, Dublin City University)

Program Committee
Loïc Barrault (LIUM, Le Mans University)
Laurent Besacier (LIG, Grenoble University)
Philipp Koehn (University of Edinburgh / Johns Hopkins University)
Abdelkarim Mars (Grenoble University)
Matteo Negri (FBK)
Houda Bouamor (CMU Qatar)
Yvette Graham (ADAPT Centre, Dublin City University)
Dimitar Shterionov (KantanMT)
Marco Turchi (FBK)
Antonio Toral (University of Groningen)
Lucia Specia (University of Sheffield)
Kashif Shah (eBay)
Rejwanul Haque (Lingo24)
Barry Haddow (University of Edinburgh)
Jinhua Du (ADAPT Centre, Dublin City University)

------------------------------

Message: 3
Date: Wed, 1 Feb 2017 08:36:37 +0700
From: Tom Hoar <tahoar@pttools.net>
Subject: Re: [Moses-support] German compound splitter
To: moses-support@mit.edu
Message-ID: <c560095c-0d97-f2bb-e119-c37c31644022@pttools.net>
Content-Type: text/plain; charset="windows-1252"

I'm sharing some feedback and asking a new question.

I tried the SoMaJo German tokenizer. After considerable work with some
customers, we concluded it does not work as well for SMT as the built-in
Moses tokenizer.perl with German. So, back to the drawing board.

Rico, I'm revisiting your hybrid splitter and have some questions.

1. Are stemmed tokens in the output, or are the original tokens simply
split? It seems that for SMT support, no stemming is applied. I just
want to verify because I cannot use stemmed output.

2. I need the split output to be natural cased, i.e. not lower-cased.
Is this the purpose of the `-no-truecase` argument?

3. Can you confirm that the `-write-filler` argument marks the split
using " @@ "?

4. The command to train a model is simple enough:

`hybrid_compound_splitter.py -train -syntax -corpus INPUT_FILE
-model MODEL_FILE`

What state should the German INPUT_FILE be in, i.e. tokenized or not?
Lower-cased or not?

On a separate but related note, what is the current state of the art in
using a compound-split corpus in the target language and then re-joining
the splits with proper casing for a final rendering?
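
To make the re-joining question concrete, here is a minimal sketch in
Python of the kind of post-processing step I have in mind. It is only an
illustration under assumptions: it assumes the split is marked by a
standalone "@@" token (which is exactly what question 3 asks to confirm),
and it assumes that lower-casing the first letter of every non-initial
part is acceptable casing. The function name `rejoin_compounds` is
hypothetical and not part of hybrid_compound_splitter.py or Moses.

```python
FILLER = "@@"  # assumed split marker (see question 3); not confirmed


def rejoin_compounds(line: str, filler: str = FILLER) -> str:
    """Merge tokens separated by a standalone filler token.

    Non-initial parts get their first letter lower-cased, so that
    "Haus @@ Tür" becomes "Haustür". This casing rule is an assumption,
    not a statement about what any existing tool does.
    """
    out = []
    join_next = False
    for tok in line.split():
        if tok == filler:
            # Only join when there is a preceding token to attach to.
            join_next = bool(out)
            continue
        if join_next:
            out[-1] += tok[:1].lower() + tok[1:]
            join_next = False
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(rejoin_compounds("die Haus @@ Tür ist offen"))
```

A stray marker at the start or end of a line is simply dropped. Real
output would probably also need linking elements and hyphenated compounds
handled, which this sketch ignores.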


Thanks!
Tom


On 8/26/2016 9:15 AM, moses-support-request@mit.edu wrote:
> Date: Thu, 25 Aug 2016 09:05:13 -0700
> From: Tom Hoar <tahoar@pttools.net>
> Subject: Re: [Moses-support] German compound splitter
> To:"moses-support@mit.edu" <moses-support@mit.edu>
>
> Thank you, Rico! Looks promising.
>
> I found this one on Python's PyPI repository: https://pypi.python.org/pypi/SoMaJo/1.1.2
>
> Does anyone have any experience with it?
>
> Tom
>
>
>
> On 8/25/2016 11:01 PM, moses-support-request@mit.edu wrote:
>
>> Date: Wed, 24 Aug 2016 17:23:22 +0100
>> From: Rico Sennrich<rico.sennr...@gmx.ch>
>> Subject: Re: [Moses-support] German compound splitter
>> To:moses-support@mit.edu
>>
>> Hi Tom,
>>
>> I've been using this one for the Edinburgh WMT submission (EN-DE
>> syntax-based) in the last 3 years:
>> https://github.com/rsennrich/wmt2014-scripts/blob/master/hybrid_compound_splitter.py
>>
>> It implements the hybrid (frequency-based and FST-based) algorithm by
>> Fritzinger & Fraser 2010: "How to Avoid Burning Ducks: Combining
>> Linguistic Analysis and Corpus Statistics for German Compound Processing"
>>
>> best wishes,
>> Rico
>>
>> On 24 August 2016 at 09:14, Tom Hoar <tahoar@pttools.net> wrote:
>>
>>> Does anyone recommend a German compound splitter? I know it's been
>>> discussed here before. Thanks.
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 123, Issue 23
**********************************************
