Moses-support Digest, Vol 120, Issue 31

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Unknown Words In Moses (Emmanuel Dennis)
2. Re: Unknown Words In Moses (Hieu Hoang)
3. Re: Syntax-based Constrained Decoding (Hieu Hoang)
4. AMTA 2016 SeMaT Workshop - November 1, 2016 at Austin, TX
(Mahmoud Ghoneim)


----------------------------------------------------------------------

Message: 1
Date: Fri, 28 Oct 2016 10:10:59 +0300
From: Emmanuel Dennis <emmanueldennisb@gmail.com>
Subject: [Moses-support] Unknown Words In Moses
To: moses-support@mit.edu
Message-ID:
<CAJBn_brKp7XjzEMyoTvBt5LWuw=R_t6qKex7nLoNo48YUzreJw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi! How do you deal with unknown words in Moses?



Thanks

------------------------------

Message: 2
Date: Fri, 28 Oct 2016 08:15:01 -0500
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Unknown Words In Moses
To: Emmanuel Dennis <emmanueldennisb@gmail.com>, moses-support@mit.edu
Message-ID: <1274635c-1f13-da19-cfe2-148e3be840ad@gmail.com>
Content-Type: text/plain; charset="windows-1252"

By default, unknown words are copied to the output unchanged.

You can also add

-drop-unknown

when running the decoder to drop unknown words instead of copying them.

Or you can pre-process the input and transliterate the unknown words.

More information here:

http://www.statmt.org/moses/?n=Advanced.OOVs
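
The three options above can be sketched in a few lines of Python. This only illustrates the behaviours on a tokenized source sentence; it is not Moses code. The vocabulary set, the `transliterate` helper and its character table are hypothetical stand-ins: in practice the vocabulary would come from the source side of your phrase table, and transliteration would be done by a proper transliteration model.

```python
def transliterate(word, table):
    """Map each character through a (hypothetical) character table."""
    return "".join(table.get(ch, ch) for ch in word)


def handle_unknowns(tokens, vocab, mode="copy", table=None):
    """Apply one unknown-word strategy to a tokenized source sentence.

    mode="copy":          keep unknown words unchanged (the default behaviour)
    mode="drop":          remove unknown words (the effect of -drop-unknown)
    mode="transliterate": rewrite unknown words before decoding
    """
    out = []
    for tok in tokens:
        if tok in vocab or mode == "copy":
            out.append(tok)
        elif mode == "drop":
            continue
        elif mode == "transliterate":
            out.append(transliterate(tok, table or {}))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


# Toy data: the vocabulary and the Cyrillic-to-Latin table are made up.
vocab = {"дом", "в"}
table = {"М": "M", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}
sentence = ["дом", "в", "Москва"]

copied = handle_unknowns(sentence, vocab, "copy")
dropped = handle_unknowns(sentence, vocab, "drop")
translit = handle_unknowns(sentence, vocab, "transliterate", table)
```

Only the word missing from the vocabulary is affected; known words pass through untouched in every mode.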


On 28/10/2016 02:10, Emmanuel Dennis wrote:
> Hi! How do you deal with unknown words in Moses?
>
>
>
> Thanks
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 3
Date: Fri, 28 Oct 2016 08:27:35 -0500
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Syntax-based Constrained Decoding
To: Shuoyang Ding <mtsding@gmail.com>, Moses <moses-support@mit.edu>
Message-ID: <2fed89a7-492c-951c-6239-4f99fb551122@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Good point. The decoder is set up to translate quickly, so there are a few
pruning parameters which throw out low-scoring rules or hypotheses.

These are some of the pruning parameters you'll need to change (there
may be more):
1. [feature]
   PhraseDictionaryWHATEVER table-limit=0
2. [cube-pruning-pop-limit]
   1000000
3. [beam-threshold]
   0
4. [stack]
   1000000
Make the changes one at a time, in case any of them makes decoding too
slow, even with constrained decoding.
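
A minimal sketch of applying these settings to moses.ini one at a time, assuming the usual layout where a `[section]` header is followed by its value lines. The helper names, the sample config and the rule-table path are hypothetical; check the result against your own moses.ini before decoding. Note that this sketch discards any comment lines sitting directly under a section it rewrites.

```python
import re


def set_section(ini_text, section, value):
    """Replace the value lines under [section], or append the section if absent."""
    lines = ini_text.splitlines()
    header = f"[{section}]"
    out, i, found = [], 0, False
    while i < len(lines):
        out.append(lines[i])
        i += 1
        if out[-1].strip() == header:
            found = True
            out.append(str(value))
            # Skip the old value lines up to the next blank line or header.
            while (i < len(lines) and lines[i].strip()
                   and not lines[i].lstrip().startswith("[")):
                i += 1
    if not found:
        out += ["", header, str(value)]
    return "\n".join(out) + "\n"


def set_table_limit(ini_text, limit):
    """Set table-limit=<limit> on every PhraseDictionary* feature line."""
    def fix(line):
        if not line.startswith("PhraseDictionary"):
            return line
        if "table-limit=" in line:
            return re.sub(r"table-limit=\d+", f"table-limit={limit}", line)
        return f"{line} table-limit={limit}"
    return "\n".join(fix(line) for line in ini_text.splitlines()) + "\n"


# The four changes from the list above, in the same order.
STEPS = [
    lambda t: set_table_limit(t, 0),
    lambda t: set_section(t, "cube-pruning-pop-limit", 1000000),
    lambda t: set_section(t, "beam-threshold", 0),
    lambda t: set_section(t, "stack", 1000000),
]

# Hypothetical starting config, for illustration only.
sample = """[feature]
PhraseDictionaryMemory name=TranslationModel0 path=rule-table.gz table-limit=20

[stack]
100
"""

configs = []
text = sample
for step in STEPS:
    text = step(text)
    configs.append(text)  # one cumulative config per change
```

Each entry in `configs` adds one more relaxed setting, so you can decode with them in order and stop at the first one that becomes too slow.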

It may be that you have to run the decoder with phrase tables that are
each trained on only one sentence at a time.
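
As a rough sketch of the first step for that, the training files can be split so that each sentence gets its own directory. This assumes the inputs are line-aligned (for example source, target and alignment files); the function name and the directory layout are hypothetical, and rule extraction would still have to be run separately in each directory.

```python
from pathlib import Path


def split_per_sentence(inputs, out_dir):
    """Write line i of every input file to out_dir/<i>/<file name>.

    `inputs` is a list of line-aligned files.
    Returns the number of sentence directories created.
    """
    columns = [Path(p).read_text(encoding="utf-8").splitlines() for p in inputs]
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError(f"files are not line-aligned: {sorted(lengths)}")
    n = lengths.pop()
    for i in range(n):
        sent_dir = Path(out_dir) / f"{i:06d}"
        sent_dir.mkdir(parents=True, exist_ok=True)
        for path, col in zip(inputs, columns):
            # Each per-sentence file keeps the name of the file it came from.
            (sent_dir / Path(path).name).write_text(col[i] + "\n", encoding="utf-8")
    return n
```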

I'd be interested to know how you get on, so let me know how it goes.

On 26/10/2016 13:56, Shuoyang Ding wrote:
> Hi All,
>
> I'm trying to do syntax-based constrained decoding on the same data from which I extracted my rules, and I'm getting very low coverage (~12%). I'm using GHKM rule extraction which in theory should be able to reconstruct the target translation even only with minimal rules.
>
> Judging from the search graph output, the decoder seems to prune out rules with very low scores, even if they are the only rule that can reconstruct the original reference.
>
> I'm curious if there is a way in the current constrained decoding implementation such that I can disable pruning? Or at least, if it is feasible to do so?
>
> Thanks!
>
> Regards,
> Shuoyang Ding
>
> Ph.D. Student
> Center for Language and Speech Processing
> Department of Computer Science
> Johns Hopkins University
>
> Hackerman Hall 225A
> 3400 N. Charles St.
> Baltimore, MD 21218
>
> http://cs.jhu.edu/~sding
>
>



------------------------------

Message: 4
Date: Fri, 28 Oct 2016 11:43:45 -0400
From: Mahmoud Ghoneim <mah.ghoneim@gmail.com>
Subject: [Moses-support] AMTA 2016 SeMaT Workshop - November 1, 2016
at Austin, TX
To: moses-support <moses-support@mit.edu>, mt-list@eamt.org,
grlmc@grlmc.com
Message-ID:
<CAK5zJ4+6dQzqBX8GBRVKG8vnARjUHRuc2UKaXzgqTDycNwTuBw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

[Apologies for multiple postings]
See you at the Hilton Austin hotel on November 1st.
AMTA 2016 Workshop
Semitic Machine Translation (SeMaT)
Collocated with EMNLP 2016
Hilton Austin hotel, Austin, Texas, USA
November 1st, 2016
http://semitic-mt.seas.gwu.edu

Background

Semitic languages are used by a significantly large population of native
speakers and belong to a family that includes classical Arabic, a large
number of Arabic dialects, Hebrew, Amharic, Maltese and other languages.
These languages are characterized by a system of word formation based on
roots and patterns, a rich and productive morphology (including
non-concatenative processes), a diversity of orthographic conventions and,
unfortunately, limited language resources suitable for computational
research and development.

Research on machine translation of Semitic languages is still in its early
stages. Accurate translation of Arabic, Hebrew, Amharic, Maltese and other
Semitic languages requires treatment of unique linguistic characteristics,
some of which are common to all Semitic languages, others specific to each
of these individual languages and their dialects.

The goal of this workshop is to bring together researchers and research
specifically concerned with issues pertaining to machine translation to,
from, and among Semitic languages. Furthermore, the workshop will be an
opportunity for the Special Interest Group on Computational Approaches to
Semitic Languages (the SIG) to meet and discuss future directions in
Computational Linguistics and Natural Language Processing approaches to
Semitic Languages.


Invited Speakers

- *Philipp Koehn,* Department of Computer Science, The Johns Hopkins
University
talking about "*Morphology in Low Resource Machine Translation*"

- *Marine Carpuat,* Computer Science, University of Maryland and UMIACS
talking about "*Domain and Other Data Divergences in Machine
Translation*"

- *Hassan Sajjad,* Arabic Language Technology Group, the Qatar Computing
Research Institute
talking about "*From Phrase-based to Neural Machine Translation*"



Program

November 1, 2016


*Opening Session*

09:00-09:15 *Welcome Remarks*

09:15-10:00 *From Phrase-based to Neural Machine Translation*

Keynote Talk by: *Hassan Sajjad*

10:00-10:30 *An Arabic-Hebrew parallel corpus of TED talks*

Mauro Cettolo


10:30-11:00 *Coffee Break*


*Session 2*

11:00-11:30 *Domain and Other Data Divergences in Machine Translation*

Keynote Talk by: *Marine Carpuat*

11:30-12:00 *Large-Scale Machine Translation between Arabic and Hebrew:
Available Corpora and Initial Results*

Yonatan Belinkov and James Glass

12:00-12:30 *GISA: Giza++ Implementation over Spark by Apache*

John Cadigan and Yuval Marton


12:30-14:00 *Lunch*


*Session 3*

14:00-14:30 *Normalizing Mathematical Expressions to Improve the
Translation of Educational Content*

Wajdi Zaghouani, Ahmed Abdelali, Francisco Guzmán and
Hassan Sajjad

14:30-15:00 *Morphology in Low Resource Machine Translation*

Keynote Talk by: *Philipp Koehn*

15:00-15:15 *Closing Remarks*



Organizers

Mahmoud Ghoneim (The George Washington University)

Mona Diab (The George Washington University)

Houda Bouamor (Carnegie Mellon University Qatar)

Ahmed ElKholy (Microsoft USA)

Yuval Marton (Microsoft USA)


Program Committee

Alon Lavie, Carnegie Mellon University

Nizar Habash, New York University Abu Dhabi

Kamel Smaili, Laboratoire Lorrain de Recherche en Informatique et ses
Applications (LORIA)

Khaled Shaalan, The British University in Dubai (BUiD)

Francisco Guzmán, Facebook

Hamdy Mubarak, Qatar Computing Research Institute (QCRI)

Michael Gasser, Indiana University

Fethi Bougares, Laboratoire d'Informatique de l'Université du Maine

Gorka Labaka, University of the Basque Country

Laurent Besacier, Laboratoire d'Informatique de Grenoble, équipe GETALP

Haithem Afli, DCU/CNGL laboratory, Dublin

Maryam Aminian, Columbia University


------
Thanks,
Mahmoud Ghoneim, PhD
Research Scientist
Computer Science Department
School of Engineering and Applied Science
The George Washington University

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 120, Issue 31
**********************************************
