Moses-support Digest, Vol 93, Issue 31

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Phrase extraction with --IncludeSentenceId messes up
phrase table counts (Marcin Junczys-Dowmunt)
2. 2014 EAMT Best Thesis Award (Mikel Forcada)
3. 2014 EAMT call for proposals and internships (Mikel Forcada)
4. Deadline extension for SSST-8, 8th Workshop on Syntax,
Semantics and Structure in Statistical Translation (EMNLP 2014)
(Carpuat, Marine)

----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jul 2014 18:57:53 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Phrase extraction with
--IncludeSentenceId messes up phrase table counts
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>, Philipp Koehn
<pkoehn@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <53CFE991.1010304@amu.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed

Oh. Good! I guess there is a lesson to be learned somewhere.
Thanks.

On 23.07.2014 18:06, Barry Haddow wrote:
> Hi Marcin
>
> It appears that there is an --IgnoreSentenceId argument already, added
> by Maria during last year's MTM
>
>> [gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore
>> bff12363 (maria nadejde 2013-09-13 12:45:46 +0200 42) if (args[i] == "--IgnoreSentenceId") {
>
> cheers - Barry
>
> On 23/07/14 16:56, Marcin Junczys-Dowmunt wrote:
>> So, adding "--IgnoreSentenceId" to "score" might fix that without
>> messing up your stuff? I guess I can do that if you can't be
>> bothered, Hieu.
>>
>> On 23.07.2014 17:53, Philipp Koehn wrote:
>>> Hi,
>>>
>>> this is how extract is called:
>>> extract corpus.en corpus.fr align extract 5
>>> --IncludeSentenceId
>>>
>>> this is how score is called:
>>> score extract lex.f2e phrase-table.half --GoodTuring
>>> --DomainIndicator domains.5
>>>
>>> phrase table looks fine to me
>>>
>>> -phi
>>>
>>>
>>> On Wed, Jul 23, 2014 at 11:42 AM, Marcin Junczys-Dowmunt
>>> <junczys@amu.edu.pl> wrote:
>>>
>>> In a corpus with sentences sorted by release date, this
>>> could actually make sense :)
>>>
>>> On 23.07.2014 17:40, Barry Haddow wrote:
>>>
>>> Because calculating translation probabilities from sentence
>>> ids is unexpectedly beneficial?
>>>
>>> On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
>>>
>>>
>>> So, how come this is not damaging the Edinburgh system?
>>>
>>> On 23.07.2014 17:32, Hieu Hoang wrote:
>>>
>>> ah ok.
>>>
>>> I thought it was just for debugging. I'm not gonna
>>> change it since it's gonna involve months of debugging.
>>>
>>> Ideally, the extract format should be fixed like the
>>> phrase-table, with the last column being key-value
>>> pairs. Also, the way the key-value pairs are processed
>>> should be automatic like in the decoder.
>>>
>>> marcin - sorry mate. you're on your own
>>>
>>> On 23/07/14 16:20, Philipp Koehn wrote:
>>>
>>> Hi,
>>>
>>> the sentence ID is being used for the domain
>>> indicator features.
>>>
>>> If you run phrase-extract's score and specify a domain file,
>>> it uses the sentence IDs to find out which domain the
>>> phrase pair was found in.
>>>
>>> This has been a standard feature in Edinburgh's phrase-based
>>> system for the last 1-2 years, so if you want to make changes,
>>> make sure that this functionality still works (see
>>> [1381-5] for an example
>>> with extract* files still in place).
>>>
>>> -phi
>>>
>>>
>>> On Wed, Jul 23, 2014 at 7:15 AM, Marcin
>>> Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>>>
>>> Key-value format would actually be fine.
>>>
>>> On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
>>>
>>> I was planning to use it for a custom
>>> feature function later.
>>>
>>> On 23.07.2014 13:11, Hieu Hoang wrote:
>>>
>>> i can change it so that the sentence
>>> id is put into a
>>> key-value field in the last column.
>>>
>>> what is the sentence id used for? is
>>> it just for debugging
>>> purposes?
>>>
>>>
>>> On 23 July 2014 11:36, Marcin Junczys-Dowmunt
>>> <junczys@amu.edu.pl> wrote:
>>>
>>> Hi,
>>> I am using train-model.perl with
>>>
>>> --extract-options="--IncludeSentenceId"
>>>
>>> and it seems that the sentence id is somehow getting into
>>> the phrase table as a count and is later used in the phrase
>>> translation weight calculation. For instance, this
>>> extract (last column is the sentence id):
>>>
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
>>>
>>> results in a phrase table entry
>>> like this:
>>>
>>> #c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||
>>>
>>> The count is equal to the sum of the sentence ids, which of
>>> course makes the phrase probability useless.
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>>
>>> -- Hieu Hoang
>>> Research Associate
>>> University of Edinburgh
>>> http://www.hoang.co.uk/hieu
>>>
>>>
>>>
>>>
>>
>
>
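
The arithmetic behind the report in this thread can be checked with a small sketch. It is a toy model, not the code of Moses' score tool: the helper names (parse_extract_line, count_pairs) and the flag last_field_is_sentence_id are made up for illustration, and the only assumption about the tool is the one stated in the thread, namely that the trailing extract field ends up being read as a count.

```python
from collections import defaultdict

# The six extract lines quoted in the thread, rejoined onto single lines.
PHRASES = "#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3"
SENTENCE_IDS = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]
EXTRACT_LINES = [f"{PHRASES} ||| {sid}" for sid in SENTENCE_IDS]


def parse_extract_line(line):
    """Split an extract line into source, target, alignment and trailing field."""
    source, target, alignment, last = [field.strip() for field in line.split("|||")]
    return source, target, alignment, last


def count_pairs(lines, last_field_is_sentence_id):
    """Accumulate a count per phrase pair (hypothetical helper, not Moses code)."""
    counts = defaultdict(float)
    for line in lines:
        source, target, _alignment, last = parse_extract_line(line)
        if last_field_is_sentence_id:
            # Each extract line is one occurrence; the id itself is not a count.
            counts[(source, target)] += 1.0
        else:
            # Faulty reading: the sentence id is added up as if it were a count.
            counts[(source, target)] += float(last)
    return dict(counts)


pair = ("#c the compound or process", "#c verbindung oder verfahren")
naive = count_pairs(EXTRACT_LINES, last_field_is_sentence_id=False)
fixed = count_pairs(EXTRACT_LINES, last_field_is_sentence_id=True)

print(f"{naive[pair]:g}")  # 1.14604e+07, the odd count in the reported phrase table entry
print(fixed[pair])         # 6.0, the number of extracted occurrences
```

The six ids sum to 11460418, which matches the 1.14604e+07 in the phrase table entry above; counting one occurrence per line gives the 6 that appears next to it.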

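Philipp's point in Message 1, that score uses the sentence ids to find out which domain a phrase pair was found in, can be pictured with the sketch below. Everything about the table is an assumption made for illustration: the thread does not show the contents of domains.5, so the boundary values, the domain names ("patents", "news") and the function domain_of are hypothetical, and the lookup is only one plausible way to map an id to a domain.

```python
from bisect import bisect_left
from collections import Counter

# ASSUMPTION: a hypothetical domain table, one (last sentence id, name) pair per
# domain, sorted by id. The real domain file format is not shown in the thread.
DOMAIN_BOUNDARIES = [(2000000, "patents"), (5000000, "news")]


def domain_of(sentence_id, boundaries=DOMAIN_BOUNDARIES):
    """Return the first domain whose last sentence id is >= sentence_id."""
    ends = [end for end, _name in boundaries]
    index = bisect_left(ends, sentence_id)
    if index == len(boundaries):
        raise ValueError(f"sentence id {sentence_id} is past the last domain boundary")
    return boundaries[index][1]


# Sentence ids of the six extract lines quoted in the thread.
SENTENCE_IDS = [1374618, 1374619, 1374620, 1374621, 1374622, 4587318]

# Per-domain occurrence counts for the phrase pair, under the assumed boundaries.
domain_counts = Counter(domain_of(sid) for sid in SENTENCE_IDS)
print(dict(domain_counts))  # {'patents': 5, 'news': 1}
```

Under this reading the sentence id is metadata about where a phrase pair occurred, which is why it has to stay in the extract file for the domain features while being kept out of the counts.
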
------------------------------

Message: 2
Date: Wed, 23 Jul 2014 19:37:28 +0200
From: Mikel Forcada <mlf@dlsi.ua.es>
Subject: [Moses-support] 2014 EAMT Best Thesis Award
To: moses-support@mit.edu
Message-ID: <53CFF2D8.90000@dlsi.ua.es>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear Moses Support list members:

the European Association for Machine Translation (EAMT) has published
the call for candidacies to the 2014 EAMT Best Thesis Award. For
details, please visit the following URL:

http://www.eamt.org/news/news_best_thesis2014.php

Best regards,

Mikel L. Forcada
EAMT Secretary

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

------------------------------

Message: 3
Date: Wed, 23 Jul 2014 19:40:46 +0200
From: Mikel Forcada <mlf@dlsi.ua.es>
Subject: [Moses-support] 2014 EAMT call for proposals and internships
To: moses-support@mit.edu
Message-ID: <53CFF39E.9070905@dlsi.ua.es>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear list members:

the European Association for Machine Translation (EAMT) has published
the 2014 call for proposals and the 2015 call for student internships.
For details, please visit the following URLs:

http://www.eamt.org/news/news_call_for_proposals2014.php
http://www.eamt.org/news/news_summer_internships_2015.php

Best regards,

Mikel L. Forcada
EAMT Secretary

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

------------------------------

Message: 4
Date: Wed, 23 Jul 2014 13:56:57 -0400
From: "Carpuat, Marine" <Marine.Carpuat@cnrc-nrc.gc.ca>
Subject: [Moses-support] Deadline extension for SSST-8, 8th Workshop
on Syntax, Semantics and Structure in Statistical Translation (EMNLP
2014)
To: "corpora@uib.no" <corpora@uib.no>, "mt-list@eamt.org"
<mt-list@eamt.org>, "ln@cines.fr" <ln@cines.fr>,
"moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<D7548FA9B5763F408F5EB57EE28383621B282E0772@NRCCENMB1.nrc.ca>
Content-Type: text/plain; charset="iso-8859-1"

Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)
EMNLP 2014 / SIGMT / SIGLEX Workshop
Oct 2014, Doha, Qatar
http://www.cse.ust.hk/~dekai/ssst/

*** New submission deadline for papers and abstracts: August 1st, 2014 ***
*** Special theme: Compositional Distributional Semantics and Machine Translation ***

The Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8) seeks to bring together a large number of researchers working on diverse aspects of structure, semantics and representation in relation to statistical machine translation. Since its first edition in 2006, its program each year has comprised high-quality papers discussing current work spanning topics including: new grammatical models of translation; new learning methods for syntax- and semantics-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.

We invite two types of submissions this year:

1. Extended abstracts for poster or hands-on presentations on the special theme
2. Full papers spanning all areas of interest for SSST

===========================
Special Theme Extended Abstracts
===========================

This year, the special theme of semantics of the past three editions of SSST takes a new step with a "working workshop" bringing together researchers interested in compositional distributional semantics, distributed representations, and continuous vector space models in MT, with tutorials bridging both directions, as well as discussions and hands-on work on relevant tasks with real data. Such models have proven beneficial for a number of NLP tasks, for example phrasal similarity, lexical entailment, modeling semantic deviance, detecting order restrictions in recursive structures, or improving NP bracketing in parsing. However, they have not received as much attention in MT.

Extended abstracts of at most two (2) pages should describe poster or hands-on presentations that will stimulate discussions on the special theme of compositional distributional semantics and machine translation, including position papers, recent work, pilot studies, and negative results. We encourage the presentation of relevant work that has been published or submitted elsewhere, as well as new work in progress.

=========
Full Papers
=========

The need for structural mappings between languages is widely recognized in the fields of statistical machine translation and spoken language translation, and there is now wide consensus that these mappings are appropriately represented using a family of formalisms that includes synchronous/transduction grammars and similar notational equivalents. To date, flat-structured models, such as the word-based IBM models of the early 1990s or the more recent phrase-based models, remain widely used. But tree-structured mappings arguably offer a much greater potential for learning valid generalizations about relationships between languages.

Within this area of research there is a rich diversity of approaches. There is active research ranging from formal properties of S/TGs to large-scale end-to-end systems. There are approaches that make heavy use of linguistic theory, and approaches that use little or none. There is theoretical work characterizing the expressiveness and complexity of particular formalisms, as well as empirical work assessing their modeling accuracy and descriptive adequacy across various language pairs. There is work being done to invent better translation models, and work to design better algorithms. Recent years have seen significant progress on all these fronts. In particular, systems based on these formalisms are now top contenders in MT evaluations.

At the same time, SMT has seen a movement toward semantics over the past few years, which has been reflected at recent SSST workshops, including the last three editions which had semantics for SMT as a special theme. The issues of deep syntax and shallow semantics are closely linked and SSST-8 continues to encourage submissions on semantics for MT in a number of directions, including semantic role labeling, sense disambiguation, and compositional distributional semantics for translation and evaluation.

We invite papers on:
- syntax-based / semantics-based / tree-structured SMT
- machine learning techniques for inducing structured translation models
- algorithms for training, decoding, and scoring with semantic representation structure
- empirical studies on adequacy and efficiency of formalisms
- creation and usefulness of syntactic/semantic resources for MT
- formal properties of synchronous/transduction grammars
- learning semantic information from monolingual, parallel or comparable corpora
- unsupervised and semi-supervised word sense induction and disambiguation methods for MT
- lexical substitution, word sense induction and disambiguation, semantic role labeling, textual entailment, paraphrase and other semantic tasks for MT
- semantic features for MT models (word alignment, translation lexicons, language models, etc.)
- evaluation of syntactic/semantic components within MT (task-based evaluation)
- scalability of structured translation methods to small or large data
- applications of S/TGs to related areas including:
  - speech translation
  - formal semantics and semantic parsing
  - paraphrases and textual entailment
  - information retrieval and extraction
- syntactically- and semantically-motivated evaluation of MT
- compositional distributional semantics in MT
- distributed representations and continuous vector space models in MT

=========
Organizers
=========
Dekai WU, Hong Kong University of Science and Technology (HKUST)
Marine CARPUAT, National Research Council (NRC) Canada
Xavier CARRERAS, Universitat Politècnica de Catalunya (UPC)
Eva Maria VECCHI, Cambridge University

=============
Important Dates
=============

Submission deadline for papers and extended abstracts: 1 Aug 2014
Notification to authors: 26 Aug 2014
Camera copy deadline: 15 Sep 2014

For more information
http://www.cse.ust.hk/~dekai/ssst/

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 93, Issue 31
*********************************************
