Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. EAMT 2014: First Call for Papers (Philipp Koehn)
2. Re: using Moses in Monolingual dialogue setting (Read, James C)
3. Re: word alignment viewer (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Tue, 10 Dec 2013 00:17:21 +0000
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: [Moses-support] EAMT 2014: First Call for Papers
To: "corpora@uib.no" <CORPORA@uib.no>, "moses-support@mit.edu"
<moses-support@mit.edu>, mt-list@eamt.org
Message-ID:
<CAAFADDA8HRgVg681NjvLYAN0T9oZiPsO=O27WNKrkyqVC6jnAw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
EAMT 2014 Call for Papers
The European Association for Machine Translation (EAMT) invites
everyone interested in machine translation and translation-related
tools and resources to participate in this conference -- developers,
researchers, users (including professional translators and
translation/localisation managers): anyone who has a stake in the
vision of an information world in which language issues become less
visible to the information consumer. We especially invite researchers
to describe the state of the art and demonstrate their cutting-edge
results and avid MT users to share their experiences.
We expect to receive manuscripts in these three categories:
===========================================================================
(R) Research papers: Long-paper submissions (8 pages) are invited for
reports of significant research results in any aspect of machine
translation and related areas. Such reports should include a
substantial evaluation component. Contributions are welcome on all
topics in the area of Machine Translation or translation-related
technologies, including:
MT methodologies and techniques
Speech translation: speech to text, speech to speech
Translation aids (translation memory, terminology databases, etc.)
Translation environments (workflow, support tools, conversion
tools for lexica, etc.)
Practical MT systems (MT for professionals, MT for multilingual
eCommerce, MT for localization, etc.)
MT in multilingual public service (eGovernment etc.)
MT for the web
MT embedded in other services
MT evaluation techniques and evaluation results
Dictionaries and lexica for MT
Text and speech corpora for MT
Standards in text and lexicon encoding for MT
Human factors in MT and user interfaces
Related multilingual technologies (natural language generation,
information retrieval, text categorization, text summarization,
information extraction, etc.)
Papers should describe original work. They should emphasize completed
work rather than intended work, and should indicate clearly the state
of completion of the reported results. Where appropriate, concrete
evaluation results should be included.
===========================================================================
(U) User studies: Short-paper submissions (2-4 pages) are invited for
reports on users' experiences with MT, be it in small or medium size
business (SMB), enterprise, government, or NGOs. Contributions are
welcome on:
Integrating MT and computer-assisted translation into a
translation production workflow (e.g. transforming terminology
glossaries into MT resources, optimizing TM/MT thresholds, mixing
online and offline tools, using interactive MT, dealing with MT
confidence scores)
Use of MT to improve translation or localization workflows (e.g.
reducing turnaround times, improving translation consistency,
increasing the scope of globalization projects)
Managing change when implementing and using MT (e.g. switching
between multiple MT systems, limiting degradations when updating or
upgrading an MT system)
Implementing open-source MT in the SMB or enterprise (e.g.
strategies to get support, reports on taking pilot results into full
deployment, examples of advance customisation sought and obtained
thanks to the open-source paradigm)
Evaluation of MT in a real-world setting (e.g. error detection
strategies employed, metrics used, productivity or translation quality
gains achieved)
Post-editing strategies and tools (e.g. limitations of traditional
translation quality assurance tools, challenges associated with
post-editing guidelines)
Legal issues associated with MT, especially MT in the cloud (e.g.
copyright, privacy)
Use of MT in social networking or real-time communication (e.g.
enterprise support chat)
Use of MT to process multilingual content for assimilation
purposes (e.g. cross-lingual information retrieval, MT for
e-discovery, MT for spam detection)
Use of standards for MT
Papers should highlight problems and solutions and not merely describe
the MT integration process or project settings. Where solutions do not
seem to exist, suggestions for MT researchers and developers should be
clearly emphasized. For user papers produced by academics, we require
co-authorship with the actual users.
(P) Project/Product description: Abstract submissions (1 page) are
invited to report new, interesting:
Tools for machine translation, computer aided translation, and the
like (including commercial products and open source software). The
authors should be ready to present the tools in the form of demos or
posters during the conference.
Research projects related to machine translation. The authors
should be ready to present the projects in the form of posters during
the conference. This follows on from the successful 'project villages'
held at the last two EAMT conferences.
The important dates are:
* Paper submission: March 14, 2014
* Notification to authors: April 18, 2014
* Camera-ready deadline: May 9, 2014
* Conference: June 16-18, 2014
Johann Roturier and Philipp Koehn
on behalf of the EAMT 2014 Organising Committee
http://eamt2014.ffzg.hr/
------------------------------
Message: 2
Date: Tue, 10 Dec 2013 10:03:52 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: Re: [Moses-support] using Moses in Monolingual dialogue
setting
To: Andrew <ravenyj@hotmail.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID:
<F00840E41983C645928E21E3C35F4EB1012CF8058B@mbx1-node2.essex.ac.uk>
Content-Type: text/plain; charset="iso-8859-1"
Yep, you've hit the nail right on the head. This is why I said my main concern would be the inordinate amounts of training data you would need to get something useful up and running.
When translating sentences from one language to another there can be a lot of variance, but there can also be a lot of consistency at some level, so it is possible to identify a limited number of patterns. The domain you are trying to train on seems to me to be so much more open to variance that I would expect you would need much larger training sets and/or much more intelligent learning algorithms to extract useful generalisations.
Of course, I could be wrong. The only way to tell would be to suck it and see. We would need to set up some kind of empirical pipeline to train and test the system with varying amounts and types of data to see how it performs. I'm not sure how we would test such a system.
I guess a quick approximation of the performance of your translation model would be to see how highly the output sentences score against a well-trained language model. This would give you an idea of how fluent the generated utterances are, but no idea of how appropriate a user would find the responses. I guess you could use one of the usual automatic metrics to measure the distance of the output sentences from the responses in a test corpus. Again, I'm not sure how good a predictor of user judgements this would be.
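For what it's worth, that language model check is easy to script. Here's a rough sketch (assuming the kenlm Python bindings and an already-trained LM; the file name and the sample outputs below are just placeholders, not anything from a real system):

    # Rough sketch: score candidate bot responses with a pre-trained KenLM model.
    # Assumes `pip install kenlm` and an existing ARPA/binary LM (the path is a placeholder).
    import kenlm

    lm = kenlm.Model("responses.arpa")  # hypothetical LM trained on fluent English responses

    candidates = [
        "that sounds great , see you tomorrow",
        "tomorrow great sounds see that you",
    ]

    for sent in candidates:
        # Length-normalised total log10 probability, so longer outputs aren't penalised unfairly.
        score = lm.score(sent, bos=True, eos=True) / max(1, len(sent.split()))
        print("%8.3f  %s" % (score, sent))

Something like that would at least let you rank outputs by fluency, even though it says nothing about appropriateness.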
I suppose you could measure the average time a user is willing to chat with your bot to get an idea of how well it's performing. But if the output is particularly bad then some users may keep chatting with the bot just for the comedy value.
Have you got a system running yet? Could you show us some sample output?
James
________________________________
From: Andrew [ravenyj@hotmail.com]
Sent: 09 December 2013 21:46
To: Read, James C; moses-support@mit.edu
Subject: RE: [Moses-support] using Moses in Monolingual dialogue setting
Thanks for the insights.
I've already done approach 2, and the result didn't seem bad to me,
so I became curious whether it would have made a significant difference had I chosen the first approach.
I was worried that approach 2 might have resulted in over-training, but judging from your comments, I guess it's only a matter of having broader entries (or could it have been over-trained?).
> I suppose my main concern would be the inordinate amounts of training data you would need to get something useful up and running.
This leads me to my next question.
I trained my system with about 650k stimulus-response pairs collected from Twitter.
Each pair is part of a conversation which consists of 3 to 10 utterances.
For example, suppose we have a conversation that has 4 utterances labeled A, B, C, D, where A is the "root" of the conversation, B is the response to A, C is the response to B, and D is the response to C.
Following my second approach, A-B, B-C, and C-D are pairs, so the source file will contain A, B, C and the target file will contain B, C, D, making 3 pairs from 1 conversation. In this way, I have 650k pairs from about 80k conversations.
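To make that concrete, the file construction is basically this (a small sketch of my approach 2; the variable names and file names are just illustrative):

    # Approach 2: pair every utterance with the next one in its conversation.
    # `conversations` is a list of conversations, each an ordered list of utterances.
    conversations = [
        ["A", "B", "C", "D"],   # one 4-utterance conversation -> 3 pairs
        # ... roughly 80k conversations in the real data
    ]

    with open("train.src", "w") as src, open("train.tgt", "w") as tgt:
        for conv in conversations:
            for stimulus, response in zip(conv, conv[1:]):
                src.write(stimulus + "\n")   # A, B, C go to the source side
                tgt.write(response + "\n")   # B, C, D go to the target side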
I've seen that when you use Moses for an actual translation task, say German to English, the amount of training data seems pretty low, somewhere around 50k sentence pairs, so my 650k is already much bigger than that. However, in the paper that I mentioned (http://aritter.github.io/mt_chat.pdf) the author used about 1.3M pairs, twice as many as mine, and I've seen research in a similar setting (http://www.aclweb.org/anthology/P13-1095) which used 4M pairs(!).
So, given the unpredictable nature of the monolingual conversation setting, what would you say is the appropriate, or minimum, amount of training data? And how much would the quality of the response-generation task depend on the amount of training data?
I know this is an out-of-nowhere question which may be hard to answer, but even a rough guess would greatly assist me. Thank you very much in advance.
> From: jcread@essex.ac.uk
> To: kgimpel@cs.cmu.edu
> Date: Mon, 9 Dec 2013 17:33:00 +0000
> CC: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I guess if, whenever the probability of the response falls below some sensible threshold, you were to change the subject and ask a question from a list of well-formed common questions, then you could make a system which fools a user some of the time.
>
> James
>
> ________________________________________
> From: moses-support-bounces@mit.edu [moses-support-bounces@mit.edu] on behalf of Read, James C [jcread@essex.ac.uk]
> Sent: 09 December 2013 17:14
> To: Kevin Gimpel
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I'm guessing he wants to make a conversational agent that produces a most likely response based on the stimulus.
>
> In any case, the distinction between 1 and 2 is probably redundant if GIZA++ is being used to train in both directions. The two phrase tables could be merged, I guess. The advantage of 2 over 1 is that you don't need to worry about the merging logic, at the cost of more training time.
>
> I'm not sure I understand the question of A1~B3. Unless I'm reading his question wrong I don't see how this could happen.
>
> I suppose my main concern would be the inordinate amounts of training data you would need to get something useful up and running.
>
> James
>
> ________________________________
> From: kgimpel@gmail.com [kgimpel@gmail.com] on behalf of Kevin Gimpel [kgimpel@cs.cmu.edu]
> Sent: 09 December 2013 15:17
> To: Read, James C
> Cc: Andrew; moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> Hi Andrew, it's an interesting idea. I would guess that it would depend on what the data look like. If the A's and B's are of fundamentally different types (e.g., they are utterances in an automatic dialogue system, where A's are always questions and B's are always responses), then approach 2 seems a bit odd, as it will conflate A's and B's utterances. However, if the A's and B's are just part of a conversation, e.g. in IM chats, then they are of the same "type" and approach 2 would make sense. In fact, I think approach 2 would make more sense than approach 1 in that case. It also, of course, depends on how you want to use the resulting translation system.
> Kevin
>
>
>
> On Mon, Dec 9, 2013 at 5:18 AM, Read, James C <jcread@essex.ac.uk> wrote:
> Are you trying to figure out the probability of a response given a stimulus?
>
> Given that GIZA++ aligns words and makes heavy use of co-occurrence statistics I doubt this is likely to produce very fruitful results. How big is your data set?
>
> Give it a whirl and see what happens. I would be interested to hear what comes of it.
>
> James
>
> ________________________________
> From: moses-support-bounces@mit.edu [moses-support-bounces@mit.edu] on behalf of Andrew [ravenyj@hotmail.com]
> Sent: 08 December 2013 20:10
> To: moses-support@mit.edu
> Subject: [Moses-support] using Moses in Monolingual dialogue setting
>
>
> Hi,
>
>
> I'm using Moses in a monolingual dialogue setting as in http://aritter.github.io/mt_chat.pdf,
> where the source and target are both in English and the target is a response to the source.
> I'd like to propose a little thought experiment in this setting, and hear what you think would happen.
>
>
> Suppose we have a conversation with six utterances, A1,B1,A2,B2,A3,B3, where A and B indicate speakers,
> and the number indicates the n-th statement by that speaker. They are all part of one conversation on a continuous topic.
>
> Now suppose we train it using Moses in two different ways, as follows:
> 1) Source file contains A1, A2, A3 and target contains B1, B2, B3 so that A1-B1 is a pair and so on.
> 2) Source contains A1,B1,A2,B2,A3 and target contains B1,A2,B2,A3,B3, taking advantage of the fact that a response is the stimulus for the next response.
>
> Then, how will the results be different, and why?
> Since GIZA++ gets alignments in both directions, will 2) result in any of A1~B3 being the translation of any other?
>
>
> This may be a strange question, but I would really like to get your insight.
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
Message: 3
Date: Tue, 10 Dec 2013 10:46:07 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] word alignment viewer
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjAyqexikh0HgoxEGqPRf37uem9FHJFFt0qf25YvhO+BA@mail.gmail.com>
Content-Type: text/plain; charset="windows-1252"
Thanks everyone for all your suggestions. I've found 2 programs which are
complementary and perfect for my needs:
1. Picaro by Jason Riesa. Displays the alignments as a matrix on the
command line (a toy version of that matrix view is sketched below). Now
included in Moses:
https://github.com/moses-smt/mosesdecoder/tree/master/contrib/picaro
2. Q****. A Java GUI that displays parallel sentences in 2 rows with links
between the words. Not yet officially downloadable.
Most of the others seem to be for doing manual alignment, whereas I'm just
looking for a visualiser. I tried Cairo; it had compile problems (easy to
fix) and didn't seem to run properly even when fixed.
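In case it's useful to anyone, the kind of matrix view Picaro gives you is easy to approximate. Here's a toy sketch (not the real Picaro code, just an illustration; the example sentence pair and alignment are made up):

    # Toy command-line alignment matrix, loosely in the spirit of Picaro.
    # Input: tokenised source/target sentences plus Moses-style alignments "srcIdx-tgtIdx ...".
    def print_alignment_matrix(src_line, tgt_line, align_line):
        src = src_line.split()
        tgt = tgt_line.split()
        links = {tuple(map(int, pair.split("-"))) for pair in align_line.split()}

        width = max(len(word) for word in src)
        # One row per source word, one column per target word; '*' marks an alignment link.
        for i, s in enumerate(src):
            cells = ["*" if (i, j) in links else "." for j in range(len(tgt))]
            print(s.rjust(width), " ".join(cells))
        # List the target words with their column indices below the matrix.
        for j, t in enumerate(tgt):
            print(" " * width, "%d: %s" % (j, t))

    print_alignment_matrix("das ist ein Haus", "this is a house", "0-0 1-1 2-2 3-3")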
On 9 December 2013 18:28, Jason Riesa <jason.riesa@gmail.com> wrote:
> Philipp, thanks. I sent Hieu the code you are referring to; ISI recently
> took my site offline, since I have moved to Google. I haven't had time to
> put something else up yet. Amin, if you're interested, I can also send to
> you.
>
> Best,
> Jason
>
>
> On Mon, Dec 9, 2013 at 9:24 AM, Philipp Koehn <pkoehn@inf.ed.ac.uk> wrote:
>
>> Hi,
>>
>> Jason Riesa has a nice command line word alignment visualization tool
>> http://nlg.isi.edu/demos/picaro/
>> but the download site is not available anymore.
>>
>> -phi
>>
>>
>> On Mon, Dec 9, 2013 at 5:10 PM, Amin Farajian <ma.farajian@gmail.com> wrote:
>>
>>> Dear Hieu,
>>>
>>> For this task we recently modified the tool implemented by Chris
>>> Callison-Burch; you can find the original code here:
>>> http://cs.jhu.edu/~ccb/interface-word-alignment.html
>>>
>>> The modified version of the code reads the source, target and word
>>> alignment information from the input files and enables the user to modify
>>> the alignment points.
>>>
>>> I've tried different tools, but found this one easy to use and very
>>> helpful.
>>> If you are interested, let me know and I will share the code with you.
>>>
>>> Bests,
>>> Amin
>>>
>>> PS. Here is the screen-shot of the tool:
>>>
>>>
>>>
>>>
>>> On 12/09/2013 05:37 PM, Matthias Huck wrote:
>>>
>>> It's called "Cairo":
>>>
>>> Cairo: An Alignment Visualization Tool. Noah A. Smith and Michael E.
>>> Jahr. In Proceedings of the Language Resources and Evaluation Conference
>>> (LREC 2000), pages 549-552, Athens, Greece, May/June 2000.
>>> http://www.cs.cmu.edu/~nasmith/papers/smith+jahr.lrec00.pdf
>>> http://old-site.clsp.jhu.edu/ws99/projects/mt/toolkit/cairo.tar.gz
>>>
>>> Never tried that one, though. The code seems to be kind of prehistoric.
>>>
>>>
>>> On Mon, 2013-12-09 at 11:15 -0500, Lane Schwartz wrote:
>>>
>>> I don't have a copy, but I believe there was a tool called Chiro
>>> or Cairo that does this, which I'm told helped provide the theme for
>>> the Egypt-themed JHU summer workshop on machine translation.
>>>
>>> On Mon, Dec 9, 2013 at 10:25 AM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>>
>>> Does anyone have a nice GUI word alignment viewer they can share? I.e. given
>>> the source, target, and alignment files, display each parallel sentence with
>>> links between the aligned words.
>>>
>>> Something with no webapp or complicated install procedure would be best.
>>>
>>> --
>>> Hieu Hoang
>>> Research Associate
>>> University of Edinburgh
>>> http://www.hoang.co.uk/hieu
>>>
>>>
>>>
>>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 86, Issue 32
*********************************************