Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Help Needed in Developing Machine Translation System
(Philipp Koehn)
2. Re: word alignment viewer (Jason Riesa)
3. Increasing context scope during training (Rūdolfs Mazurs)
4. Re: using Moses in Monolingual dialogue setting (Andrew)
----------------------------------------------------------------------
Message: 1
Date: Mon, 9 Dec 2013 17:59:36 +0000
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Help Needed in Developing Machine
Translation System
To: "Asad A.Malik" <asad_12204@yahoo.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDD+voMGff0PipCrTbte5EXpG-oOmjh6kv3mZVE3SCJ60Q@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
we have fairly extensive tutorials on using Moses, especially
http://www.statmt.org/moses/?n=Moses.Tutorial
You first need to install the software
http://www.statmt.org/moses/?n=Development.GetStarted
-phi
On Mon, Dec 9, 2013 at 4:22 PM, Asad A.Malik <asad_12204@yahoo.com> wrote:
> Hi All,
> The reason I am emailing is that I want help and guidance in developing a
> Machine Translation system.
>
> I am currently enrolled in an MS program, and for my semester project I have
> to develop a Machine Translation system. I have already delayed submitting
> the proposal because I don't know how or where I should start. I have studied
> the literature on the basic Machine Translation approaches (Rule-Based,
> Statistical and Example-Based) and want to develop the system using one of
> these techniques.
>
> Now the problem is that I don't know anything about the tools with which I
> will have to develop it. I've used MOSES a little bit but can't get my head
> around it. I also don't know what resources will be required for the
> development. Sometimes I think that developing a Rule-Based system will be
> easy, and sometimes I think the Statistical approach will be easier. What I
> actually want is whichever of the three mentioned techniques is easy to
> develop and doesn't require much human effort.
>
> Kindly help me select a technique and show me the right path. I would
> really appreciate your help.
>
> Regards
>
> Asad A.Malik
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
------------------------------
Message: 2
Date: Mon, 9 Dec 2013 10:28:52 -0800
From: Jason Riesa <jason.riesa@gmail.com>
Subject: Re: [Moses-support] word alignment viewer
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CADg7yObi+h1XaPrseUFyZq45DzBjO-NQrTVHH1G8ah8Yes8S2w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Philipp, thanks. I sent Hieu the code you are referring to; ISI recently
took my site offline, since I have moved to Google. I haven't had time to
put something else up yet. Amin, if you're interested, I can also send it to
you.
Best,
Jason
On Mon, Dec 9, 2013 at 9:24 AM, Philipp Koehn <pkoehn@inf.ed.ac.uk> wrote:
> Hi,
>
> Jason Riesa has a nice command line word alignment visualization tool
> http://nlg.isi.edu/demos/picaro/
> but the download site is not available anymore.
>
> -phi
>
>
> On Mon, Dec 9, 2013 at 5:10 PM, Amin Farajian <ma.farajian@gmail.com>wrote:
>
>> Dear Hieu,
>>
>> For this task we recently modified the tool implemented by Chris
>> Callison-Burch; you can find the original code here:
>> http://cs.jhu.edu/~ccb/interface-word-alignment.html
>>
>> The modified version of the code reads the source, target and word
>> alignment information from the input files and enables the user to modify
>> the alignment points.
>>
>> I've tried different tools, but found this one easy to use and very
>> helpful.
>> If you are interested, let me know and I will share the code with you.
>>
>> Bests,
>> Amin
>>
>> PS. Here is the screen-shot of the tool:
>>
>> On 12/09/2013 05:37 PM, Matthias Huck wrote:
>>
>> It's called "Cairo":
>>
>> Cairo: An Alignment Visualization Tool. Noah A. Smith and Michael E.
>> Jahr. In Proceedings of the Language Resources and Evaluation Conference
>> (LREC 2000), pages 549-552, Athens, Greece, May/June 2000.
>> http://www.cs.cmu.edu/~nasmith/papers/smith+jahr.lrec00.pdf
>> http://old-site.clsp.jhu.edu/ws99/projects/mt/toolkit/cairo.tar.gz
>>
>> Never tried that one, though. The code seems to be kind of prehistoric.
>>
>>
>> On Mon, 2013-12-09 at 11:15 -0500, Lane Schwartz wrote:
>>
>> I don't have a copy, but I believe there was a tool called Chiro
>> or Cairo that does this; I'm told it helped provide the Egypt theme
>> for the JHU summer workshop on machine translation.
>>
>> On Mon, Dec 9, 2013 at 10:25 AM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
>>
>> does anyone have a nice GUI word alignment viewer they can share? i.e. given
>> the source, target and alignment files, display each parallel sentence with
>> links between the aligned words.
>>
>> No webapp or complicated install procedure would be best.
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131209/62621eb7/attachment-0001.htm
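As a reference point for the inputs Hieu describes (source, target and
alignment files, with alignments in the Pharaoh-style "i-j" index format that
GIZA++/Moses produce), here is a minimal text-only sketch that prints the
aligned word pairs for each sentence. It is not any of the tools mentioned
above, and the file names are only placeholders.

def show_alignments(src_path, tgt_path, align_path):
    # Read source, target and alignment files in parallel (one sentence per
    # line; alignment lines look like "0-0 1-2 2-1") and print the linked
    # word pairs for each sentence.
    with open(src_path, encoding="utf-8") as src_f, \
         open(tgt_path, encoding="utf-8") as tgt_f, \
         open(align_path, encoding="utf-8") as aln_f:
        for n, (src, tgt, aln) in enumerate(zip(src_f, tgt_f, aln_f), start=1):
            src_words, tgt_words = src.split(), tgt.split()
            print("Sentence %d:" % n)
            for pair in aln.split():
                i, j = (int(x) for x in pair.split("-"))
                print("  %s <-> %s" % (src_words[i], tgt_words[j]))

if __name__ == "__main__":
    # Placeholder file names; substitute your own corpus and alignment files.
    show_alignments("corpus.src", "corpus.tgt", "aligned.grow-diag-final-and")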
------------------------------
Message: 3
Date: Mon, 09 Dec 2013 23:21:20 +0200
From: Rūdolfs Mazurs <rudolfs.mazurs@gmail.com>
Subject: [Moses-support] Increasing context scope during training
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <1386624080.7943.14.camel@steps>
Content-Type: text/plain; charset="UTF-8"
Hi all,
I am looking to improve the quality of translation on my limited corpus.
During the training process I noticed that n-grams only go up to 3. Is there
a way to increase the upper limit on the n-gram order? And is there a chance
it would improve translation results?
--
Rūdolfs Mazurs <rudolfs.mazurs@gmail.com>
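The limit observed here is most likely the order of the language model, which
is 3 in many baseline setups. The order is chosen when the LM is built; KenLM's
lmplz, for example, takes an order option (-o 5 for a 5-gram), and the larger
model is then used for training and decoding as usual. Whether it improves
results is corpus-dependent: on a small corpus, higher-order counts are sparse.
Below is a small sketch, assuming a standard ARPA-format model file and a
placeholder path, for checking what order an existing LM actually has.

def arpa_order(path):
    # Read the "ngram N=count" lines in the \data\ header of an ARPA file
    # and return the largest N, i.e. the model's order.
    order = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("ngram ") and "=" in line:
                n = int(line.split()[1].split("=")[0])
                order = max(order, n)
            elif line.startswith("\\1-grams:"):
                break  # past the \data\ header, no need to read further
    return order

if __name__ == "__main__":
    print("LM order:", arpa_order("lm/model.arpa"))  # placeholder path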
------------------------------
Message: 4
Date: Tue, 10 Dec 2013 06:46:31 +0900
From: Andrew <ravenyj@hotmail.com>
Subject: Re: [Moses-support] using Moses in Monolingual dialogue
setting
To: "Read, James C" <jcread@essex.ac.uk>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <BLU171-W31831FD93643765C2449BDB2D30@phx.gbl>
Content-Type: text/plain; charset="iso-2022-jp"
Thanks for the insights.
I've already done approach 2, and the result didn't seem bad to me, so I
became curious whether it would have made a significant difference had I
chosen the first approach. I was worried that approach 2 might have resulted
in over-training, but judging from your comments, I guess it's only a matter
of having broader entries. (Or could it have been over-trained?)
> I suppose my main concern would be the inordinate amounts of training data you would need to get something useful up and running.
This leads me to my next question. I trained my system with about 650k
stimulus-response pairs collected from Twitter. Each pair is part of a
conversation consisting of 3-10 utterances. For example, suppose we have a
conversation with 4 utterances labeled A, B, C, D, where A is the "root" of
the conversation, B is the response to A, C is the response to B, and D is the
response to C. Following my second approach, A-B, B-C and C-D are pairs, so
the source file will contain A, B, C and the target file will contain B, C, D,
making 3 pairs from 1 conversation. In this way, I have 650k pairs from about
80k conversations.
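In code, the pairing scheme looks roughly like this (the conversation data
structure and file names are just for illustration, not my actual pipeline):

# "Approach 2": pair every utterance with the one that follows it in the same
# conversation, so a 4-turn conversation A, B, C, D yields (A,B), (B,C), (C,D).
conversations = [
    ["A", "B", "C", "D"],           # one conversation = an ordered list of turns
    ["hi", "hello", "how are you"],
]

with open("train.src", "w", encoding="utf-8") as src, \
     open("train.tgt", "w", encoding="utf-8") as tgt:
    for turns in conversations:
        for stimulus, response in zip(turns, turns[1:]):
            src.write(stimulus + "\n")
            tgt.write(response + "\n")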
I've seen that when Moses is used for an actual translation task, say German
to English, the amount of training data seems pretty low, somewhere around 50k
sentence pairs, so my 650k is already much bigger than that. However, in the
paper I mentioned (http://aritter.github.io/mt_chat.pdf) the authors used
about 1.3M pairs, twice as many as mine, and I've seen research in a similar
setting (http://www.aclweb.org/anthology/P13-1095) which used 4M pairs(!).
So, given the unpredictable nature of the monolingual conversation setting,
what would you think is the appropriate, or minimum, amount of training data?
And how much would the quality of the response-generation task depend on the
amount of training data?
I know this is an out-of-nowhere question which may be hard to answer, but
even a rough guess would greatly assist me. Thank you very much in advance.
> From: jcread@essex.ac.uk
> To: kgimpel@cs.cmu.edu
> Date: Mon, 9 Dec 2013 17:33:00 +0000
> CC: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I guess if you were to change the subject and ask a question from a list of well-formed common questions whenever the probability of the response is below some sensible threshold, then you could make a system which fools a user some of the time.
>
> James
>
> ________________________________________
> From: moses-support-bounces@mit.edu [moses-support-bounces@mit.edu] on behalf of Read, James C [jcread@essex.ac.uk]
> Sent: 09 December 2013 17:14
> To: Kevin Gimpel
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I'm guessing he wants to make a conversational agent that produces a most likely response based on the stimulus.
>
> In any case, the distinction between 1 and 2 is probably redundant if GIZA++ is being used to train in both directions. The two phrase tables could be merged, I suppose. I guess the advantage of 2 over 1 is that you don't need to worry about the merging logic, at the cost of more training time.
>
> I'm not sure I understand the question of A1~B3. Unless I'm reading his question wrong I don't see how this could happen.
>
> I suppose my main concern would be the inordinate amounts of training data you would need to get something useful up and running.
>
> James
>
> ________________________________
> From: kgimpel@gmail.com [kgimpel@gmail.com] on behalf of Kevin Gimpel [kgimpel@cs.cmu.edu]
> Sent: 09 December 2013 15:17
> To: Read, James C
> Cc: Andrew; moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> Hi Andrew, it's an interesting idea. I would guess that it would depend on what the data look like. If the A's and B's are of fundamentally different types (e.g., they are utterances in an automatic dialogue system, where A's are always questions and B's are always responses), then approach 2 seems a bit odd, as it will conflate the A's and B's utterances. However, if the A's and B's are just part of a conversation, e.g., in IM chats, then they are of the same "type" and approach 2 would make sense. In fact, I think approach 2 would make more sense than approach 1 in that case. It also of course depends on how you want to use the resulting translation system.
> Kevin
>
>
>
> On Mon, Dec 9, 2013 at 5:18 AM, Read, James C <jcread@essex.ac.uk> wrote:
> Are you trying to figure out the probability of a response given a stimulus?
>
> Given that GIZA++ aligns words and makes heavy use of co-occurrence statistics I doubt this is likely to produce very fruitful results. How big is your data set?
>
> Give it a whirl and see what happens. I would be interested to hear what comes of it.
>
> James
>
> ________________________________
> From: moses-support-bounces@mit.edu [moses-support-bounces@mit.edu] on behalf of Andrew [ravenyj@hotmail.com]
> Sent: 08 December 2013 20:10
> To: moses-support@mit.edu
> Subject: [Moses-support] using Moses in Monolingual dialogue setting
>
>
> Hi,
>
>
> I'm using Moses in a monolingual dialogue setting as in http://aritter.github.io/mt_chat.pdf,
> where the source and target are both in English and the target is a response to the source.
> I'd like to propose a little thought experiment in this setting, and hear what you think would happen.
>
>
> Suppose we have a conversation with six utterances, A1, B1, A2, B2, A3, B3, where A and B indicate speakers,
> and the number indicates the n-th statement by that speaker. They are all part of one conversation on a continuous topic.
>
> Now suppose we train it using Moses in two different ways, as follows:
> 1) The source file contains A1, A2, A3 and the target contains B1, B2, B3, so that A1-B1 is a pair and so on.
> 2) The source contains A1, B1, A2, B2, A3 and the target contains B1, A2, B2, A3, B3, taking advantage of the fact that a response is the stimulus for the next response.
>
> Then, how will the results be different, and why?
> Since GIZA++ gets alignments in both directions, will 2) result in any of A1 through B3 being the translation of any other?
>
>
> This may be a strange question, but I would really like to get your insight.
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20131210/dc77f94b/attachment.htm
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 86, Issue 31
*********************************************