Moses-support Digest, Vol 104, Issue 45

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. CFP: Tweet Translation Workshop 2015 (TweetMT) (Cristina)
2. Re: Major bug found in Moses (Matthias Huck)
3. please help me with the code - getting word index (amir haghighi)


----------------------------------------------------------------------

Message: 1
Date: Thu, 18 Jun 2015 15:13:50 +0200
From: Cristina <cristinae@cs.upc.edu>
Subject: [Moses-support] CFP: Tweet Translation Workshop 2015
(TweetMT)
To: corpora@uib.no, mt-list <mt-list@eamt.org>, moses-support@mit.edu,
talp@talp.upc.edu
Message-ID:
<CAL0MP8hn0aBFDO_JM4019aUqhbD3j=KYwnXodO1jHLL-o9JHkg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Apologies for multiple postings

**************************************************************
Call for papers

TWEETMT 2015
Tweet Translation Workshop
http://komunitatea.elhuyar.org/tweetmt/

co-located with the 31st Conference of the Spanish Society
for Natural Language Processing (SEPLN 2015)

Submission deadline: July 21, 2015
Date: September 15, 2015
Location: Alicante, Spain

**************************************************************


The TweetMT workshop will deal with the linguistic processing of tweets and
short texts in tasks related to machine translation and to cross-lingual and
multilingual processing. It will bring together researchers working on
natural language processing techniques for tweets in different languages,
with the aim of discussing the tasks that both enable and build on the
development of machine translation tools for short texts.

Topics of interest include, but are not limited to:
* Machine translation approaches for short and informal texts.
* Development and annotation of Twitter corpora for machine translation.
* Evaluation of machine translation of tweets.
* Language identification of tweets.
* Normalization of user-generated content.
* Language-specific tweet collection techniques.
* Cross-lingual tweet clustering and classification.
* Cross-lingual tweet retrieval.
* Multilingual event summarization from tweets.
* Multilingual tweet sentiment analysis.

Corpora
------------

* Machine translation: the TweetMT corpus is available for interested
authors by contacting the organizers at tweetmt@elhuyar.com
* Language identification: the TweetLID corpus can be obtained at
http://komunitatea.elhuyar.org/tweetlid/resources/#Downloads
* Tweet normalization: the TweetNorm corpus is available at
http://komunitatea.elhuyar.org/tweet-norm/resources/#Downloads

Paper submission
---------------------------

TweetMT accepts two different types of submissions:

* Position papers will have a maximum of 2 pages, including references, and
will report early results.
* Short papers will have between 4 and 6 pages, excluding references, and
will report work in progress.

Submissions can be made through the following Easychair address:
https://easychair.org/conferences/?conf=tweetmt2015

Papers must be formatted following the SEPLN journal style
(http://www.sepln.org/home-2/revista/instrucciones-autor/) and must respect
the length limits for their contribution type.

We aim to publish the proceedings of the workshop using the ceur-ws.org
repository, and have them indexed by DBLP.

Important dates
-----------------------

* July 21: Paper submission deadline
* July 28: Notification to authors
* August 10: Camera ready submission deadline
* September 15: Workshop

Contact: tweetmt@elhuyar.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150618/d10a2773/attachment-0001.htm

------------------------------

Message: 2
Date: Thu, 18 Jun 2015 15:58:26 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: Moses-support <moses-support@mit.edu>
Message-ID: <1434639506.30904.1154.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi,

Not sure whether this was mentioned in the vast number of replies:

I'd like to stress that simple histogram pruning of the phrase table is
implemented in Moses and every other SMT system I'm aware of.
(We know better pruning techniques, though:
http://anthology.aclweb.org/D/D12/D12-1089.pdf )
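As a rough illustration of the idea, histogram pruning simply caps the number
of translation options kept per distinct source phrase. The sketch below runs
over a toy in-memory table with made-up phrases and scores; it is not Moses
code (real pruning operates on the phrase-table files), just the technique in
miniature:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, double> Option;  // (target phrase, score)
typedef std::map<std::string, std::vector<Option> > PhraseTable;

static bool ScoreDesc(const Option &a, const Option &b) {
  return a.second > b.second;
}

// Histogram pruning: for each distinct source phrase, keep only the
// k highest-scoring translation options and discard the rest.
PhraseTable HistogramPrune(PhraseTable table, size_t k) {
  for (PhraseTable::iterator it = table.begin(); it != table.end(); ++it) {
    std::vector<Option> &options = it->second;
    std::sort(options.begin(), options.end(), ScoreDesc);  // best first
    if (options.size() > k) options.resize(k);  // drop everything past rank k
  }
  return table;
}
```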

If you deactivate all non-local features (like the LM, the lexicalized
reordering model, the distance-based jump cost), run monotonic decoding,
and apply the features and scaling factors known to the decoder for
pruning as well, then it shouldn't matter how much you prune. If you
keep at least the best translation option per distinct source side, the
decoder should always output the very same Viterbi path.

A simple toy example should be sufficient to verify that the decoder
implements the argmax operation.
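Such a toy example might look like the following: a handful of hypotheses,
each represented by a feature vector, and the decoding step that returns the
one maximizing the weighted model score. The feature values and scaling
factors here are invented for illustration; this is not Moses code:

```cpp
#include <cstddef>
#include <vector>

// Model score of one hypothesis: dot product of its feature values with
// the scaling factors (feature weights).
double ModelScore(const std::vector<double> &features,
                  const std::vector<double> &weights) {
  double score = 0.0;
  for (size_t i = 0; i < features.size(); ++i)
    score += features[i] * weights[i];
  return score;
}

// The argmax operation the decoder implements: return the index of the
// hypothesis with the highest model score.
size_t ArgmaxHypothesis(const std::vector<std::vector<double> > &hypotheses,
                        const std::vector<double> &weights) {
  size_t best = 0;
  double bestScore = ModelScore(hypotheses[0], weights);
  for (size_t i = 1; i < hypotheses.size(); ++i) {
    double s = ModelScore(hypotheses[i], weights);
    if (s > bestScore) {
      bestScore = s;
      best = i;
    }
  }
  return best;
}
```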

We frequently run a couple of basic regression tests:
http://statmt.org/moses/cruise/
I'm pretty sure we would have noticed quickly if a major bug had been
introduced recently.

The decoder maximizes model score, not BLEU. Tuning is required to
achieve a correlation of model score with BLEU (or the quality metric of
your choice).

Cheers,
Matthias




On Thu, 2015-06-18 at 07:50 +0700, Tom Hoar wrote:
> Amittai, I understand your point about sounding "almost belligerently
> confrontational." I also admire James's passion and the Moses team's
> patience to walk through his logic. As a non-scientific reader, this is
> the most educational exchange I've seen on this list for years. I'm
> learning a lot. Thank you everyone.
>
> James, as a non-scientific reader, let me say that Hieu's head bashing
> to solve the same puzzle shows you're in good company. Yet, the Moses
> "system" is defined, designed and works with two functionally different
> pieces, i.e. the front-end and back-end. The front-end creates an (often
> wild) array of candidate hypotheses -- by design. Why is this
> piece designed this way? Because the system design includes a back-end
> that selects a final choice from amongst the candidates. The two halves
> share a symbiotic relationship. Together, the pieces form a system with
> a balance that can only be achieved by working together. In this
> context, this is not a "bug" (major or minor) and the "system" is not
> broken.
>
> I submit, as others have suggested, that you have conceived and are
> working with a new and different "system" that consists of two different
> halves. Your front-end reduces the table to a focused set. Your back-end
> works much like today's translation table to select from the focused
> set. Major advances sometimes come by challenging the status quo. We
> have seen evidence here of both the challenge and the status quo.
>
> So, although I cannot "admit the system is broke," I encourage you to
> advance your new system without trying to fix one that's not broken.
>
> Tom
>
>
> > Date: Wed, 17 Jun 2015 15:48:14 +0000
> > From: "Read, James C"<jcread@essex.ac.uk>
> > Subject: Re: [Moses-support] Major bug found in Moses
> > To: Marcin Junczys-Dowmunt<junczys@amu.edu.pl>
> > Cc:"moses-support@mit.edu" <moses-support@mit.edu>, "Arnold, Doug"<doug@essex.ac.uk>
> > Message-ID:<DB3PR06MB0713ADF9AF14EE5D93EC5BC485A60@DB3PR06MB0713.eurprd06.prod.outlook.com>
> > Content-Type: text/plain; charset="iso-8859-2"
> >
> > 1) So if I've understood you correctly, you are saying we have a system that is purposefully designed to perform poorly with a disabled LM, and this is the proof that the LM is the most fundamental part. Any attempt to prove otherwise, e.g. by filtering the phrase table to help the dysfunctional search algorithm, does not constitute proof that the TM is the most fundamental component of the system and, if designed correctly, can perform just fine on its own, but rather only evidence that the researcher is not using the system as intended (the intention being to break the TM to support the idea that the LM is the most fundamental part).
> >
> > 2) If you still feel that the LM is the most fundamental component I challenge you to disable the TM and perform LM only translations and see what kind of BLEU scores you get.
> >
> > In conclusion, I do hope that you don't feel that potential investors in MT systems lack the intelligence to see through these logical fallacies. Can we now just admit that the system is broke and get around to fixing it?
> >
> > James
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 3
Date: Thu, 18 Jun 2015 20:23:24 +0430
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: [Moses-support] please help me with the code - getting word
index
To: moses-support <moses-support@mit.edu>
Message-ID:
<CA+UVbEjaPbZUjoyxAujB6BiZNCOOhaq2JEXt9GLdvKf=PrZS7A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi everybody


I wrote the following code to get an ordered list of the source words
inside a hypothesis. It retrieves the words in their translation order, but
I need not only each word's string, but also its index in the original
sentence.

Could you please tell me how to get the index of each word in srcPhrase
within the original sentence?


void Amir::GetSourcePhrase2(const ChartHypothesis &cur_hypo,
                            Phrase &srcPhrase) const
{
  const TargetPhrase &targetPh = cur_hypo.GetCurrTargetPhrase();
  const Phrase *sourcePh = targetPh.GetRuleSource();
  const int targetWordsNum = targetPh.GetSize();

  // Collect, for each target position, the set of aligned source positions.
  std::vector<std::set<size_t> > sourcePosSets;
  for (int targetP = 0; targetP < targetWordsNum; ++targetP) {
    sourcePosSets.push_back(
        targetPh.GetAlignTerm().GetAlignmentsForTarget(targetP));
  }

  // When a source position is aligned to several target positions, keep
  // only the rightmost target position and remove the others.
  for (int ii = targetWordsNum - 1; ii >= 0; --ii) {
    const std::set<size_t> cur_srcPosSet = sourcePosSets[ii];
    for (std::set<size_t>::const_iterator alignment = cur_srcPosSet.begin();
         alignment != cur_srcPosSet.end(); ++alignment) {
      for (int index = 0; index < ii; ++index) {
        if (!sourcePosSets[index].empty()) {
          sourcePosSets[index].erase(*alignment);
        }
      }
    }
  }

  // Walk the target phrase left to right: recurse into the previous
  // hypothesis for non-terminals, otherwise emit the aligned source words.
  for (size_t posT = 0; posT < targetPh.GetSize(); ++posT) {
    const Word &word = targetPh.GetWord(posT);
    if (word.IsNonTerminal()) {
      size_t nonTermInd =
          targetPh.GetAlignNonTerm().GetNonTermIndexMap()[posT];
      const ChartHypothesis *prevHypo = cur_hypo.GetPrevHypo(nonTermInd);
      GetSourcePhrase2(*prevHypo, srcPhrase);
    } else {
      for (std::set<size_t>::const_iterator it = sourcePosSets[posT].begin();
           it != sourcePosSets[posT].end(); ++it) {
        srcPhrase.AddWord(sourcePh->GetWord(*it));
      }
    }
  }
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150618/64f524d7/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 45
**********************************************
