Moses-support Digest, Vol 105, Issue 48

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Support for XML Markup with Confusion Network Input
(James H. Cross III)
2. Re: BLEU result on baseline EMS experiment (Vincent Nguyen)
3. DEADLINE EXTENSION: Tweet Translation Workshop 2015 (TweetMT)
(Cristina)

----------------------------------------------------------------------

Message: 1
Date: Wed, 22 Jul 2015 15:29:16 -0700
From: "James H. Cross III" <james.henry.cross.iii@gmail.com>
Subject: Re: [Moses-support] Support for XML Markup with Confusion
Network Input
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support@mit.edu
Message-ID:
<CACdKcAGHRj5HXQ6TCPjsKgFMS9EwK--DPr+TJoBsU4kvhROb5w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

I will do that. Thanks, Hieu!

On Wed, Jul 22, 2015 at 1:46 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
> it should be threadsafe these days. if you find out otherwise, i'll do my
> best to fix it
>
> Don't use IRSTLM or the old binary phrase-table.
>
>
> On 23/07/2015 00:26, James H. Cross III wrote:
>>
>> Hi:
>>
>> I also noticed that decoding with lattice input is not known to be
>> thread safe ( http://www.statmt.org/moses/?n=Moses.Optimize ). Does
>> this concern also extend to confusion network input?
>>
>> Thanks again,
>> James
>>
>>
>> On Wed, Jul 22, 2015 at 12:22 PM, James H. Cross III
>> <james.henry.cross.iii@gmail.com> wrote:
>>>
>>> Definitely at least placeholders and constraints on reordering.
>>>
>>> On Wed, Jul 22, 2015 at 11:46 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>>
>>>> sounds ok. the xml markup is used for a number of things (forced
>>>> translation, placeholders, constraint on reordering). Which
>>>> functionality do you want to implement?
>>>>
>>>> On 22/07/2015 22:20, James H. Cross III wrote:
>>>>>
>>>>> Hi Hieu:
>>>>>
>>>>> Thanks for the response! I would like to look into adding this
>>>>> functionality myself.
>>>>>
>>>>> After a first pass, it looks like a good starting point would be
>>>>> adding functionality for interpreting XML (e.g., where an input line
>>>>> could contain a single XML tag rather than a word column) to the
>>>>> ConfusionNet class, and then adding functionality to enforce the
>>>>> decoding constraints to the TranslationOptionCollectionConfusionNet
>>>>> class. Let me know if this impression seems correct to you.
>>>>>
>>>>> Any other advice or caveats regarding this undertaking would also be
>>>>> much appreciated!
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Wed, Jul 22, 2015 at 5:22 AM, Hieu Hoang <hieuhoang@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> i guess lack of interest. XML markup is usually used by more
>>>>>> application-focused users who don't usually use complicated things
>>>>>> like confusion networks, and confusion networks are used mainly by
>>>>>> researchers who don't use xml markups
>>>>>>
>>>>>>
>>>>>> On 18/07/2015 00:35, James H. Cross III wrote:
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> Is it still the case that XML markup is not supported for confusion
>>>>>>> network (or lattice) input? If not, are the reasons for not
>>>>>>> supporting this feature because implementing it imposes particular
>>>>>>> difficulties or simply due to lack of interest?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> Moses-support@mit.edu
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> Researcher
>>>>>> New York University, Abu Dhabi
>>>>>> http://www.hoang.co.uk/hieu
>>>>>>
>>>>
>
>
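
The plan James describes above (let an input line hold a single XML tag
instead of a word column, then enforce the constraints when translation
options are collected) can be illustrated outside Moses. The following is a
minimal Python sketch, not Moses code. It assumes the plain-text confusion
network format of one column per line, written as alternating word/score
pairs, with a blank line ending the network. The names ParsedNet, TAG_LINE
and parse_confusion_net are hypothetical, and the <zone> and <wall/> tags are
borrowed from the reordering-constraint markup used for ordinary sentence
input. How the ConfusionNet class would actually store them is left open.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule: a line that consists of exactly one XML tag
# (e.g. <zone>, </zone>, <wall/>) is markup, not a word column.
TAG_LINE = re.compile(r"^<\s*(/?)\s*([A-Za-z_][\w-]*)[^<>]*?(/?)\s*>$")


@dataclass
class ParsedNet:
    # Each column is a list of (word, score) alternatives.
    columns: list = field(default_factory=list)
    # Each markup entry is (column_index, kind, tag_name), where
    # column_index is the position of the next column after the tag.
    markup: list = field(default_factory=list)


def parse_confusion_net(lines):
    """Parse one confusion network, recording single-tag lines separately."""
    net = ParsedNet()
    for raw in lines:
        line = raw.strip()
        if not line:
            break  # a blank line ends the network
        match = TAG_LINE.match(line)
        if match:
            closing, name, self_closing = match.groups()
            kind = "close" if closing else ("empty" if self_closing else "open")
            net.markup.append((len(net.columns), kind, name))
            continue
        tokens = line.split()
        if len(tokens) % 2 != 0:
            raise ValueError(f"expected word/score pairs, got: {line!r}")
        net.columns.append(
            [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]
        )
    return net


if __name__ == "__main__":
    example = [
        "<zone>",
        "the 0.7 a 0.3",
        "house 0.9 mouse 0.1",
        "</zone>",
        "<wall/>",
        "is 1.0",
        "",
    ]
    parsed = parse_confusion_net(example)
    print(len(parsed.columns), "columns;", parsed.markup)
```

Recording each tag with the index of the column that follows it keeps the
column list unchanged for the existing decoder path, so the constraints can
be looked up by position in a later step.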

------------------------------

Message: 2
Date: Thu, 23 Jul 2015 09:17:52 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] BLEU result on baseline EMS experiment
To: moses-support@mit.edu
Message-ID: <55B09520.8030803@neuf.fr>
Content-Type: text/plain; charset="windows-1252"

My bad, the Europarl corpus was commented out in config.basic.
I need to re-run it.

On 22/07/2015 15:23, Vincent Nguyen wrote:
>
> Shouldn't the BLEU score be more in the 50s for a test set close to
> the corpus?
> By "real text" I meant that I have a corpus of translations (fr to
> eng) made by translators, typically the kind of text I would like to
> test with Moses.
>
> So my question is: should I use these texts to 1) train or 2) tune my
> model?
>
> Also, in terms of the language model, can we make it evolve with new
> texts to make it better over time?
>
>
>
>
>
> On 22/07/2015 14:28, Hieu Hoang wrote:
>> it looks ok, your bleu score is 22.68 for this test set.
>>
>> I don't know what you mean by real text.
>>
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>> On 21 July 2015 at 23:45, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>>
>> here is what I got
>>
>> make sense ?
>>
>>
>> MT evaluation scorer began on 2015 Jul 20 at 23:27:39
>> command line:
>> /home/moses/mosesdecoder/scripts/generic/mteval-v13a.pl -c
>> -c -s /home/moses/working/data/dev/newstest2011-src.fr.sgm -r
>> /home/moses/working/data/dev/newstest2011-ref.en.sgm -t
>> /home/moses/working/evaluation/newstest2011.detokenized.sgm.3
>> Evaluation of any-to-en translation using:
>> src set "newstest2011" (110 docs, 3003 segs)
>> ref set "newstest2011" (1 refs)
>> tst set "newstest2011" (1 systems)
>>
>> length ratio: 0.994844739625875 (74296/74681), penalty (log): -0.00518197480348868
>> NIST score = 6.8964 BLEU score = 0.2268 for system "Edinburgh"
>>
>> # ------------------------------------------------------------------------
>>
>> Individual N-gram scoring
>>         1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
>>         ------   ------   ------   ------   ------   ------   ------   ------   ------
>> NIST:   5.2752   1.3399   0.2499   0.0273   0.0041   0.0005   0.0000   0.0000   0.0000  "Edinburgh"
>>
>> BLEU:   0.5883   0.2887   0.1636   0.0972   0.0589   0.0364   0.0230   0.0146   0.0093  "Edinburgh"
>>
>> # ------------------------------------------------------------------------
>> Cumulative N-gram scoring
>>         1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
>>         ------   ------   ------   ------   ------   ------   ------   ------   ------
>> NIST:   5.2752   6.6151   6.8650   6.8923   6.8964   6.8969   6.8970   6.8970   6.8970  "Edinburgh"
>>
>> BLEU:   0.5853   0.4100   0.3013   0.2268   0.1730   0.1333   0.1037   0.0811   0.0637  "Edinburgh"
>> MT evaluation scorer ended on 2015 Jul 20 at 23:28:01
>>
>>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
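
The headline BLEU of 0.2268 in the report quoted above can be recovered from
the individual n-gram precisions and the length penalty. A minimal sketch,
assuming the standard BLEU definition (geometric mean of the 1- to 4-gram
precisions multiplied by the brevity penalty). The function name is
illustrative and is not part of mteval-v13a.pl. The numbers are copied from
the report.

```python
import math


def bleu_from_precisions(precisions, sys_len, ref_len):
    """Brevity-penalised geometric mean of n-gram precisions."""
    # Log brevity penalty: 0 when the system output is at least as long
    # as the reference, otherwise 1 - ref_len / sys_len.
    log_bp = min(0.0, 1.0 - ref_len / sys_len)
    # Mean of the log precisions gives the log of the geometric mean.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return math.exp(log_bp + log_mean)


# Individual 1- to 4-gram BLEU precisions and the two lengths from
# the "length ratio" line (74296 system words, 74681 reference words).
individual = [0.5883, 0.2887, 0.1636, 0.0972]
score = bleu_from_precisions(individual, sys_len=74296, ref_len=74681)
print(f"{score:.4f}")  # prints 0.2268
```

The same penalty explains why the cumulative 1-gram BLEU (0.5853) is slightly
lower than the individual 1-gram precision (0.5883).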


------------------------------

Message: 3
Date: Thu, 23 Jul 2015 10:43:32 +0200
From: Cristina <cristinae@cs.upc.edu>
Subject: [Moses-support] DEADLINE EXTENSION: Tweet Translation
Workshop 2015 (TweetMT)
To: corpora@uib.no, mt-list <mt-list@eamt.org>, moses-support@mit.edu
Message-ID:
<CAL0MP8jb6AVG-1=CrDNQ6G8kQVss8_aJqBTomXgSZv6DSviCnQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

We have had some requests to extend the deadline, so if you are working on
anything related to tweets, your contribution is welcome until the 28th of
July!

**************************************************************
> Deadline Extension and Final Call for Papers
>
> TWEETMT 2015
> Tweet Translation Workshop
> http://komunitatea.elhuyar.org/tweetmt/
>
> co-located with the 31st Conference of the Spanish Society
> for Natural Language Processing (SEPLN 2015)
>
> Submission deadline: July 28, 2015 (extended from July 21)
> Date: September 15, 2015
> Location: Alicante, Spain
>
> **************************************************************
>
>
> The TweetMT workshop will deal with the linguistic processing of tweets
> and short texts in tasks related to machine translation, cross-lingual and
> multilingual processing. It will bring together researchers within the
> broad scope of using natural language processing techniques for processing
> tweets in different languages, with the aim of discussing tasks required
> for and depending on the development of machine translation tools for short
> texts.
>
> Topics of interest include, but are not limited to:
> * Machine translation approaches for short and informal texts.
> * Development and annotation of Twitter corpora for machine translation.
> * Evaluation of machine translation of tweets.
> * Language identification of tweets.
> * Normalization of user-generated content.
> * Language-specific tweet collection techniques.
> * Cross-lingual tweet clustering and classification.
> * Cross-lingual tweet retrieval.
> * Multilingual event summarization from tweets.
> * Multilingual tweet sentiment analysis.
>
> Corpora
> ------------
>
> * Machine translation: the TweetMT corpus is available for interested
> authors by contacting the organizers at tweetmt@elhuyar.com
> * Language identification: the TweetLID corpus can be obtained at
> http://komunitatea.elhuyar.org/tweetlid/resources/#Downloads
> * Tweet normalization: the TweetNorm corpus is available at
> http://komunitatea.elhuyar.org/tweet-norm/resources/#Downloads
>
> Paper submission
> ---------------------------
>
> TweetMT accepts two different types of submissions:
>
> * Position papers will have a maximum of 2 pages, including references,
> and will report early results.
> * Short papers will have between 4 and 6 pages, excluding references, and
> will report work in progress.
>
> Submissions can be made through the following Easychair address:
> https://easychair.org/conferences/?conf=tweetmt2015
>
> The papers will be formatted following the SEPLN journal style (
> http://www.sepln.org/home-2/revista/instrucciones-autor/), and will have
> a length according to the type of contribution.
>
> We aim to publish the proceedings of the workshop using the ceur-ws.org
> repository, and have them indexed by DBLP.
>
> Important dates
> -----------------------
>
> * July 28: Paper submission deadline (NEW!)
> * August 5: Notification to authors
> * August 15: Camera ready submission deadline
> * September 15: Workshop
>
> Contact: tweetmt@elhuyar.com
>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 105, Issue 48
**********************************************
