Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. sentiment composition (Saif Mohammad)
2. Correct form of using Mira (Davood Mohammadifar)
----------------------------------------------------------------------
Message: 1
Date: Wed, 28 Oct 2015 14:34:02 -0400
From: Saif Mohammad <uvgotsaif@gmail.com>
Subject: [Moses-support] sentiment composition
To: saifm.nrc@gmail.com
Message-ID:
<CALu_-ORSio0hKcOFyTHZOKQwTTR57Zy3nCBeRKTJ2AChmfQSqA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
Those working on meaning composition might find one of the datasets in
SemEval-2016 task 7 interesting. This dataset, the English Twitter Mixed
Polarity Set, focuses on phrases made up of opposite polarity terms. For
example, phrases such as 'lazy sundays', 'best winter break', 'happy
accident', and 'couldn't stop smiling'. Observe that 'lazy' is associated
with negative sentiment whereas 'sundays' is associated with positive
sentiment. Automatic systems have to determine the degree of association of
the whole phrase with positive sentiment.
The test set also includes single word terms (as separate entries). These
terms are chosen from the set of words that are part of the multi-word
phrases. For example, terms such as 'lazy', 'sundays', 'best', 'winter',
and so on. This allows the evaluation to determine how good the automatic
systems are at determining sentiment association of individual words as
well as how good they are at determining sentiment of phrases formed by
their combinations.
The multi-word phrases and single-word terms are drawn from a corpus of
tweets, and may include a small number of hashtag words and creatively
spelled words. However, a majority of the terms are those that one would
use in everyday English.
A development set of terms labeled with intensity scores has also been
released. Task webpage: http://alt.qcri.org/semeval2016/task7/
Cheers.
-Saif
On 30 September 2015 at 14:18, Saif Mohammad <uvgotsaif@gmail.com> wrote:
>
> First Call For Participation
>
> Determining Sentiment Intensity of English and Arabic Phrases
> SemEval 2016 - Task 7
>
> http://alt.qcri.org/semeval2016/task7/
>
> The objective of the task is to test an automatic system's ability to
> predict a sentiment intensity (aka evaluativeness and sentiment
> association) score for a word or a phrase. Phrases include negators,
> modals, intensifiers, and diminishers -- categories known to be
> challenging for sentiment analysis. Specifically, the participants will be
> given a list of terms (single words and multi-word phrases) and be asked to
> provide a score between 0 and 1 that is indicative of the term's strength
> of association with positive sentiment. A score of 1 indicates maximum
> association with positive sentiment (or least association with negative
> sentiment) and a score of 0 indicates least association with positive
> sentiment (or maximum association with negative sentiment). If a term is
> more positive than another, then it should have a higher score than the
> other.
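[Editor's note: purely as an illustration of the scoring convention just described. The terms, scores, and tab-separated submission layout below are assumptions for the example, not official task materials.]

# Illustrative only: hypothetical terms with made-up intensity scores in [0, 1].
# A higher score means a stronger association with positive sentiment.
scores = {
    "best winter break": 0.9,   # strongly positive phrase (hypothetical score)
    "happy accident":    0.7,
    "lazy sundays":      0.6,
    "lazy":              0.3,   # single word, less positive than the phrases above
}

# The ordering requirement: a more positive term must receive a higher score.
assert scores["best winter break"] > scores["lazy"]

# Write one term per line; a "term<TAB>score" layout is assumed here.
with open("submission.txt", "w") as f:
    for term, score in scores.items():
        f.write(f"{term}\t{score:.3f}\n")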
>
> We introduced this task as part of the SemEval-2015 Task 10 Sentiment
> Analysis in Twitter, Subtask E (Rosenthal et al., 2015), where the target
> terms were taken from Twitter. In SemEval-2016, we broaden the scope of
> the task and include three different domains: general English, English
> Twitter, and Arabic Twitter. The Twitter domain differs significantly from
> the general English domain; it includes hashtags, which are often a
> composition of several words (e.g., #feelingood), misspellings,
> shortenings, slang, etc.
>
>
> SUBTASKS
>
> We will have three subtasks, one for each of the three domains:
>
> -- General English Sentiment Modifiers Set: This test set has phrases
> formed by combining a word and a modifier, where a modifier is a negator,
> an auxiliary verb, a degree adverb, or a combination of those. For example,
> 'would be very easy', 'did not harm', and 'would have been nice'. (See
> development data for more examples.) The test set also includes single word
> terms (as separate entries). These terms are chosen from the set of words
> that are part of the multi-word phrases. For example, 'easy', 'harm', and
> 'nice'. The terms in the test set will have the same form as the terms in
> the development set, but can involve different words and modifiers.
>
> -- English Twitter Mixed Polarity Set: This test set focuses on phrases
> made up of opposite polarity terms. For example, phrases such as 'lazy
> sundays', 'best winter break', 'happy accident', and 'couldn't stop
> smiling'. Observe that 'lazy' is associated with negative sentiment whereas
> 'sundays' is associated with positive sentiment. Automatic systems have to
> determine the degree of association of the whole phrase with positive
> sentiment. The test set also includes single word terms (as separate
> entries). These terms are chosen from the set of words that are part of the
> multi-word phrases. For example, terms such as 'lazy', 'sundays', 'best',
> 'winter', and so on. This allows the evaluation to determine how good the
> automatic systems are at determining sentiment association of individual
> words as well as how good they are at determining sentiment of phrases
> formed by their combinations. The multi-word phrases and single-word terms
> are drawn from a corpus of tweets, and may include a small number of
> hashtag words and creatively spelled words. However, a majority of the
> terms are those that one would use in everyday English.
>
> -- Arabic Twitter Set: This test set includes single words and phrases
> commonly found in Arabic tweets. The phrases in this set are formed only by
> combining a negator and a word. See development data for examples.
>
> In each subtask the target terms are chosen from the corresponding domain.
> We will provide a development set and a test set for each domain. No
> separate training data will be provided. The development sets will be large
> enough to be used for tuning or even for training. The test sets and the
> development sets will have no terms in common. The participants are free to
> use any additional manually or automatically generated resources; however,
> we will require that all resources be clearly identified in the submission
> files and in the system description paper.
>
> All of these terms are manually annotated to obtain their strength of
> association scores. We use CrowdFlower to crowdsource the annotations. We
> use the MaxDiff method of annotation. Kiritchenko et al. (2014) showed that
> even though annotators might disagree about answers to individual
> questions, the aggregated scores produced with MaxDiff and the
> corresponding term ranking are consistent. We verified this by randomly
> selecting ten groups of five answers to each question and comparing the
> scores and rankings obtained from these groups of annotations. On average,
> the scores of the terms from the data we have previously annotated
> (SemEval-2015 Subtask E Twitter data and SemEval-2016 general English
> terms) differed only by 0.02-0.04 per term, and the Spearman rank
> correlation coefficient between two sets of rankings was 0.97-0.98.
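[Editor's note: a minimal sketch of the kind of consistency check described above, using synthetic data. The real annotations come from MaxDiff questions on CrowdFlower and their aggregation is more involved; the terms and vote model here are invented.]

import random
from scipy.stats import spearmanr

# Synthetic stand-in: each term gets 10 noisy "votes" around an invented score.
random.seed(0)
terms = ["lazy", "sundays", "best", "winter", "happy accident"]
underlying = {t: random.random() for t in terms}
votes = {t: [min(max(underlying[t] + random.gauss(0, 0.1), 0.0), 1.0)
             for _ in range(10)] for t in terms}

# Split each term's votes into two groups of five and aggregate by averaging.
group_a = [sum(votes[t][:5]) / 5 for t in terms]
group_b = [sum(votes[t][5:]) / 5 for t in terms]

# Stability checks analogous to the ones reported: per-term score difference
# and rank correlation between the two independently aggregated groups.
mean_diff = sum(abs(a - b) for a, b in zip(group_a, group_b)) / len(terms)
rho, _ = spearmanr(group_a, group_b)
print(f"mean per-term difference: {mean_diff:.3f}, Spearman: {rho:.2f}")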
>
>
> EVALUATION
>
> The participants can submit results for any one, two, or all three
> subtasks. We will provide separate test files for each subtask. The test
> file will contain a list of terms from the corresponding domain. The
> participating systems are expected to assign a sentiment intensity score to
> each term. The order of the terms in the submissions can be arbitrary.
>
> System ratings for terms in each subtask will be evaluated by first
> ranking the terms according to sentiment score and then comparing this
> ranked list to a ranked list obtained from human annotations. Kendall's Tau
> (Kendall, 1938) will be used as the metric to compare the ranked lists. We
> will provide scores for Spearman's Rank Correlation as well, but
> participating teams will be ranked by Kendall's Tau.
> We have released an evaluation script so that participants can:
> -- make sure their output is in the right format;
> -- track the progress of their system's performance on the development
> data.
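[Editor's note: the released script is the authoritative tool; purely as an illustration, here is a minimal sketch of the rank-correlation comparison described above. The gold and system scores are invented, and this is not the official evaluation script.]

from scipy.stats import kendalltau, spearmanr

# Hypothetical gold and system intensity scores for the same five terms.
gold   = [0.10, 0.35, 0.55, 0.80, 0.95]
system = [0.20, 0.30, 0.60, 0.70, 0.99]

# Both metrics compare the rankings induced by the scores, so the absolute
# values matter only through the order they impose on the terms.
tau, _ = kendalltau(gold, system)
rho, _ = spearmanr(gold, system)
print(f"Kendall's Tau: {tau:.3f}  Spearman: {rho:.3f}")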
>
>
> IMPORTANT DATES
>
> -- Training data ready: September 4, 2015
> -- Test data ready: Dec 15, 2015
> -- Evaluation start: January 10, 2016
> -- Evaluation end: January 31, 2016
> -- Paper submission due: February 28, 2016
> -- Paper reviews due: March 31, 2016
> -- Camera ready due: April 30, 2016
> -- SemEval workshop: Summer 2016
>
>
> BACKGROUND AND MOTIVATION
>
> Many of the top performing sentiment analysis systems in recent SemEval
> competitions (2013 Task 2, 2014 Task 4, and 2014 Task 9) rely on
> automatically generated sentiment lexicons. Sentiment lexicons are lists of
> words (and phrases) with prior associations to positive and negative
> sentiments. Some lexicons can additionally provide a sentiment score for a
> term to indicate its strength of evaluative intensity. Higher scores
> indicate greater intensity. Existing manually created sentiment lexicons
> tend to only have discrete labels for terms (positive, negative, neutral)
> but no real-valued scores indicating the intensity of sentiment. Here, for
> the first time, we manually create a dataset of words and phrases with
> real-valued scores of intensity. The goal of this task is to evaluate
> automatic methods for determining sentiment scores of words and phrases.
> Many of the phrases in the test set will include negators (such as 'no' and
> 'doesn't'), modals (such as 'could' and 'may be'), and intensifiers and
> diminishers (such as 'very' and 'slightly'). This task will enable
> researchers to examine methods for estimating how each of these word
> categories impacts the intensity of sentiment.
>
>
> ORGANIZERS
>
> -- Svetlana Kiritchenko, National Research Council Canada
> -- Saif M. Mohammad, National Research Council Canada
> -- Mohammad Salameh, University of Alberta
>
> --
> Saif Mohammad
> Research Officer
> Information and Communications Technologies Portfolio
> National Research Council Canada
> http://www.saifmohammad.com
>
--
Saif M. Mohammad
Senior Research Officer
National Research Council Canada
http://www.saifmohammad.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151028/9b506aaa/attachment-0001.html
------------------------------
Message: 2
Date: Wed, 28 Oct 2015 21:57:08 +0000
From: Davood Mohammadifar <davood_mf@hotmail.com>
Subject: [Moses-support] Correct form of using Mira
To: Moses Support <moses-support@mit.edu>
Message-ID: <SNT150-W21777E822268EC35FE8E368C210@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
Hello everyone
Because of variations in BLEU score when using normal MERT, I decided to use MIRA instead. The Moses manual (updated on 28 October 2015) says to use this command:
$MOSES_SCRIPTS/training/mert-moses.pl work/dev.fr work/dev.en $MOSES_BIN/moses work/model/moses.ini --mertdir $MOSES_BIN --rootdir $MOSES_SCRIPTS --batch-mira --return-best-dev --batch-mira-args '-J 300' --decoder-flags '-threads 8 -v 0'
but this command does not work for me. When I execute it, I just see a list of its options and nothing happens. So I wanted to change the command. Based on the usual MERT invocation, I changed it to this:
$MOSES_SCRIPTS/training/mert-moses.pl /home/mohammadifar/corpus/tune.true.fa /home/mohammadifar/corpus/tune.true.en $MOSES_BIN/moses /home/mohammadifar/First/train/model/moses.ini --mertdir $MOSES_BIN --rootdir $MOSES_SCRIPTS --batch-mira --return-best-dev --batch-mira-args="-J 300" --decoder-flags="-threads all"
The difference between the two commands is at the end. The latter works very well for me. BLEU variations on the test set are very slight (often <0.1 and rarely about 0.2 across 3 runs of the whole translation pipeline on the same dataset). So I want to be sure: is this form of using MIRA correct? (Moses v3.0)
Regards
Davood
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151028/b05a3fa7/attachment.html
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 108, Issue 77
**********************************************