Moses-support Digest, Vol 108, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Error on lmplz (Kenneth Heafield)
2. Job Opening: Post-Doctoral Researcher in Personalised
Information Retrieval (Séamus Lawless)
3. First Call For Participation in SemEval 2016 Task 7:
Determining Sentiment Intensity of English and Arabic Phrases
(Saif Mohammad)


----------------------------------------------------------------------

Message: 1
Date: Wed, 30 Sep 2015 17:41:32 +0100
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Error on lmplz
To: moses-support@mit.edu
Message-ID: <560C10BC.10402@kheafield.com>
Content-Type: text/plain; charset=windows-1252

That's bad. Would you mind sending me privately a minimal example of
the data that reproduces the problem?

Kenneth

On 09/30/2015 04:29 PM, Alex Martinez wrote:
> Hello,
> today I've pulled moses code and recompiled and some experiments (EMS)
> that were already working are failing on the LM training step with the
> following error:
>
> Executing: /opt/moses/bin/lmplz --text
> /home/alexmc/devel/toydata/process/lm/nc=pos.factored.1 --order 5 --arpa
> /home/alexmc/devel/toydata/process/lm/nc=pos.lm.1 --discount_fallback
> === 1/5 Counting and sorting n-grams ===
> Reading /mnt/a62/devel/toydata/process/lm/nc=pos.factored.1
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> tcmalloc: large alloc 4753956864 bytes == 0x1f7c000 @
> tcmalloc: large alloc 22185107456 bytes == 0x11d536000 @
> ****************************************************************************************************
> Unigram tokens 2433135 types 47
> === 2/5 Calculating and sorting adjusted counts ===
> Chain sizes: 1:564 2:2630656000 3:4932480000 4:7891967488 5:11509120000
> tcmalloc: large alloc 11509121024 bytes == 0x1f7c000 @
> tcmalloc: large alloc 2630656000 bytes == 0x2aff70000 @
> tcmalloc: large alloc 4932485120 bytes == 0x34cc3a000 @
> tcmalloc: large alloc 7891968000 bytes == 0x64933c000 @
> lmplz: ./util/fixed_array.hh:104: T&
> util::FixedArray<T>::operator[](std::size_t) [with T =
> lm::NGramStream<lm::builder::BuildingPayload>; std::size_t = long
> unsigned int]: Assertion `i < size()' failed.
>
> I'm running a Linux server with Ubuntu 15.04
>
> Any help will be appreciated
>
> Alex Martínez
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
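A minimal reproduction along the lines Kenneth requests could be assembled like this (the sample factored data is invented and the lmplz path is copied from the log above; adjust both to your setup):

```shell
# Create a tiny factored corpus in the word|POS format shown in the log
# (the content here is hypothetical, purely for illustration).
mkdir -p /tmp/lmplz-repro
printf 'this|DT is|VBZ a|DT test|NN\nanother|DT test|NN sentence|NN\n' \
    > /tmp/lmplz-repro/sample.factored.1

# Then run lmplz on it (uncomment once Moses is built). If the assertion
# still fires on data this small, the file above is a self-contained
# reproduction that can be shared:
# /opt/moses/bin/lmplz --text /tmp/lmplz-repro/sample.factored.1 \
#     --order 5 --arpa /tmp/lmplz-repro/sample.lm.1 --discount_fallback
```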


------------------------------

Message: 2
Date: Thu, 1 Oct 2015 11:14:05 +0100
From: Séamus Lawless <seamus.lawless@scss.tcd.ie>
Subject: [Moses-support] Job Opening: Post-Doctoral Researcher in
Personalised Information Retrieval
To: Seamus Lawless <seamus.lawless@scss.tcd.ie>
Message-ID: <2A864E07-EF27-4F13-ADE3-5AFF6C7C89E0@scss.tcd.ie>
Content-Type: text/plain; charset="utf-8"

Post-Doctoral Researcher in Personalised Information Retrieval

36 Month Fixed Term Contract, Full-Time
Salary: Appointment will be made on the SFI Team Member Budget Level 2A salary scale at a point in line with Government Pay Policy
Closing date: Thursday 21st October, 2015.

The ADAPT Centre is Ireland's global centre of excellence for digital content and media innovation. It combines the expertise of researchers at four universities (Trinity College Dublin, Dublin City University, University College Dublin, and Dublin Institute of Technology) with that of its industry partners to produce ground-breaking digital content innovations.

Background & Role
The ADAPT Centre, the centre for digital content technology, seeks to appoint a postdoctoral researcher in Personalised Information Retrieval.
ADAPT is led by Trinity College Dublin (TCD).
ADAPT brings together more than 120 researchers who collectively have won more than €100m in funding and have a strong track record of transferring world-leading research and innovations to more than 140 companies. With €50M in new research funding from Science Foundation Ireland and industry, ADAPT is seeking talented individuals to join its growing research team. Our research and technologies will continue to help businesses in all sectors and push back the frontiers of future Web engagement.

Principal Duties and Responsibilities
The successful candidate will work within a large group of Postdoctoral Researchers, PhD students and Software Developers. The successful candidate will lead research in the specialist area of Personalised Information Retrieval. The tasks to be performed as part of this position will include:
• Applying novel algorithms to the personalisation of search.
• Developing innovative approaches to continuous user modelling over multiple search sessions.
• Evaluating and deploying new personalised information retrieval approaches in authentic scenarios.
• Providing support and advice to PhD students.
• Contributing to journal and conference publications.

Approximately 20% of this researcher's time will be allocated to short-term, focused projects with ADAPT Industry Partners.

Funding Information
The position is funded through the Science Foundation Ireland (SFI) ADAPT Research Centre.

Qualifications
Candidates appointed to this post must have a PhD in Computer Science or a related discipline (essential).

Knowledge & Experience (Essential)
• Strong expertise in foundational approaches to Information Retrieval (IR)
• Good programming skills and experience with state-of-the-art IR systems
• An excellent track record of publications in the area of IR
• Solid understanding of experimental design and statistics
• Excellent communication and collaboration skills

Skills & Competencies
• Good written and oral proficiency in English (essential).
• Good communication and interpersonal skills, both written and verbal.
• Proven aptitude for Programming, System Analysis and Design.
• Proven ability to prioritise workload and work to exacting deadlines.
• Proven track record of publication in high-quality venues.
• Flexible and adaptable in responding to stakeholder needs.
• Experience in releasing code to live production environments.
• Strong team player who is able to take responsibility to contribute to the overall success of the team.
• Enthusiastic and structured approach to research and development.
• Excellent problem solving abilities.
• Desire to learn about new technologies and keep abreast of new product, technical and research developments.

Candidates will be assessed on the following competencies:
Discipline knowledge and Research skills - Demonstrates knowledge of a research discipline and the ability to conduct a specific programme of research within that discipline.
Understanding the Research Environment - Demonstrates an awareness of the research environment (for example, funding bodies) and the ability to contribute to grant applications.
Communicating Research - Demonstrates the ability to communicate their research to their peers and the wider research community (for example, presenting at conferences and publishing research in relevant journals) and the potential to teach and tutor students.
Managing & Leadership skills - Demonstrates the potential to manage a research project including the supervision of undergraduate students.

Application Procedure:
Informal Queries and Applications to: Aoife Brady, aoife.brady@adaptcentre.ie. Please include the ADAPT Position Title in all email communications.

Equal Opportunities Policy:
Trinity College Dublin is an equal opportunities employer and is committed to employment policies, procedures and practices which do not discriminate on grounds such as gender, civil status, family status, age, disability, race, religious belief, sexual orientation or membership of the Traveller community.

------------------------------

Message: 3
Date: Wed, 30 Sep 2015 14:18:44 -0400
From: Saif Mohammad <uvgotsaif@gmail.com>
Subject: [Moses-support] First Call For Participation in SemEval 2016
Task 7: Determining Sentiment Intensity of English and Arabic Phrases
To: saifm.nrc@gmail.com
Message-ID:
<CALu_-ORnPEwm8EZ==uRGaBSTpxdT3-ZLXavCV+THC6apCB-3FA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

First Call For Participation

Determining Sentiment Intensity of English and Arabic Phrases
SemEval 2016 - Task 7

http://alt.qcri.org/semeval2016/task7/

The objective of the task is to test an automatic system's ability to
predict a sentiment intensity (aka evaluativeness and sentiment
association) score for a word or a phrase. Phrases include negators,
modals, intensifiers, and diminishers -- categories known to be
challenging for sentiment analysis. Specifically, the participants will be
given a list of terms (single words and multi-word phrases) and be asked to
provide a score between 0 and 1 that is indicative of the term's strength
of association with positive sentiment. A score of 1 indicates maximum
association with positive sentiment (or least association with negative
sentiment) and a score of 0 indicates least association with positive
sentiment (or maximum association with negative sentiment). If a term is
more positive than another, then it should have a higher score than the
other.

We introduced this task as part of SemEval-2015 Task 10 Sentiment
Analysis in Twitter, Subtask E (Rosenthal et al., 2015), where the target
terms were taken from Twitter. In SemEval-2016, we broaden the scope of
the task to include three different domains: general English, English
Twitter, and Arabic Twitter. The Twitter domain differs significantly from
the general English domain; it includes hashtags, which are often a
composition of several words (e.g., #feelingood), misspellings,
shortenings, slang, etc.


SUBTASKS

We will have three subtasks, one for each of the three domains:

-- General English Sentiment Modifiers Set: This test set has phrases
formed by combining a word and a modifier, where a modifier is a negator,
an auxiliary verb, a degree adverb, or a combination of those. For example,
'would be very easy', 'did not harm', and 'would have been nice'. (See
development data for more examples.) The test set also includes single word
terms (as separate entries). These terms are chosen from the set of words
that are part of the multi-word phrases. For example, 'easy', 'harm', and
'nice'. The terms in the test set will have the same form as the terms in
the development set, but can involve different words and modifiers.

-- English Twitter Mixed Polarity Set: This test set focuses on phrases
made up of opposite polarity terms. For example, phrases such as 'lazy
sundays', 'best winter break', 'happy accident', and 'couldn't stop
smiling'. Observe that 'lazy' is associated with negative sentiment whereas
'sundays' is associated with positive sentiment. Automatic systems have to
determine the degree of association of the whole phrase with positive
sentiment. The test set also includes single word terms (as separate
entries). These terms are chosen from the set of words that are part of the
multi-word phrases. For example, terms such as 'lazy', 'sundays', 'best',
'winter', and so on. This allows the evaluation to determine how good the
automatic systems are at determining sentiment association of individual
words as well as how good they are at determining sentiment of phrases
formed by their combinations. The multi-word phrases and single-word terms
are drawn from a corpus of tweets, and may include a small number of
hashtag words and creatively spelled words. However, a majority of the
terms are those that one would use in everyday English.

-- Arabic Twitter Set: This test set includes single words and phrases
commonly found in Arabic tweets. The phrases in this set are formed only by
combining a negator and a word. See development data for examples.

In each subtask, the target terms are chosen from the corresponding domain.
We will provide a development set and a test set for each domain. No
separate training data will be provided. The development sets will be large
enough to be used for tuning or even for training. The test sets and the
development sets will have no terms in common. The participants are free to
use any additional manually or automatically generated resources; however,
we will require that all resources be clearly identified in the submission
files and in the system description paper.

All of these terms are manually annotated to obtain their strength of
association scores. We use CrowdFlower to crowdsource the annotations. We
use the MaxDiff method of annotation. Kiritchenko et al. (2014) showed that
even though annotators might disagree about answers to individual
questions, the aggregated scores produced with MaxDiff and the
corresponding term ranking are consistent. We verified this by randomly
selecting ten groups of five answers to each question and comparing the
scores and rankings obtained from these groups of annotations. On average,
the scores of the terms from the data we have previously annotated
(SemEval-2015 Subtask E Twitter data and SemEval-2016 general English
terms) differed only by 0.02-0.04 per term, and the Spearman rank
correlation coefficient between the two sets of rankings was 0.97-0.98.
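For illustration, the usual best-minus-worst counting scheme for turning MaxDiff picks into real-valued scores can be sketched as follows (a simplified sketch with invented data; the exact aggregation used for the task data may differ):

```python
from collections import defaultdict

def maxdiff_scores(answers):
    """Aggregate MaxDiff annotations into scores in [0, 1].

    `answers` is a list of (terms_shown, best_pick, worst_pick) triples,
    one per annotated question.
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for terms, b, w in answers:
        for t in terms:
            shown[t] += 1
        best[b] += 1
        worst[w] += 1
    # Raw best-minus-worst score lies in [-1, 1]; rescale to [0, 1].
    return {t: ((best[t] - worst[t]) / shown[t] + 1) / 2 for t in shown}

# Two toy questions over the same four terms (invented data).
answers = [
    (("great", "good", "okay", "bad"), "great", "bad"),
    (("great", "good", "okay", "bad"), "good", "bad"),
]
scores = maxdiff_scores(answers)  # e.g. scores["bad"] == 0.0
```

Even with few annotators per question, aggregating many such best/worst picks yields stable scores and rankings, which is the consistency property verified above.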


EVALUATION

The participants can submit results for any one, two, or all three
subtasks. We will provide separate test files for each subtask. The test
file will contain a list of terms from the corresponding domain. The
participating systems are expected to assign a sentiment intensity score to
each term. The order of the terms in the submissions can be arbitrary.

System ratings for terms in each subtask will be evaluated by first ranking
the terms according to sentiment score and then comparing this ranked list
to a ranked list obtained from human annotations. Kendall's Tau (Kendall,
1938) will be used as the metric to compare the ranked lists. We will
provide scores for Spearman's Rank Correlation as well, but participating
teams will be ranked by Kendall's Tau.
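As a concrete illustration, Kendall's Tau over a pair of score lists can be computed by counting concordant and discordant term pairs (a minimal Tau-a sketch with invented scores; the released evaluation script may handle ties and file formats differently):

```python
from itertools import combinations

def kendall_tau(gold, predicted):
    """Kendall's Tau (Tau-a) between two {term: score} dicts."""
    terms = sorted(gold)
    concordant = discordant = 0
    for a, b in combinations(terms, 2):
        # A pair is concordant when both score lists order it the same way.
        g = gold[a] - gold[b]
        p = predicted[a] - predicted[b]
        if g * p > 0:
            concordant += 1
        elif g * p < 0:
            discordant += 1
    n_pairs = len(terms) * (len(terms) - 1) // 2
    return (concordant - discordant) / n_pairs

gold = {"great": 0.9, "nice": 0.7, "harm": 0.2}
pred = {"great": 0.8, "nice": 0.6, "harm": 0.3}
tau = kendall_tau(gold, pred)  # all three pairs agree, so tau == 1.0
```

Note that only the induced ranking matters: a system whose scores are uniformly shifted or rescaled relative to the gold scores still obtains a perfect Tau.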
We have released an evaluation script so that participants can:
-- make sure their output is in the right format;
-- track the progress of their system's performance on the development data.


IMPORTANT DATES

-- Training data ready: September 4, 2015
-- Test data ready: Dec 15, 2015
-- Evaluation start: January 10, 2016
-- Evaluation end: January 31, 2016
-- Paper submission due: February 28, 2016
-- Paper reviews due: March 31, 2016
-- Camera ready due: April 30, 2016
-- SemEval workshop: Summer 2016


BACKGROUND AND MOTIVATION

Many of the top performing sentiment analysis systems in recent SemEval
competitions (2013 Task 2, 2014 Task 4, and 2014 Task 9) rely on
automatically generated sentiment lexicons. Sentiment lexicons are lists of
words (and phrases) with prior associations to positive and negative
sentiments. Some lexicons can additionally provide a sentiment score for a
term to indicate its strength of evaluative intensity. Higher scores
indicate greater intensity. Existing manually created sentiment lexicons
tend to only have discrete labels for terms (positive, negative, neutral)
but no real-valued scores indicating the intensity of sentiment. Here, for
the first time, we manually create a dataset of words and phrases with
real-valued scores of intensity. The goal of this task is to evaluate
automatic methods for determining sentiment scores of words and phrases.
Many of the phrases in the test set will include negators (such as 'no' and
'doesn't'), modals (such as 'could' and 'may be'), and intensifiers and
diminishers (such as 'very' and 'slightly'). This task will enable
researchers to examine methods for estimating how each of these word
categories impacts the intensity of sentiment.


ORGANIZERS

-- Svetlana Kiritchenko, National Research Council Canada
-- Saif M. Mohammad, National Research Council Canada
-- Mohammad Salameh, University of Alberta

--
Saif Mohammad
Research Officer
Information and Communications Technologies Portfolio
National Research Council Canada
http://www.saifmohammad.com

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 108, Issue 1
*********************************************
