Moses-support Digest, Vol 113, Issue 62

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. EAMT 2016 deadline extended: March 27th (Antonio Toral)
2. IRSTLM: Trash sentences getting more probability scores than
proper grammatical sentences (Bhat Irshad)
3. Re: IRSTLM: Trash sentences getting more probability scores
than proper grammatical sentences (Kenneth Heafield)
4. Re: help (Philipp Koehn)

----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Mar 2016 09:58:54 +0100
From: Antonio Toral <atoral@computing.dcu.ie>
Subject: [Moses-support] EAMT 2016 deadline extended: March 27th
To: apertium-stuff@lists.sourceforge.net, moses-support@mit.edu,
mt-list@eamt.vm.bytemark.co.uk
Message-ID: <56F25ACE.6080302@riseup.net>
Content-Type: text/plain; charset=utf-8; format=flowed

*Deadline extended* March 27th, 2016, 11:59pm (UTC-12)

**************************************************************************
Final Call for Papers
EAMT 2016, Riga, Latvia - http://eamt2016.tilde.com/
19th Annual Conference of the European Association for Machine Translation
**************************************************************************

The European Association for Machine Translation (EAMT,
http://www.eamt.org) invites everyone interested in machine translation,
translation-related tools and resources to participate in this
conference - developers, researchers, users, professional translators
and translation/localisation managers: anyone who has a stake in the
vision of an information world in which language barriers and issues
become less visible to the information consumer. We especially invite
researchers to describe the state of the art and demonstrate their
cutting-edge results, and professional MT users to share their experiences.

EAMT 2016, the 19th Annual Conference of the European Association for
Machine Translation, will be held in Riga, Latvia, from May 30th to June
1st, 2016.

We expect to receive manuscripts in these three categories:

-------------------
(R) Research papers
-------------------

Long-paper submissions (12 pages) are invited for reports of significant
research results in any aspect of machine translation and related areas.
Such reports should include a substantial evaluation component, or have
a strong theoretical and/or methodological contribution where results
and in-depth evaluations may not be appropriate. Papers are welcome on
all topics in the areas of machine translation and translation-related
technologies, including:

* Advances in various MT paradigms: data-driven, rule-based, and hybrid
approaches
* Technologies for MT deployment: quality estimation, domain adaptation,
etc.
* MT in special settings: low resources, massive resources, high volume,
low computing resources
* MT applications: translation/localisation aids, speech-to-speech,
speech-to-text, OCR, MT for user generated content (blogs, social
networks), etc.
* Linguistic resources for MT: dictionaries, terminology, corpora, etc.
* MT evaluation techniques and evaluation results
* Human factors in MT and user interfaces
* Related multilingual technologies: natural language generation,
information retrieval, text categorisation, text summarisation,
information extraction, etc.

Papers should describe original work. They should emphasise completed
work rather than intended work, and should indicate clearly the state of
completion of the reported results. Where appropriate, concrete
evaluation results should be included.

----------------
(U) User studies
----------------

Short-paper submissions (3-6 pages) are invited for reports on users'
experiences with MT, be it in small or medium size business (SMB),
enterprise, government, or NGOs. Contributions are welcome on:

* Integrating MT and computer-assisted translation into a translation
production workflow (e.g. transforming terminology glossaries into MT
resources, optimizing TM/MT thresholds, mixing online and offline tools,
using interactive MT, dealing with MT confidence scores);
* Use of MT to improve translation or localisation workflows (e.g.
reducing turnaround times, improving translation consistency, increasing
the scope of globalisation projects);
* Managing change when implementing and using MT (e.g. switching between
multiple MT systems, limiting degradations when updating or upgrading an
MT system);
* Implementing open-source MT in the SMB or enterprise (e.g. strategies
to get support, reports on taking pilot results into full deployment,
examples of advanced customisation sought and obtained thanks to the
open-source paradigm, collaboration within open-source MT projects);
* Evaluation of MT in a real-world setting (e.g. error detection
strategies employed, metrics used, productivity or translation quality
gains achieved);
* Post-editing strategies and tools (e.g. limitations of traditional
translation quality assurance tools, challenges associated with
post-editing guidelines);
* Legal issues associated with MT, especially MT in the cloud (e.g.
copyright, privacy);
* Use of MT in social networking or real-time communication (e.g.
enterprise support chat, multilingual content for social media);
* Use of MT to process multilingual content for assimilation purposes
(e.g. cross-lingual information retrieval, MT for e-discovery or spam
detection, MT for highly dynamic content);
* Use of standards for MT.

Papers should highlight problems and solutions and not merely describe
the MT integration process or project settings. Where solutions do not seem
to exist, suggestions for MT researchers and developers should be
clearly emphasised. For user papers produced by academics, we require
co-authorship with the actual users.

-------------------------------
(P) Project/Product description
-------------------------------

Abstract submissions (1 page) are invited to report new, interesting:

* Tools for machine translation, computer aided translation, and the
like (including commercial products and open-source software). The
authors should be ready to present the tools in the form of demos or
posters during the conference.
* Research projects related to machine translation. The authors should
be ready to present the projects in the form of posters during the
conference. This follows on from the successful "project villages" held
at the last EAMT conferences.

---------
Programme
---------

The programme will include oral presentations and poster sessions.
Accepted papers may be assigned to an oral or poster session, but no
differentiation will be made in the conference proceedings.

---------------
Important Dates
---------------

* Paper submission: March 27th, 2016
* Notification to authors: April 22nd, 2016
* Camera-ready deadline: May 2nd, 2016
* Conference: May 30th-June 1st, 2016

------------
Publications
------------

The conference proceedings will be published as a special issue of the
Baltic Journal of Modern Computing (BJMC, http://www.bjmc.lu.lv/), a
scholarly open access electronic quarterly journal, which is indexed by
Thomson Reuters Web of Science Core Collection (Emerging Sources
Citation Index), EBSCO, ProQuest, Directory of Open Access Journals
(DOAJ), Google Scholar, VINITI, Directory of Research Journal Indexing
(DRJI) and Open J-Gate, and has applied to be indexed in Scopus.

In addition, the best accepted papers will be selected to be published,
in an extended version, and with a lighter reviewing process, as regular
papers in the Springer Machine Translation journal
(http://link.springer.com/journal/10590).

-----------
Submissions
-----------

Submissions will be judged on correctness, originality, technical
strength, significance and relevance to the conference, and potential
interest to all attendees. They should mostly contain new material that
has not been presented at any other meeting with publicly available
proceedings.

EAMT 2016 will use electronic submission through the EasyChair
conference tool. To submit a paper, go to the submission website at:
https://easychair.org/conferences/?conf=eamt2016 and follow the
instructions.

Papers that are being submitted in parallel to other conferences or
workshops and papers that contain significant overlap with previously
published work must indicate this on the title page and in the abstract
submitted through EasyChair (using capital letters). In case of
acceptance the paper will only be included in the proceedings if it is
not published in any other conference or workshop to which it was submitted.

Papers should be anonymised (no authors, affiliations or addresses, and
no explicit self-references), be no longer than 12 pages (A4 size) for
research papers, and no longer than 6 pages (A4 size) for user papers,
all in PDF format. Papers must conform to the format defined by the BJMC
template: http://www.bjmc.lu.lv/instructions-to-authors/.
Project/product descriptions do not need to be anonymised and should use
the 1-page template given in
http://eamt2016.tilde.com/sites/eamt2016.tilde.com/files/eamt-2016-product-template.doc.

For further information about this call for papers or if you encounter
any problem regarding submission please contact the track chairs at
eamt2016chairs@tilde.com and put in the subject "[user]" or "[research]"
depending on which track your question is related to. For questions
about the organisation (venue, registration, accommodation, visa,
payments, etc.) please contact the local organisers at eamt2016@tilde.com.

----------------------
EAMT Best Thesis Award
----------------------

The EAMT Best Thesis Award for PhD theses submitted during 2015 will be
awarded at the conference, together with a presentation of the winner's
work. Information for candidates to the award is available at:
http://www.eamt.org/news/news_best_thesis2015.php. The deadline is the
same as for the paper submission.

------------------
Conference website
------------------

Please visit the conference web page (http://eamt2016.tilde.com/) for
the most up-to-date information about the calendar, the call for papers
and formatting requirements, the programme, invited speakers, related
conference activities, the venue, travel and registration.

---------------------
Conference organisers
---------------------

General Chair: Mikel Forcada (Universitat d'Alacant)

Track Chairs:
* Antonio Toral, Research programme chair (Dublin City University, Ireland)
* Tony O'Dowd, User programme co-chair (KantanMT, Ireland)
* Alexandru Ceausu, User programme co-chair (Amplexor, Luxembourg)

Local Organisation Chair: Andrejs Vasiļjevs (Tilde, Latvia)

Local host: Juris Borzovs (University of Latvia)

------------------------------

Message: 2
Date: Wed, 23 Mar 2016 14:58:50 +0530
From: Bhat Irshad <bhatirshad127@gmail.com>
Subject: [Moses-support] IRSTLM: Trash sentences getting more
probability scores than proper grammatical sentences
To: moses-support@mit.edu
Message-ID:
<CAERVdMZfA9i=acAivyaAm9gcfKt2uZRSPcsL7XOLT4vCSSO6UQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I built a language model using IRSTLM on 20 million tokenized English
sentences and tested it on the following two sentences:

1. Yesterday when I was walking towards home , I saw a kangaroo .
2. smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .

As we can see, the first portion of the second sentence is complete trash,
while the first sentence is a proper grammatical one. I was surprised to see
that the second sentence got a higher probability score (-27.887135) than the
first one (-28.91925).

I guess this happened due to back-off, though I am not sure.

echo 'Yesterday when I was walking towards home , I saw a kangaroo .' |
/usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
Yesterday=126222 2 -4.08843 when=409 3 -2.51627 I=260 3 -0.58336 was=771 3
-0.764257 walking=1624 3 -2.58353 towards=1335 3 -1.95033 home=388 2
-3.910977 ,=209 3 -1.15596 I=260 3 -1.55485 saw=4411 3 -2.31963 a=131 3
-0.886832 kangaroo=106652 2 -5.3615108 .=10 3 -1.24128 </s>=11 3
-0.00203508 Total: -28.91925 OOV: 0
Perplexity including OOVs: 116.32170228822577
Perplexity excluding OOVs: 116.32170228822577
OOVs: 0
Tokens: 14

echo 'smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .' |
/usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
smdnbs=0 1 -4.0025997 sadb=0 1 -2.23153 jghsa=0 1 -2.23153 sdabasd=0 1
-2.23153 asasd=0 1 -2.23153 tsados=0 1 -2.23153 hasdb=0 1 -2.23153 ,=209 1
-1.42496 I=260 2 -1.9045 saw=4411 3 -2.31963 a=131 3 -0.886832 snake=3768 3
-3.16116 .=10 3 -0.793541 </s>=11 3 -0.0047327 Total: -27.887135 OOV: 7
Perplexity including OOVs: 98.16082104257269
Perplexity excluding OOVs: 31.57449745907425
OOVs: 7
Tokens: 14
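
The totals and perplexities reported above can be re-derived from the
per-token scores. Below is a minimal Python sketch, assuming the scores are
log10 probabilities and that the token count includes </s>. Both assumptions
are consistent with the numbers shown, but they are assumptions. The variable
and function names are illustrative and not part of any toolkit.

```python
# Per-token log10 probabilities copied from the two query outputs above
# (the last entry of each list is the end-of-sentence token </s>).
GRAMMATICAL = [
    -4.08843, -2.51627, -0.58336, -0.764257, -2.58353, -1.95033, -3.910977,
    -1.15596, -1.55485, -2.31963, -0.886832, -5.3615108, -1.24128, -0.00203508,
]
TRASH = [
    -4.0025997, -2.23153, -2.23153, -2.23153, -2.23153, -2.23153, -2.23153,
    -1.42496, -1.9045, -2.31963, -0.886832, -3.16116, -0.793541, -0.0047327,
]
N_OOV = 7  # the first seven tokens of the second sentence are OOVs


def perplexity(log10_probs):
    """Perplexity of a token sequence given its per-token log10 probabilities."""
    return 10 ** (-sum(log10_probs) / len(log10_probs))


for name, scores, n_oov in (("grammatical", GRAMMATICAL, 0),
                            ("trash", TRASH, N_OOV)):
    # Slicing off the OOVs works here only because they all come first.
    in_vocab = scores[n_oov:]
    print(f"{name}: total={sum(scores):.6f} "
          f"ppl_incl_oov={perplexity(scores):.4f} "
          f"ppl_excl_oov={perplexity(in_vocab):.4f}")
```

The sentence total is just the sum of the per-token scores, so a handful of
cheap tokens can outweigh one expensive token elsewhere in the sentence.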

------------------------------

Message: 3
Date: Wed, 23 Mar 2016 09:38:21 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] IRSTLM: Trash sentences getting more
probability scores than proper grammatical sentences
To: moses-support@mit.edu
Message-ID: <56F2640D.5090600@kheafield.com>
Content-Type: text/plain; charset=windows-1252

kangaroo is less probable than snake, which more than explains the
difference you observed. Film at 11.

That p(<unk>) is pretty high. What happened when you used lmplz to
build the model?

Kenneth
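
Both remarks can be checked against the numbers in this thread. Below is a
rough back-of-the-envelope sketch in Python, assuming the scores are log10
probabilities. The constant names are illustrative. The comparison ignores
the fact that swapping the final noun would also change the scores of the
tokens that follow it.

```python
# Scores copied from the two query outputs in this thread.
TOTAL_GRAMMATICAL = -28.91925
TOTAL_TRASH = -27.887135
LOGP_KANGAROO = -5.3615108
LOGP_SNAKE = -3.16116
LOGP_UNK = -2.23153  # score of each OOV token after the first one

# How much better the trash sentence scores overall (log10 units).
total_gap = TOTAL_TRASH - TOTAL_GRAMMATICAL
# How much of that is due to the final noun alone.
noun_gap = LOGP_SNAKE - LOGP_KANGAROO
# Probability the model gives to an unknown word in this context.
p_unk = 10 ** LOGP_UNK

print(f"total gap: {total_gap:.4f}")
print(f"noun gap:  {noun_gap:.4f}")
print(f"p(<unk>):  {p_unk:.5f}")
print("noun choice alone covers the gap:", noun_gap > total_gap)
```

The noun gap is larger than the total gap, so the choice of noun alone
accounts for the second sentence scoring higher.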

On 03/23/2016 09:28 AM, Bhat Irshad wrote:
> I built a language model using IRSTLM on 20 million tokenized English
> sentences and tested it on the following two sentences:
>
> 1. Yesterday when I was walking towards home , I saw a kangaroo .
> 2. smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .
>
> As we can see, the first portion of the second sentence is complete trash,
> while the first sentence is a proper grammatical one. I was surprised to see
> that the second sentence got a higher probability score (-27.887135) than the
> first one (-28.91925).
>
> I guess this happened due to back-off, though I am not sure.
>
> echo 'Yesterday when I was walking towards home , I saw a kangaroo .' |
> /usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
> Yesterday=126222 2 -4.08843 when=409 3 -2.51627 I=260 3 -0.58336 was=771 3
> -0.764257 walking=1624 3 -2.58353 towards=1335 3 -1.95033 home=388 2
> -3.910977 ,=209 3 -1.15596 I=260 3 -1.55485 saw=4411 3 -2.31963 a=131 3
> -0.886832 kangaroo=106652 2 -5.3615108 .=10 3 -1.24128 </s>=11 3
> -0.00203508 Total: -28.91925 OOV: 0
> Perplexity including OOVs: 116.32170228822577
> Perplexity excluding OOVs: 116.32170228822577
> OOVs: 0
> Tokens: 14
>
> echo 'smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .' |
> /usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
> smdnbs=0 1 -4.0025997 sadb=0 1 -2.23153 jghsa=0 1 -2.23153 sdabasd=0 1
> -2.23153 asasd=0 1 -2.23153 tsados=0 1 -2.23153 hasdb=0 1 -2.23153 ,=209 1
> -1.42496 I=260 2 -1.9045 saw=4411 3 -2.31963 a=131 3 -0.886832 snake=3768 3
> -3.16116 .=10 3 -0.793541 </s>=11 3 -0.0047327 Total: -27.887135 OOV: 7
> Perplexity including OOVs: 98.16082104257269
> Perplexity excluding OOVs: 31.57449745907425
> OOVs: 7
> Tokens: 14

------------------------------

Message: 4
Date: Wed, 23 Mar 2016 10:39:44 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] help
To: Parul gupta <btpg71@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDAWvXu6bGLNi2FzxwvU7trr=EaDn=zJHc64ZLBTJP6sng@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

This looks like a successful build.

The "NOT BUILDING MOSES SERVER!" line only means that the build will not
have the server functionality, in which Moses runs as a daemon and responds
to TCP/IP requests.

-phi

On Tue, Mar 22, 2016 at 2:43 AM, Parul gupta <btpg71@gmail.com> wrote:

> warning: No toolsets are configured.
> warning: Configuring default toolset "gcc".
> warning: If the default is wrong, your build may not work correctly.
> warning: Use the "toolset=xxxxx" option to override our guess.
> warning: For more configuration options, please consult
> warning:
> http://boost.org/boost-build2/doc/html/bbv2/advanced/configuration.html
> NOT BUILDING MOSES SERVER!
> Performing configuration checks
>
> - Shared Boost : yes (cached)
> - Static Boost : yes (cached)
> ...patience...
> ...patience...
> ...found 4563 targets...
> SUCCESS
>
>
> What does it mean? Is there anything wrong?

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 113, Issue 62
**********************************************
