Moses-support Digest, Vol 89, Issue 48

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. COLING 2014 Call for System Demonstrations (Seong-Bae Park)
2. Re: Recaser - LM model loading (Tomas Fulajtar)


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Mar 2014 02:07:15 +0000
From: Seong-Bae Park <sbpark@sejong.knu.ac.kr>
Subject: [Moses-support] COLING 2014 Call for System Demonstrations
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<01648087DD5D4349A12B758B6F475B8432FD8FDD@SejongSRV.sejong.knu.ac.kr>
Content-Type: text/plain; charset="utf-8"

********** Apologies for cross-posting **********

COLING 2014 Call for System Demonstrations
The 25th International Conference on Computational Linguistics, August 23 - 29, 2014, Dublin, Ireland

http://www.coling-2014.org

Important dates
19 May 2014: Paper submission deadline
23 June 2014: Author notification
7 July 2014: Camera-ready paper submission deadline


The COLING 2014 Demonstration Programme Committee invites proposals for system demonstrations. The demonstration programme is part of the main conference programme and aims at showcasing working systems that address a wide range of conference topics. The session will provide opportunities to exchange ideas gained from implementing NLP systems and to obtain feedback from expert users.

COLING 2014 will be held in Dublin, Ireland from 23-29 August 2014.

The COLING conference has a history that dates back to the 1960s. It is held every two years and regularly attracts more than 700 delegates. The conference has developed into one of the premier Natural Language Processing (NLP) conferences worldwide and is a major international event for the presentation of new research results and for the demonstration of new systems and techniques in the broad field of Computational Linguistics and NLP.

Topics of interest
COLING 2014 solicits demonstrations of original and unpublished research on topics including, but not limited to:
- pragmatics, semantics, syntax, grammars and the lexicon;
- cognitive, mathematical and computational models of language processing;
- models of communication by language;
- lexical semantics and ontologies;
- word segmentation, tagging and chunking;
- parsing, both syntactic and deep;
- generation and summarisation;
- paraphrasing, textual entailment and question answering;
- speech recognition, text-to-speech and spoken language understanding;
- multimodal and natural language interfaces and dialogue systems;
- information retrieval, information extraction and knowledge base linking;
- machine learning for natural language;
- modelling of discourse and dialogue;
- sentiment analysis, opinion mining and social media;
- multilingual processing, machine translation and translation aids;
- applications, tools and language resources;
- system evaluation methodology and metrics.


Submissions
Submissions should address the following questions:
- What is the problem the proposed system addresses?
- Why is the system important and what is its impact?
- What is the novelty of the approach/technology used?
- Who is the target audience?
- How does the system work?
- How does it compare with existing systems?
- How is the system licensed?

The maximum submission length is 4 pages (including references). Papers shall be submitted in English and must conform to the official COLING 2014 style guidelines available on the conference website. The anonymisation of submissions is optional. If authors choose to remain anonymous, it is their responsibility to take every measure to conceal potentially identifying information.
http://www.coling-2014.org/instructions-for-authors.php

Submission and reviewing will be managed in the START system:
https://www.softconf.com/coling2014/demos/

The only accepted format for submissions is PDF. Accepted papers will appear in the conference proceedings in a dedicated volume for demonstration systems.


Demonstration chairs
Lamia Tounsi, CNGL, Dublin City University, Ireland
Rafal Rak, NaCTeM, University of Manchester, UK


Programme Committee
Michiel Bacchiani, Google Inc.
Kay Berkling, Cooperative State University, Karlsruhe
Ann Bies, Linguistic Data Consortium
William Black, University of Manchester
Francis Bond, Nanyang Technological University
Chris Brew, Nuance Communications
Aoife Cahill, Educational Testing Service
Vittorio Castelli, IBM
Md. Faisal Mahbub Chowdhury, IBM
Léa Deleris, IBM
Martin Emms, Trinity College Dublin
Guillaume Gravier, IRISA & INRIA Rennes
Keith Hall, Google Research
Derrick Higgins, Educational Testing Service
Keikichi Hirose, University of Tokyo
Frank Hopfgartner, Technische Universität Berlin
Daxin Jiang, Microsoft STC-A
John Kelleher, Trinity College Dublin
Adam Kilgarriff, Lexical Computing Ltd
BalaKrishna Kolluru, Toshiba
Seamus Lawless, Trinity College Dublin
Saturnino Luz, Trinity College Dublin
Nitin Madnani, Educational Testing Service
Hilary McDonald, Trinity College Dublin
Helen Meng, Chinese University of Hong Kong
Peter Mika, Yahoo Labs
Tony O'Dowd, KantanMT
Florian Pinel, IBM
Johann Roturier, Symantec
Andrew Rowley, University of Manchester
Frédérique Segond, Viseo Research
Swapna Somasundaran, Educational Testing Service
Tomoki Toda, Nara Institute of Science and Technology
Xinglong Wang, Brandwatch
Jason Williams, Microsoft Research


------------------------------

Message: 2
Date: Thu, 20 Mar 2014 08:44:03 +0000
From: Tomas Fulajtar <TomasFu@moravia.com>
Subject: Re: [Moses-support] Recaser - LM model loading
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<0F546C639409F6479DF2CE6DD5267B16A040FC10@dag-cz-1.CZ.moravia-it.com>
Content-Type: text/plain; charset="us-ascii"

Hi Hieu,

The main reason we decided to stay with the 1.0 release while preparing the new environment is that all of our models were prepared on it, and we want to avoid any issues in the production line. I am aware of the new features in the latest release and will definitely look into them.

Yes, the LMs are stored locally - that's why I was surprised by the strange load time. Luckily, the older version of IRSTLM worked well.

I avoided binarization just for debugging purposes, to eliminate possible issues there (in production we actually use binarized models and the compact phrase/reordering tables - it is really a great feature). If I understand correctly, once we switch to a binarized KenLM model it won't use the loadtxt_ram function from IRSTLM anyway, so the problem becomes irrelevant. Sorry for the confusion.
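
For reference, a minimal sketch of what that switch could look like on our side. It assumes KenLM's build_binary is on the path, that the LM is first converted to standard ARPA (the iARPA header in the logs suggests IRSTLM's intermediate format, which may need converting with IRSTLM's tools first), and that implementation code 8 selects KenLM in the old-style moses.ini - all assumptions, not something I have verified here:

# decompress the LM (convert iARPA to standard ARPA first if needed)
gzip -dc /tmp/recase/cased.irstlm.gz > cased.arpa

# build a KenLM binary from the ARPA file
build_binary cased.arpa cased.kenlm

# moses.ini: swap the IRSTLM entry (code 1) for a KenLM one (code 8)
# before: lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
# after:  lmodel-file: 8 0 3 /tmp/recase/cased.kenlm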

Thanks for mentioning the distortion - I had forgotten to add the -dl 0 switch.
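
For the record, the fix is a one-liner in either place; the section name below is the one Hieu mentioned and the command is the one from my test:

# one-off, on the command line
echo 'some text to recase ' | moses -f recase/moses.ini -dl 0

# or permanently, in moses.ini
[distortion-limit]
0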

Tomas


From: hieuhoang@gmail.com [mailto:hieuhoang@gmail.com] On Behalf Of Hieu Hoang
Sent: Thursday, March 20, 2014 1:54 AM
To: Tomas Fulajtar
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Recaser - LM model loading

It should matter --> It should not matter

On 19 March 2014 23:09, Hieu Hoang <Hieu.Hoang@ed.ac.uk<mailto:Hieu.Hoang@ed.ac.uk>> wrote:
You seem to be using the text LM. This will take a long time to load, especially if it's over a network. It should matter what linux distribution you're using.
You should:
1. Make sure your files are on local disks
2. Binarize the LM with KenLM or IRSTLM, and binarize the phrase tables as well (a command sketch follows the list)
3. If it's a recaser, the distortion limit [distortion-limit] should be 0. Otherwise the recaser can reorder the output.
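
A rough sketch of step 2 in command form; the file names are placeholders, and note that processPhraseTable needs the table sorted under LC_ALL=C:

# binarize the LM with KenLM...
build_binary cased.arpa cased.kenlm

# ...or with IRSTLM
compile-lm cased.irstlm.gz cased.blm

# binarize the phrase table (5 scores, matching the recaser tables in this thread)
export LC_ALL=C
gzip -cd phrase-table.gz | sort | processPhraseTable -ttable 0 0 - -nscores 5 -out phrase-table
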
Also, you should consider updating your version of Moses. This will allow you to use IRSTLM 5.80.03. There are various changes that make it more extensible, faster and more reliable.



On 19 March 2014 12:31, Tomas Fulajtar <TomasFu@moravia.com<mailto:TomasFu@moravia.com>> wrote:
Hi Hieu,

Looking at the log, the problem seems to be related to the IRSTLM library and its code inside src/lmtable.cpp (the function named loadtxt_ram).

I went back to IRSTLM 5.80.01 and that resolved the slow LM loading. However, as the issue might be reproducible by other people, I am wondering whether we should report it to the IRSTLM team and maybe add a comment to the Moses wiki as well (I see there is already a comment about issues with the IRSTLM source code in the official repos, recommending 5.80.03, which unfortunately won't work in my environment).

Kind regards,


Tomas

From: Tomas Fulajtar
Sent: Wednesday, March 19, 2014 9:58 AM
To: 'Hieu Hoang'
Cc: moses-support@mit.edu<mailto:moses-support@mit.edu>
Subject: RE: [Moses-support] Recaser - LM model loading

Hi Hieu,

Please find the Moses.ini attached.

The LM is the default 3-gram IRSTLM model, trained with the command:
/opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file --train-script=/opt/moses/scripts/training/train-model.perl

I do not expect the problem to be in the LM preparation steps, as we have been using the same scripts for a long time without issues.
Parameters of the trained LM:
iARPA

\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865


The command issued for the recasing experiment:
echo 'some text to recase ' | moses -f recase/moses.ini

Response on Fedora (showing only the part with the LM data loading):

Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/var/www/moses/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /tmp/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level1
2-grams: reading 2616463 entries
done level2
3-grams: reading 7215865 entries
.done level3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [34.666] seconds
...

Response on SUSE:
Defined parameters (per moses.ini or switch):
config: recase/moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1

ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level 1
2-grams: reading 2616463 entries
done level 2
3-grams: reading 7215865 entries
.done level 3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [1045.969] seconds
...

As you can see, loading takes an enormous 1,045 seconds (over 17 minutes).

---
Meanwhile I found that gcc 4.7 is also available in the SUSE 11 SP3 SDK, so I tried recompiling boost/irstlm/moses, but the results are almost the same (it is faster by about 200 seconds due to the compiler optimization).

Thus the latest config on SUSE is the following (one way to pin the compiler version is sketched after the list):

irstlm 5.80.03 - recompiled under gcc 4.7
mgiza - updated from 0.6.3 to 0.7.3 and recompiled under gcc 4.7
boost 1.55 - recompiled under gcc 4.7
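
For reproducibility, one way to pin bjam to the SDK's gcc 4.7 is a user-config.jam entry (a sketch; the g++ path here is a guess at the SDK layout):

# ~/user-config.jam
using gcc : 4.7 : /usr/bin/g++-4.7 ;

# then build with that toolset explicitly
./bjam toolset=gcc-4.7 ...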

I have also attached the build.log in case it would be useful.

Today I am going to run regression tests to see whether any particular issues turn up.


Tomas


From: hieuhoang@gmail.com<mailto:hieuhoang@gmail.com> [mailto:hieuhoang@gmail.com] On Behalf Of Hieu Hoang
Sent: Wednesday, March 19, 2014 1:36 AM
To: Tomas Fulajtar
Cc: moses-support@mit.edu<mailto:moses-support@mit.edu>
Subject: Re: [Moses-support] Recaser - LM model loading

What is a recaser LM? What command is taking 20 minutes? Can you send me the moses.ini file you're using.


On 17 March 2014 12:58, Tomas Fulajtar <TomasFu@moravia.com<mailto:TomasFu@moravia.com>> wrote:
Hello,

I am experiencing strange behavior when using a recaser LM model after migrating to Moses (1.0) compiled on a different machine.
The problem is that loading the LM takes 20 minutes on my new machine (SUSE), while on the previous one it took 20 seconds or so.

Machine 1: Fedora 18
* gcc: 4.7.2
* perl: 5.16
* moses: 1.0
* irstlm: 5.80.01
* mgiza: 0.7.0
* boost: 1.52

Machine 2: SUSE SLES 11 SP3
* perl: 5.10.0
* gcc: 4.3
* moses: 1.0
* irstlm: 5.80.03
* mgiza: 0.6.3
* boost: 1.55

Moses compilation command:

sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4 -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local --with-cmph=/usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc -d2 --debug-configuration --max-kenlm-order=7 2>&1 | tee ~/build.log

I have tested the speed using the same recaser IRSTLM model data in ARPA format. There is actually no error displayed, so I wonder where to continue with debugging. I also tried retraining the model on SUSE and then testing on Fedora, but the result is the same (no error, but far too slow on SUSE). Does anybody have an idea where to look for a resolution? Maybe the problem is in the IRSTLM version used?
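
My next step for narrowing it down is to take Moses out of the loop and time the two halves of the load separately - a sketch, with paths as in the logs elsewhere in this thread:

# 1) rule out slow decompression (different zlib/gzip builds on the two systems)
time gzip -dc /tmp/recase/cased.irstlm.gz > /dev/null

# 2) rule out Moses itself: let IRSTLM's own compile-lm read the LM
time compile-lm /tmp/recase/cased.irstlm.gz /dev/null

If (2) is also slow only on SUSE, the regression would be inside IRSTLM rather than in Moses.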


Thank you,

Tomas Fulajtar



_______________________________________________
Moses-support mailing list
Moses-support@mit.edu<mailto:Moses-support@mit.edu>
http://mailman.mit.edu/mailman/listinfo/moses-support



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu




------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 89, Issue 48
*********************************************
