Moses-support Digest, Vol 88, Issue 70

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Very slow tuning with binarised kenlm language model
(Hieu Hoang)
2. Re: Tokenisation issue - following the implementation
baseline (Philipp Koehn)
3. Re: Very slow tuning with binarised kenlm language model
(Barry Haddow)
4. CfP: SIGIR Medical Information Retrieval Workshop (Liadh Kelly)


----------------------------------------------------------------------

Message: 1
Date: Fri, 28 Feb 2014 14:45:08 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Very slow tuning with binarised kenlm
language model
To: Felipe Sánchez Martínez <fsanchez@dlsi.ua.es>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbiYHaMBA5ZVqb3SybNmO1xr8K6aKQJ8ESpSrPW1+tKK8g@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

If your phrase table and lexicalised reordering model are also binarised,
you should run
cat pt.binphr* reordering.bin* lm.binlm* > /dev/null
This forces the files into the filesystem cache, so the decoder does not
stall on page faults while it reads the models.
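
A minimal sketch of the same idea as a reusable shell function, in case the
model files live in several places. The function name warm_cache is made up
for illustration, and the file patterns in the usage line are the ones from
the cat command above; adjust them to your own paths. The files only stay
cached while there is enough free memory for them.

```shell
#!/bin/sh
# Read each given file once so the kernel keeps its pages in the
# filesystem cache. warm_cache is a hypothetical helper name.
warm_cache() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Reading the whole file pulls it into the page cache.
            cat "$f" > /dev/null && echo "cached: $f"
        else
            # An unmatched glob is passed through literally and lands here.
            echo "skipped (not a file): $f" >&2
        fi
    done
}
```

Usage would look like: warm_cache pt.binphr* reordering.bin* lm.binlm*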


On 28 February 2014 11:56, Felipe Sánchez Martínez <fsanchez@dlsi.ua.es> wrote:

> Hello all,
>
> I am tuning a system that uses a binarised kenlm language model. This
> model was binarised with default parameters and is 22 GB in size after
> binarisation.
>
> The language model and the (filtered) phrase table fit into memory
> (32 GB, so no swapping), but Moses is translating very slowly and is
> only using around 15% of CPU.
>
> Is there anything I can do to make it faster?
>
> Thank you very much for your help
> Regards
> --
> Felipe
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Fri, 28 Feb 2014 09:47:22 -0500
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Tokenisation issue - following the
implementation baseline
To: janez.kadivec@zop-cr.com
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDC7Kjp=TM5UG05Tz-MrrWkxKY3r92t3ZWxZtrhPZSawUg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

> We entered the command also for the French file:
>
> ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr \
> < ~/corpus/training/news-commentary-v8.fr-en.fr \
> > ~/corpus/news-commentary-v8.fr-en.tok.fr
>
>
> The execution of this command "hangs" with no results. It doesn't display
> the Tokenizer version: 1.1 and other information as it did for the English
> language. This is pretty unusual for me.
> This command should also create a tokenised file in the corpus folder, but
> it doesn't.
>
> What are we doing wrong? Why is the output file not in the corpus folder?
> Why doesn't the tokenisation of the French file finish?

That should not happen - it should be done about as quickly as the English.

Try to write the whole command in one line without the "\":
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < ~/corpus/training/news-commentary-v8.fr-en.fr > ~/corpus/news-commentary-v8.fr-en.tok.fr
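
One possible cause, offered as an assumption rather than a diagnosis: if a
"\" is followed by a space or a carriage return instead of the newline, the
shell no longer treats it as a line continuation. The tokenizer then starts
without the "<" redirection and waits for input on the terminal, which looks
like a hang. If the command is kept in a script file, a small check like the
sketch below can list such lines. The function name
find_broken_continuations is made up for illustration.

```shell
#!/bin/sh
# Print "line-number:line" for every line in the given file where a
# backslash is followed by trailing whitespace (spaces, tabs or a
# carriage return) instead of the end of the line.
find_broken_continuations() {
    grep -n '\\[[:space:]][[:space:]]*$' "$1"
}
```

Any line it prints has a backslash that does not continue the command.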

-phi


------------------------------

Message: 3
Date: Fri, 28 Feb 2014 15:19:44 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Very slow tuning with binarised kenlm
language model
To: fsanchez@dlsi.ua.es, moses-support <moses-support@mit.edu>
Message-ID: <5310A910.6010901@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Felipe

If it's not swapping and CPU is at 15%, then it's likely to be waiting
on I/O. Are you sure all the models are on local disks? Is any other
process accessing the disks? Sometimes iotop can help you see what is going on.

cheers - Barry
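
As a rough first check before reaching for iotop (which typically needs
root), the share of CPU time spent waiting on I/O can be read from
/proc/stat on Linux. This is a sketch under stated assumptions: the function
name iowait_pct is made up for illustration, and the figure is an average
since boot, not a live reading. For a live view, the "wa" column of
"vmstat 1" reports the same quantity per interval.

```shell
#!/bin/sh
# Print the percentage of CPU time spent in iowait since boot, truncated
# to a whole number. Reads the aggregate "cpu" line of a /proc/stat-style
# file (Linux only); the file argument defaults to /proc/stat.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        # Fields 2..9 are user nice system idle iowait irq softirq steal.
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        # Field 6 is iowait.
        if (total > 0) printf "%d\n", int(($6 * 100) / total)
    }' "${1:-/proc/stat}"
}
```

A high value alongside low CPU usage is consistent with the decoder waiting
on disk rather than computing.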

On 28/02/14 11:56, Felipe Sánchez Martínez wrote:
> Hello all,
>
> I am tuning a system that uses a binarised kenlm language model. This
> model was binarised with default parameters and is 22 GB in size after
> binarisation.
>
> The language model and the (filtered) phrase table fit into memory
> (32 GB, so no swapping), but Moses is translating very slowly and is
> only using around 15% of CPU.
>
> Is there anything I can do to make it faster?
>
> Thank you very much for your help
> Regards


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 4
Date: Fri, 28 Feb 2014 15:47:54 +0000
From: Liadh Kelly <lkelly@computing.dcu.ie>
Subject: [Moses-support] CfP: SIGIR Medical Information Retrieval
Workshop
To: Liadh Kelly <liadh.kelly@computing.dcu.ie>
Message-ID: <5310AFAA.7040104@computing.dcu.ie>
Content-Type: text/plain; charset="iso-8859-1"




Medical Information Retrieval (MedIR) Workshop


http://medir.dcu.ie/

At SIGIR 2014, July 11 2014, Gold Coast, Australia


Call for Papers (2 & 4 page)

Submission deadline: April 28



Medical information search refers to methodologies and technologies that
seek to improve access to medical information archives via a process of
information retrieval (IR). Such information is now potentially
accessible from many sources including the general web, social media,
journal articles, and hospital records.

Medical information is of interest to a wide variety of users, including
patients and their families, researchers, general practitioners and
clinicians, and practitioners with specific expertise such as
radiologists. Despite the popularity of the medical domain for users of
search engines, and current interest in this topic within the
information retrieval research community, development of search and
access technologies remains particularly challenging. One of the central
issues in medical information search is the diversity of the users of these
services. In particular, they will have varying categories of
information needs, varying levels of medical knowledge, and varying
language skills. In addition, the format, reliability, and quality of
biomedical and medical information vary greatly. A single health
record can contain clinical notes, technical pathology data, images, and
patient-contributed histories, and may be linked by a physician to
research papers. The importance of health and medical topics and their
impact on people's everyday lives makes the need for retrieval of
accurate and reliable information especially important. Determining the
likely reliability of available information is challenging. Finally, as
with information retrieval in general, the evaluation of medical search
tools is vital and challenging. For example, there are no established or
standardized baselines or evaluation metrics, and limited availability
of test collections.

This workshop aims to bring together researchers interested in medical
information search with the goal of identifying specific research
challenges that need to be addressed to advance the state-of-the-art and
to foster interdisciplinary collaborations towards the meeting of these
challenges. To enable this, we encourage participation from researchers
in all fields related to medical information search including mainstream
information retrieval, but also natural language processing,
multilingual text processing, and medical image analysis.

Topics of interest include but are not limited to:

- Users and information needs
- Semantics and NLP for medical IR
- Reliability and trust in medical IR
- Personalised search
- Evaluation of medical IR
- Multilingual issues in medical IR
- Multimedia technologies in medical IR
- The role of social media in medical IR


Paper Submissions

The workshop is now accepting paper submissions. Short papers (4 pages)
and short position papers (2 pages) describing approaches or ideas /
challenges on the topics of the workshop are invited. Submissions should
be in ACM SIGS format. LaTeX and Word templates are available at
http://www.acm.org/sigs/publications/proceedings-templates (for LaTeX,
use the "Option 2" style).

Papers should be anonymised for double-blind review and submitted in PDF
format through the EasyChair system
https://www.easychair.org/conferences/?conf=medir2014 no later than
midnight Pacific Daylight Time on April 28, 2014. Submissions will be
reviewed by members of the workshop program committee. Accepted papers
will be included in the SIGIR 2014 Medical Information Search Workshop
proceedings.


Important Dates

April 28, 2014: Deadline for paper submission (midnight Pacific Daylight
Time)
May 10, 2014: Notification to authors
May 17, 2014: Camera-ready papers due
July 11, 2014: Workshop


Further Information

Further information is available on the workshop website at
http://medir.dcu.ie/ or by emailing the workshop organisers.


Workshop Organisers

Lorraine Goeuriot, Dublin City University, Ireland

Gareth J.F. Jones, Dublin City University, Ireland

Liadh Kelly, Dublin City University, Ireland

Henning Müller, University of Applied Sciences Western Switzerland

Justin Zobel, University of Melbourne, Australia






------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 88, Issue 70
*********************************************
