Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Is memory mapping lazy? (Hieu Hoang)
2. Re: Segmentation Fault (Jasneet Sabharwal)
3. CALL FOR PARTICIPATION in the Second Automatic Post-Editing
(APE) shared task (Rajen Chatterjee)
----------------------------------------------------------------------
Message: 1
Date: Sun, 21 Feb 2016 22:54:02 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Is memory mapping lazy?
To: moses-support@mit.edu
Message-ID: <56CA400A.40107@gmail.com>
Content-Type: text/plain; charset="windows-1252"
I can confirm that using NFS doesn't suffer from the disk wait problem,
at least with the implementation used on the Edinburgh servers.
Using memory-mapped files on NFS or on local disk gives the same speed
performance.
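
For anyone hitting the same problem: mmap is lazy by default, so pages are only
faulted in when they are first touched. Below is a minimal sketch of forcing an
eager load at mapping time. It assumes Linux; MAP_POPULATE and MADV_WILLNEED are
Linux-specific hints, and this is not necessarily how Moses itself opens its
model files.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

// Map a read-only model file and ask the kernel to fault it in up front.
// MAP_POPULATE pre-faults the mapping; madvise(MADV_WILLNEED) adds a
// read-ahead hint. How much either helps depends on the file system.
void *MapEagerly(const char *path, std::size_t &size) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return NULL; }
  size = static_cast<std::size_t>(st.st_size);
  void *base = mmap(NULL, size, PROT_READ,
                    MAP_SHARED | MAP_POPULATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (base == MAP_FAILED) return NULL;
  madvise(base, size, MADV_WILLNEED);
  return base;
}

Whether these hints actually avoid the per-thread stalls on GPFS is an open
question; copying the files to /dev/shm, as described below, sidesteps the
file system entirely.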
On 19/02/16 23:29, Lane Schwartz wrote:
> Hey,
>
> This is mostly addressed to Kenneth, since as far as I know he's the
> author of the data structures involved.
>
> I have access to a cluster at the University of Illinois. The cluster
> here uses GPFS as its file system.
>
> I've observed that when running moses, especially with lots of
> threads, the threads spend virtually all of their time at near 0%
> CPU usage, in D (uninterruptible sleep, awaiting IO) status. When I
> copy my model files and config file to scratch space on local disk
> (and run cat $file > /dev/null on each model file), this issue
> disappears. It appears that doing cat $file > /dev/null on GPFS does
> not load the file into RAM in the same way that it does on other
> file systems.
>
> I spent quite a bit of time today with three cluster admins / disk
> engineers trying to debug this problem.
>
> Their ultimate solution was for me to cp each $file from GPFS to
> /dev/shm, which as far as I can tell acts like a RAM disk. Doing so
> resolves the issue.
>
> Their best estimate of the problem is that, from their perspective,
> moses appeared to have each thread ask the file system for access to
> data in the model files, causing a new disk read (with a corresponding
> disk lock) every time. They believe this issue is not present with
> local disk because cat $file > /dev/null pre-loads each file into RAM
> in that case, but does not do so with GPFS. Thus the threads are,
> according to this theory, getting bogged down by disk locks.
>
> I was puzzled by this, because I thought that the probing data
> structure underlying the LM and the phrase table used memory mapping.
> I had (perhaps naively) assumed that when the memory mapping is
> initiated, the OS actively loaded all of the file contents into
> appropriate VM pages. Now the question is, is the memory mapping
> actually acting lazily, only loading data from disk on an as-needed
> basis? If so, that could potentially explain the horrific disk delays
> that I'm encountering. And if so, then one question is, is it possible
> to alter the behavior of the memory mapping such that when the memory
> map is initiated, it actually does actively load the entire file into
> memory?
>
> Thanks,
> Lane
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
http://www.hoang.co.uk/hieu
------------------------------
Message: 2
Date: Sun, 21 Feb 2016 20:23:55 -0800
From: Jasneet Sabharwal <jasneet.sabharwal@sfu.ca>
Subject: Re: [Moses-support] Segmentation Fault
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <84521FC3-F8E8-419B-86DA-F9BB2B7D5D2A@sfu.ca>
Content-Type: text/plain; charset=utf-8
Is it possible to cache some data when decoding a source sentence? I was trying to use boost's thread_specific_ptr to cache a map which I want to update in my evaluation function, but when I try to access the map (https://github.com/KonceptGeek/mosesdecoder/blob/RELEASE-3.0-CombinedFeature-Caching/moses/FF/CoarseBiLM.cpp#L145-L154) I get a segmentation fault because the object doesn't exist.
Is there any other way to do some caching?
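
A common cause of that kind of crash is dereferencing the thread_specific_ptr
before the current thread has set it: boost::thread_specific_ptr starts out
null in every thread, including the decoder's worker threads. A minimal sketch
of the usual lazy-initialisation pattern (a generic example, not the actual
CoarseBiLM code):

#include <map>
#include <string>
#include <boost/thread/tss.hpp>

// Each thread sees its own pointer; get() returns NULL until this thread
// has called reset(), so initialise on first use before dereferencing.
static boost::thread_specific_ptr<std::map<std::string, float> > s_cache;

static std::map<std::string, float> &ThreadCache() {
  if (s_cache.get() == NULL) {
    s_cache.reset(new std::map<std::string, float>());
  }
  return *s_cache;
}

The same pattern applies whether the per-thread map is cleared after each
sentence or kept for the whole run.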
> On Feb 20, 2016, at 3:35 PM, Jasneet Sabharwal <jasneet.sabharwal@sfu.ca> wrote:
>
> Thanks Matthias, I'll try these out.
>
> ----- Original Message -----
> From: "Matthias Huck" <mhuck@inf.ed.ac.uk>
> To: "Hieu Hoang" <hieuhoang@gmail.com>, "Jasneet Sabharwal" <jasneet.sabharwal@sfu.ca>
> Cc: "moses-support" <moses-support@mit.edu>
> Sent: Saturday, February 20, 2016 6:08:52 PM
> Subject: Re: [Moses-support] Segmentation Fault
>
> Hi Jasneet,
>
> Why don't you use a proper profiling tool, e.g. the one in valgrind [1]?
>
> If you visualize its output [2], you'll see quickly where the program
> spends all the computing time.
>
> Cheers,
> Matthias
>
>
> [1] http://valgrind.org/docs/manual/cl-manual.html
> [2] https://github.com/jrfonseca/gprof2dot
>
>
>
>> On Sat, 2016-02-20 at 09:58 +0000, Hieu Hoang wrote:
>> it's great that you've written a new feature function but you will
>> have to debug it yourself. I suggest you put lots of debugging
>> messages in your code to find out where the problem is.
>>
>> Moses has the Timer class in /moses/Timer.h which you can use to help
>> you debug your problem.
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 20 February 2016 at 04:20, Jasneet Sabharwal <
>> jasneet.sabharwal@sfu.ca> wrote:
>>> Hi Hieu,
>>>
>>> Just to provide more info, I had compiled moses using the following
>>> command: "./bjam -j8 -q --with-cmph=/cs/natlang-user/jasneet/softwares/cmph-2.0/
>>> --with-boost=/cs/natlang-user/jasneet/softwares/boost/ --max-kenlm-order=8 -a
>>> --with-mm --with-probing-pt".
>>>
>>> Following are some more translation times from the logs using the
>>> command:
>>>
>>> $ grep "Translation took" mert.log
>>>
>>> Line 53: Translation took 9504.886 seconds total
>>> Line 25: Translation took 16931.106 seconds total
>>> Line 20: Translation took 17477.958 seconds total
>>> Line 34: Translation took 18409.183 seconds total
>>> Line 36: Translation took 20495.204 seconds total
>>> Line 48: Translation took 16093.966 seconds total
>>> Line 68: Translation took 4773.139 seconds total
>>> Line 18: Translation took 22165.429 seconds total
>>> Line 10: Translation took 23794.930 seconds total
>>> Line 11: Translation took 26313.130 seconds total
>>> Line 74: Translation took 6238.326 seconds total
>>> Line 66: Translation took 14968.715 seconds total
>>> Line 3: Translation took 28973.902 seconds total
>>> Line 45: Translation took 27619.088 seconds total
>>> Line 81: Translation took 4666.394 seconds total
>>> Line 37: Translation took 36502.892 seconds total
>>> Line 83: Translation took 3143.882 seconds total
>>> Line 70: Translation took 20143.743 seconds total
>>> Line 1: Translation took 38498.391 seconds total
>>> Line 19: Translation took 39683.472 seconds total
>>> Line 15: Translation took 39903.566 seconds total
>>> Line 33: Translation took 40047.447 seconds total
>>>
>>> The times are extremely high and I'm not really sure why it is
>>> taking so much time.
>>>
>>> Regards,
>>> Jasneet
>>>> On Feb 18, 2016, at 11:04 AM, Jasneet Sabharwal <
>>>> jasneet.sabharwal@sfu.ca> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was able to solve the segmentation fault issue. It was because
>>>> of OOVs. I'm currently trying to tune the parameters using mert,
>>>> but it is running extremely slowly. For example, from the logs:
>>>>
>>>> Translating: [source sentence not recoverable from the digest encoding]
>>>> Line 43: Initialize search took 0.007 seconds total
>>>> Line 43: Collecting options took 0.191 seconds at
>>>> moses/Manager.cpp:117
>>>> Line 38: Search took 1092.075 seconds
>>>> Line 38: Decision rule took 0.000 seconds total
>>>> Line 38: Additional reporting took 0.041 seconds total
>>>> Line 38: Translation took 1092.132 seconds total
>>>>
>>>> I tried to time the functions in my feature function using
>>>> clock_t, but all of them show up as 0.000. I'm not sure why tuning
>>>> is taking so much time. My moses.ini is attached to this email.
>>>>
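
On the 0.000 timings mentioned above: clock_t measures CPU time at a fairly
coarse resolution, so short calls can round down to zero. A wall-clock
alternative with std::chrono is sketched below (a generic C++11 example,
independent of the Moses Timer class; the loop is just a stand-in for the
code being measured):

#include <chrono>
#include <iostream>

int main() {
  const std::chrono::steady_clock::time_point start =
      std::chrono::steady_clock::now();

  // ... the region being measured, e.g. one call into the feature function ...
  volatile double sink = 0.0;
  for (int i = 0; i < 1000000; ++i) sink += i * 0.5;

  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  std::cout << "elapsed: " << elapsed.count() << " s" << std::endl;
  return 0;
}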
>>>> Any suggestions would be helpful.
>>>>
>>>> Regards,
>>>> Jasneet
>>>>
>>>> <moses.ini>
>>>>> On Feb 12, 2016, at 3:58 PM, Hieu Hoang <hieuhoang@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I think it's
>>>>> FeatureFunction::GetScoreProducerDescription()
>>>>>
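
As an illustration of how the name= value can be used to tell instances apart,
here is a toy stand-in rather than the real Moses FeatureFunction class; the
assumption is only that GetScoreProducerDescription() returns the name=
string from moses.ini, as stated above.

#include <iostream>
#include <string>

// Toy stand-in: each configured instance keeps its name= value and can
// branch on it, which is what GetScoreProducerDescription() enables in Moses.
class ToyFeature {
 public:
  explicit ToyFeature(const std::string &description)
      : m_description(description) {}
  const std::string &GetScoreProducerDescription() const {
    return m_description;
  }
  void Load() {
    if (GetScoreProducerDescription() == "CoarseLM1600") {
      std::cout << "loading the 1600-class model for this instance\n";
    }
  }
 private:
  std::string m_description;
};

int main() {
  ToyFeature lm1600("CoarseLM1600");
  lm1600.Load();
  return 0;
}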
>>>>>> On 12/02/16 23:56, Jasneet Sabharwal wrote:
>>>>>> Thanks, will give that a try.
>>>>>>
>>>>>> Also, is it possible to get the value of the feature name inside
>>>>>> the feature function? I'm specifically talking about the "name"
>>>>>> parameter in moses.ini. I'm running multiple copies of my
>>>>>> feature function with different parameters, as follows:
>>>>>> CoarseBiLM name=CoarseBiLM tgtWordId...
>>>>>> CoarseBiLM name=CoarseLM100 tgtWordId...
>>>>>> CoarseBiLM name=CoarseLM1600 tgtWordId...
>>>>>> CoarseBiLM name=CoarseBiLMWithoutClustering tgtWordId...
>>>>>>
>>>>>> Thanks,
>>>>>> Jasneet
>>>>>>> On Feb 12, 2016, at 3:39 PM, Hieu Hoang <
>>>>>>> hieuhoang@gmail.com> wrote:
>>>>>>>
>>>>>>> you can run the decoder
>>>>>>> ./moses -v 3
>>>>>>> however, you should put debugging messages in your feature
>>>>>>> functions to find out where the problem is. It looks like
>>>>>>> it's in the Load() method, so add lots of debugging messages
>>>>>>> there and in all the functions it calls.
>>>>>>>
>>>>>>>> On 12/02/16 23:34, Jasneet Sabharwal wrote:
>>>>>>>> Thanks Hieu for your reply.
>>>>>>>>
>>>>>>>> Is it possible to get a verbose output of what's
>>>>>>>> happening, so that I can identify when it's going out of
>>>>>>>> memory? I'm only running it for 1928 sentences. I have
>>>>>>>> almost 170 GB of free memory and an additional 400 GB of
>>>>>>>> memory in buffers.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jasneet
>>>>>>>>
>>>>>>>>> On Feb 12, 2016, at 2:36 PM, Hieu Hoang <
>>>>>>>>> hieuhoang@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> looks like it's run out of memory.
>>>>>>>>>
>>>>>>>>>> On 11/02/16 23:23, Jasneet Sabharwal wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I was adding a new feature function in Moses
>>>>>>>>>> (https://github.com/KonceptGeek/mosesdecoder/blob/master/moses/FF/CoarseBiLM.cpp).
>>>>>>>>>> It works fine when I test it on 1-2 sentences, but when
>>>>>>>>>> I'm trying to tune my parameters I get segmentation
>>>>>>>>>> faults or sometimes a bad_alloc. The following was one of
>>>>>>>>>> the commands executed during the tuning process that
>>>>>>>>>> caused the segmentation fault or bad_alloc:
>>>>>>>>>>
>>>>>>>>>> moses -threads 40 -v 0 -config filtered/moses.ini
>>>>>>>>>> -weight-overwrite 'CoarseLM100= 0.075758 LM0=
>>>>>>>>>> 0.075758 CoarseBiLMNotClustered= 0.075758
>>>>>>>>>> WordPenalty0= -0.151515 PhrasePenalty0= 0.030303
>>>>>>>>>> CoarseBiLMClustered= 0.075758 TranslationModel0=
>>>>>>>>>> 0.030303 0.030303 0.030303 0.030303 Distortion0=
>>>>>>>>>> 0.045455 CoarseLM1600= 0.075758 LexicalReordering0=
>>>>>>>>>> 0.045455 0.045455 0.045455 0.045455 0.045455
>>>>>>>>>> 0.045455' -n-best-list run1.best100.out 100 distinct
>>>>>>>>>> -input-file tune.word.lc.cn
>>>>>>>>>>
>>>>>>>>>> The log is enclosed in this email.
>>>>>>>>>>
>>>>>>>>>> Any pointers would be very useful.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jasneet
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Moses-support mailing list
>>>>>>>>>> Moses-support@mit.edu
>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>>> --
>>>>>>>>> Hieu Hoang
>>>>>>>>> http://www.hoang.co.uk/hieu
>>>>>>> --
>>>>>>> Hieu Hoang
>>>>>>> http://www.hoang.co.uk/hieu
>>>>> --
>>>>> Hieu Hoang
>>>>> http://www.hoang.co.uk/hieu
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
Message: 3
Date: Mon, 22 Feb 2016 11:01:38 +0100
From: Rajen Chatterjee <rajen.k.chatterjee@gmail.com>
Subject: [Moses-support] CALL FOR PARTICIPATION in the Second
Automatic Post-Editing (APE) shared task
To: "moses-support@mit.edu" <moses-support@mit.edu>, mt-list@eamt.org,
corpora@uib.no, NLP-IP account givem to manshri
<nlp-ai@cse.iitb.ac.in>
Message-ID:
<CAC4-+NyHtN3B17QfaO3nrWfYCxrm=Lc6xg0DNwzTGs_4TA5KDg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
[apologies for cross-posting]
CALL FOR PARTICIPATION
in the
Second Automatic Post-Editing (APE) shared task
at the First Conference on Machine Translation (WMT16)
--------------------------------------------------------------------
OVERVIEW
The second round of the APE shared task (
http://www.statmt.org/wmt16/ape-task.html) follows the first pilot round
organised in 2015. The aim is to examine automatic methods for correcting
errors produced by an unknown machine translation (MT) system. This has to
be done by exploiting knowledge acquired from human post-edits, which are
provided as training material.
This year the task focuses on the Information Technology (IT) domain, in
which English source sentences have been translated into German by an
unknown MT system and then manually post-edited by professional
translators. At the training stage, the collected human post-edits have to be
used to learn correction rules for the APE systems. At the test stage, they will
be used for system evaluation with automatic metrics (TER and BLEU).
--------------------------------------------------------------------
GOALS
The aim of this task is to improve MT output in black-box scenarios, in
which the MT system is used "as is" and cannot be modified. From the
application point of view, APE components would make it possible to:
- Cope with systematic errors of an MT system whose decoding process is
  not accessible
- Provide professional translators with improved MT output quality to
  reduce (human) post-editing effort
- Adapt the output of a general-purpose system to the lexicon/style
  requested in a specific application domain
--------------------------------------------------------------------
DATA & EVALUATION
Training, development and test data consist of English-German triplets (source,
target and post-edit) belonging to the IT domain. Training and development
contain 12,000 and 1,000 triplets respectively (available soon), while the
test set contains 2,000 instances. All data is provided by the EU project QT21 (
http://www.qt21.eu/).
Systems' performance will be evaluated with respect to their capability to
reduce the distance that separates an automatic translation from its
human-revised version. This distance will be measured in terms of TER,
computed between automatic and human post-edits in case-sensitive mode.
BLEU will also be taken into consideration as a secondary evaluation
metric.
To gain further insight into final output quality, a subset of the outputs
of the submitted systems will also be manually evaluated.
--------------------------------------------------------------------
DIFFERENCES FROM THE FIRST PILOT ROUND
Compared to the pilot round, the main differences are:
- the domain specificity (from news to IT);
- the target language (from Spanish to German);
- the post-editors (from crowdsourced workers to professional translators);
- the evaluation metrics (from case-sensitive/insensitive TER to
  case-sensitive TER and BLEU);
- the performance analysis (from automatic metrics to automatic metrics
  plus manual evaluation).
--------------------------------------------------------------------
IMPORTANT DATES
Release of training data: February 22, 2016
Test set distributed: April 18, 2016
Submission deadline: April 24, 2016
Paper submission deadline: May 8, 2016
Manual evaluation: May 2016
Notification of acceptance: June 5, 2016
Camera-ready deadline: June 22, 2016
For any information or questions about the task, please send an email to
wmt-ape at fbk.eu. To stay up to date about the APE task, you can
also join the wmt-ape group:
http://groups.google.com/a/fbk.eu/forum/#!forum/wmt-ape
--------------------------------------------------------------------
ORGANIZERS
Rajen Chatterjee (Fondazione Bruno Kessler)
Matteo Negri (Fondazione Bruno Kessler)
Raphael Rubino (Saarland University)
Marco Turchi (Fondazione Bruno Kessler)
Marcos Zampieri (Saarland University)
--
-Regards,
Rajen Chatterjee.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 112, Issue 40
**********************************************