Moses-support Digest, Vol 112, Issue 37

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Is memory mapping lazy? (Lane Schwartz)
2. Re: Is memory mapping lazy? (Marcin Junczys-Dowmunt)
3. Re: Is memory mapping lazy? (Kenneth Heafield)
4. Re: Is memory mapping lazy? (Kenneth Heafield)
5. Re: Segmentation Fault (Jasneet Sabharwal)


----------------------------------------------------------------------

Message: 1
Date: Fri, 19 Feb 2016 17:29:15 -0600
From: Lane Schwartz <dowobeha@gmail.com>
Subject: [Moses-support] Is memory mapping lazy?
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZnWobR0SEVX8y76cMvhrnd7AKKS15zSMX6-TcvMtDswKg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hey,

This is mostly addressed to Kenneth, since as far as I know he's the author
of the data structures involved.

I have access to a cluster at the University of Illinois. The cluster here
uses GPFS as its file system.

I've observed that when running moses, especially with lots of threads, the
threads spend virtually all of their time at near 0% CPU usage, in D
(uninterruptible sleep, awaiting I/O) status. When I copy my model files and
config file to scratch space on local disk (and cat $file > /dev/null each
model file), the issue disappears. It appears that doing cat $file >
/dev/null on GPFS does not load the file into RAM in the way that it
appears to on other file systems.

I spent quite a bit of time today with three cluster admins / disk
engineers trying to debug this problem.

Their ultimate solution was for me to cp each $file from GPFS to /dev/shm,
which as far as I can tell acts like a RAM disk. Doing so resolves the
issue.

Their best estimate of the problem is that moses (from their perspective)
appeared to (for each thread) ask the file system for access to data that's
present in the model files, causing a new disk read (with a corresponding
disk lock) every time. They believe that this issue is not present with
local disk because the cat $file > /dev/null is pre-loading each file into
RAM in that case, but is not doing so with GPFS. Thus the threads are
(according to this theory) getting bogged down by disk locks.

I was puzzled by this, because I thought that the probing data structure
underlying the LM and the phrase table used memory mapping. I had (perhaps
naively) assumed that when the memory mapping is initiated, the OS eagerly
loads all of the file contents into the appropriate VM pages. So the
question is: is the memory mapping actually lazy, loading data from disk
only on an as-needed basis? If so, that could potentially explain the
horrific disk delays I'm encountering. And if so, is it possible to alter
the behavior of the memory mapping so that, when the map is initiated, it
actively loads the entire file into memory?

Thanks,
Lane
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160219/8fbe45c5/attachment-0001.html

------------------------------

Message: 2
Date: Sat, 20 Feb 2016 00:34:10 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Is memory mapping lazy?
To: moses-support@mit.edu
Message-ID: <56C7A672.2070508@amu.edu.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Lane,
For the compact phrase table and reordering table you can use
--minphr-memory and --minlexr-memory respectively. That will disable
memory mapping entirely and just read both into RAM.
Best,
Marcin




------------------------------

Message: 3
Date: Fri, 19 Feb 2016 23:38:51 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Is memory mapping lazy?
To: Lane Schwartz <dowobeha@gmail.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <56C7A78B.10607@kheafield.com>
Content-Type: text/plain; charset=utf-8

Hi,

The default is mmap with MAP_POPULATE (see man mmap). As to whether
GPFS implements MAP_POPULATE correctly, I defer to the former IBM
employee.

KenLM implements the following options via config.load_method:

typedef enum {
  // mmap with no prepopulate.
  LAZY,
  // On Linux, pass MAP_POPULATE to mmap.
  POPULATE_OR_LAZY,
  // Populate on Linux; malloc and read on non-Linux.
  POPULATE_OR_READ,
  // malloc and read.
  READ,
  // malloc and read in parallel (recommended for Lustre).
  PARALLEL_READ,
} LoadMethod;

However, Moses currently has "lazyken" as a true/false flag: false maps to
POPULATE_OR_LAZY, and true maps to LAZY. This should be refactored in
moses/LM/Ken.cpp (lines 503 and 538) to expose all the options in the enum.

It's worth noting that the kernel preferentially evicts mmapped data
under swap pressure, which is probably not the behavior you want with a
network filesystem.

Another thing to note is that huge page functionality with mmapped files
is a mess on Linux (you really have to be root and set up hugetlbfs).
However, the malloc and read approaches are compatible with transparent
huge pages (and my code now even aligns to a 1 GB boundary), so
malloc+read results in faster queries.

Kenneth



------------------------------

Message: 4
Date: Sat, 20 Feb 2016 00:13:22 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Is memory mapping lazy?
To: moses-support@mit.edu
Message-ID: <56C7AFA2.3010500@kheafield.com>
Content-Type: text/plain; charset=windows-1252

Added a load= option in 7a1baee, deprecating lazy=. Valid load= values
are the lowercase versions of the enum shown below.
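
In moses.ini terms, that would look something like the following (a hypothetical example; the feature name, order, and path are placeholders):

```
[feature]
KENLM name=LM0 factor=0 order=5 path=lm.binary load=read
```

Here load=read forces malloc+read instead of mmap, which sidesteps lazy page faults on network filesystems.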

There are copies of the loading code in Backward.cpp and Reordering.h that
git blame traces back to Lane. I've put ?: hacks in and hope he'll pay the
cost of the code copying.

Kenneth

On 02/19/2016 11:38 PM, Kenneth Heafield wrote:
> Hi,
>
> The default is mmap with MAP_POPULATE (see man mmap). As to whether
> GPFS implements MAP_POPULATE correctly, I defer to the former IBM
> employee.
>
> KenLM implements the following options via config.load_method:
>
> typedef enum {
> // mmap with no prepopulate
> LAZY,
> // On linux, pass MAP_POPULATE to mmap.
> POPULATE_OR_LAZY,
> // Populate on Linux. malloc and read on non-Linux.
> POPULATE_OR_READ,
> // malloc and read.
> READ,
> // malloc and read in parallel (recommended for Lustre)
> PARALLEL_READ,
> } LoadMethod;
>
> However, Moses currently has "lazyken" as a true/false flag. false maps
> to POPULATE_OR_LAZY. true maps to LAZY. This should be refactored in
> moses/LM/Ken.cpp lines 503 and 538 to expose all the options in the enum.
>
> It's worth noting that the kernel preferentially evicts mmapped data
> under swap pressure, which is probably not the behavior you want for a
> network filesystem.
>
> Another thing to note is that huge page functionality with mmapped files
> is a mess on linux (you really have to be root and setup hugetlbfs).
> However, the malloc and read approaches are compatible with transparent
> huge pages (and my code even aligns to a 1 GB boundary now), so
> malloc+read results in faster queries.
>
> Kenneth
>


------------------------------

Message: 5
Date: Fri, 19 Feb 2016 20:20:53 -0800
From: Jasneet Sabharwal <jasneet.sabharwal@sfu.ca>
Subject: Re: [Moses-support] Segmentation Fault
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <B7A96E4A-54F7-4325-ADA0-A4BCFC7AB9B7@sfu.ca>
Content-Type: text/plain; charset="utf-8"

Hi Hieu,

Just to provide more info, I had compiled moses using the following command: "./bjam -j8 -q --with-cmph=/cs/natlang-user/jasneet/softwares/cmph-2.0/ --with-boost=/cs/natlang-user/jasneet/softwares/boost/ --max-kenlm-order=8 -a --with-mm --with-probing-pt".

Following are some more translation times from the logs using the command:

$ grep "Translation took" mert.log

Line 53: Translation took 9504.886 seconds total
Line 25: Translation took 16931.106 seconds total
Line 20: Translation took 17477.958 seconds total
Line 34: Translation took 18409.183 seconds total
Line 36: Translation took 20495.204 seconds total
Line 48: Translation took 16093.966 seconds total
Line 68: Translation took 4773.139 seconds total
Line 18: Translation took 22165.429 seconds total
Line 10: Translation took 23794.930 seconds total
Line 11: Translation took 26313.130 seconds total
Line 74: Translation took 6238.326 seconds total
Line 66: Translation took 14968.715 seconds total
Line 3: Translation took 28973.902 seconds total
Line 45: Translation took 27619.088 seconds total
Line 81: Translation took 4666.394 seconds total
Line 37: Translation took 36502.892 seconds total
Line 83: Translation took 3143.882 seconds total
Line 70: Translation took 20143.743 seconds total
Line 1: Translation took 38498.391 seconds total
Line 19: Translation took 39683.472 seconds total
Line 15: Translation took 39903.566 seconds total
Line 33: Translation took 40047.447 seconds total

The times are extremely high and I'm not really sure why it is taking so much time.

Regards,
Jasneet
> On Feb 18, 2016, at 11:04 AM, Jasneet Sabharwal <jasneet.sabharwal@sfu.ca> wrote:
>
> Hi,
>
> I was able to solve the segmentation fault issue. It was because of OOVs. I'm currently trying to tune the parameters using mert, but it is running extremely slow. For example, from the logs:
>
> Translating: ?? ? ? ?? ? ? ? ? ? ??????? ? ? ? ? ? ? ?? ? , ? ? ?? ?? ?? ??? ? ? ? ?? ? ? ? ??????? ? ?? ?? ?
> Line 43: Initialize search took 0.007 seconds total
> Line 43: Collecting options took 0.191 seconds at moses/Manager.cpp:117
> Line 38: Search took 1092.075 seconds
> Line 38: Decision rule took 0.000 seconds total
> Line 38: Additional reporting took 0.041 seconds total
> Line 38: Translation took 1092.132 seconds total
>
> I tried to time the functions in my feature function <https://github.com/KonceptGeek/mosesdecoder/blob/master/moses/FF/CoarseBiLM.cpp> using clock_t but all of them show up as 0.000. I'm not sure why tuning is taking so much time. My moses.ini is attached in this email.
>
> Any suggestions would be helpful.
>
> Regards,
> Jasneet
>
> <moses.ini>
>> On Feb 12, 2016, at 3:58 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>
>> I think it's
>> FeatureFunction::GetScoreProducerDescription()
>>
>> On 12/02/16 23:56, Jasneet Sabharwal wrote:
>>> Thanks, will give that a try.
>>>
>>> Also, is it possible to get the value of the feature name inside the feature function? I'm specifically talking about the "name" parameter in moses.ini. I'm running multiple copies of my feature function with different parameters as follows:
>>> CoarseBiLM name=CoarseBiLM tgtWordId...
>>> CoarseBiLM name=CoarseLM100 tgtWordId...
>>> CoarseBiLM name=CoarseLM1600 tgtWordId...
>>> CoarseBiLM name=CoarseBiLMWithoutClustering tgtWordId...
>>>
>>> Thanks,
>>> Jasneet
>>>> On Feb 12, 2016, at 3:39 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>>
>>>> you can run the decoder
>>>> ./moses -v 3
>>>> however, you should put debugging messages in your feature function to find out where the problem is. It looks like it's in the Load() method, so add lots of debugging messages there and in all the functions it calls
>>>>
>>>> On 12/02/16 23:34, Jasneet Sabharwal wrote:
>>>>> Thanks Hieu for your reply.
>>>>>
>>>>> Is it possible to do a verbose output of what's happening, so that I can identify when it's going out of memory? I'm only running it for 1928 sentences. I have almost 170gb of free memory and an additional 400gb of memory in buffer.
>>>>>
>>>>> Thanks,
>>>>> Jasneet
>>>>>
>>>>>> On Feb 12, 2016, at 2:36 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>>>>>>
>>>>>> looks like it's run out of memory.
>>>>>>
>>>>>> On 11/02/16 23:23, Jasneet Sabharwal wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was adding a new feature function in Moses (https://github.com/KonceptGeek/mosesdecoder/blob/master/moses/FF/CoarseBiLM.cpp). It works fine when I test it for 1-2 sentences, but when I'm trying to tune my parameters, I'm getting segmentation faults or sometimes it is bad_alloc. Following was one of the commands that was executed during the tuning process which caused the Segmentation Fault or bad_alloc:
>>>>>>>
>>>>>>> moses -threads 40 -v 0 -config filtered/moses.ini -weight-overwrite 'CoarseLM100= 0.075758 LM0= 0.075758 CoarseBiLMNotClustered= 0.075758 WordPenalty0= -0.151515 PhrasePenalty0= 0.030303 CoarseBiLMClustered= 0.075758 TranslationModel0= 0.030303 0.030303 0.030303 0.030303 Distortion0= 0.045455 CoarseLM1600= 0.075758 LexicalReordering0= 0.045455 0.045455 0.045455 0.045455 0.045455 0.045455' -n-best-list run1.best100.out 100 distinct -input-file tune.word.lc.cn
>>>>>>>
>>>>>>> The log is enclosed in this email.
>>>>>>>
>>>>>>> Any pointers would be very useful.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jasneet
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> Moses-support@mit.edu
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> --
>>>> Hieu Hoang
>>>> http://www.hoang.co.uk/hieu
>>
>> --
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160219/7e3e0787/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 112, Issue 37
**********************************************
