Moses-support Digest, Vol 112, Issue 13

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Problem with processPhraseTableMin (Marcin Junczys-Dowmunt)
2. Re: Problem with processPhraseTableMin (Jeremy Gwinnup)
3. Re: Problem with processPhraseTableMin (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Thu, 04 Feb 2016 16:21:50 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Problem with processPhraseTableMin
To: ugermann@inf.ed.ac.uk
Cc: moses-support@mit.edu
Message-ID: <44fbfa50ff228dc8bbaddc1a336abb8e@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi Uli,
By now we think this particular error might be caused by Jeremy using
the Intel compiler instead of g++.

Duplicate entries will cause crashes because of the minimal perfect hash
function used; in that case it should die with an error message about
collisions. The duplicate entries coming from the sigtest-filter are
another matter that I should look into; that should not be happening
either.
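
A rough sketch of how one could check a phrase table for duplicate
entries before binarizing. It assumes the usual Moses text format with
fields separated by " ||| " (source phrase first, target phrase second)
and that all entries for one source phrase are adjacent in the file. The
function names are made up for illustration and are not part of Moses.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def find_duplicates(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs that appear more than once.

    Assumption: entries sharing a source phrase are adjacent, so only
    the targets of the current source phrase are kept in memory.
    """
    current_source = None
    seen_targets = set()
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        if len(fields) < 2:
            continue  # skip malformed or empty lines
        source, target = fields[0], fields[1]
        if source != current_source:
            # New source phrase: start a fresh set of targets.
            current_source = source
            seen_targets = set()
        if target in seen_targets:
            yield source, target
        else:
            seen_targets.add(target)


def find_duplicates_in_file(path: str) -> Iterator[Tuple[str, str]]:
    """Run the check on a gzip-compressed text phrase table."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from find_duplicates(handle)
```

Calling find_duplicates_in_file("phrase-table.1.gz") and printing the
first few results should be enough to tell whether duplicates are present.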

On 2016-02-04 16:03, Ulrich Germann wrote:

> I've had processPhraseTableMin crash when the phrase table contains duplicate entries (can't remember if there was an unreasonable memory allocation involved). Is Marcin using the exact same phrase table? Can you check if the phrase table has duplicate entries?
>
> To crash or not to crash could also depend on OS and libraries used. You can get the versions of libraries compiled into moses with
>
> moses --version
>
> I've had duplicate entries in the phrase table after running ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s significance filtering that I pulled in from his WIPO branch; compile with --with-mm --with-mm-extras to get it compiled.
>
> - Uli
>
> On Wed, Feb 3, 2016 at 12:01 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
>> Weird.
>>
>> Jeremy, I binarized your phrase-table a couple of times with different
>> commits (also the most recent one), and I cannot reproduce the error.
>> Try maybe -threads 10 or 12.
>> I can make the binarized versions available for download.
>>
>> On 02.02.2016 at 18:21, Marcin Junczys-Dowmunt wrote:
>>> Looks fine, I had no problems running it with 18 and more domain
>>> indicators. Your machine is certainly more than suitable. Just one
>>> remark, using more than 8-12 threads usually slows things down, but
>>> should not cause crashes. Any chance to have a look at that table?
>>>
>>> On 02.02.2016 at 18:16, Jeremy Gwinnup wrote:
>>>> Marcin,
>>>>
>>>> I was able to use -T with processLexicalTableMin successfully. I also tried processPhraseTableMin using a local tmp dir with 200G free and it still crashed at step 3 with the huge malloc message. Phrase table is nothing fancy - just standard 4 scores and 3 domain indicator features. Here's a complete output with more info about the phrase table:
>>>>
>>>> Phrase table in question:
>>>>
>>>> -rw-rw-r-- 1 jgwinnup scream 2.2G Feb 1 23:58 phrase-table.1.gz
>>>>
>>>> Machine in question has 1TB RAM/32 cores - should be more than enough for the job
>>>>
>>>> Moses git-rev ends with: 80572b4 (Jan. 27)
>>>>
>>>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7 -T /tmp_with_200G_free
>>>> WARNING: You are using a nonstandard number of scores (7) with PREnc. Set the index of P(t|s) with -rankscore int if it is not 2.
>>>> Used options:
>>>> Text phrase table will be read from: phrase-table.1.gz
>>>> Output phrase table will be written to: phrase-table.1.minphr
>>>> Step size for source landmark phrases: 2^10=1024
>>>> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
>>>> Selected target phrase encoding: Huffman + PREnc
>>>> Maxiumum allowed rank for PREnc: 100
>>>> Number of score components in phrase table: 7
>>>> Single Huffman code set for score components: no
>>>> Using score quantization: no
>>>> Explicitly included alignment information: yes
>>>> Running with 32 threads
>>>>
>>>> Pass 1/3: Creating hash function for rank assignment
>>>> ..................................................[5000000]
>>>> ..................................................[10000000]
>>>> ..................................................[15000000]
>>>> ..................................................[20000000]
>>>> ..................................................[25000000]
>>>> ..................................................[30000000]
>>>> ..................................................[35000000]
>>>> ..................................................[40000000]
>>>> ..................................................[45000000]
>>>> ....
>>>>
>>>> Pass 2/3: Creating source phrase index + Encoding target phrases
>>>> ..................................................[5000000]
>>>> ..................................................[10000000]
>>>> ..................................................[15000000]
>>>> ..................................................[20000000]
>>>> ..................................................[25000000]
>>>> ..................................................[30000000]
>>>> ..................................................[35000000]
>>>> ..................................................[40000000]
>>>> ..................................................[45000000]
>>>> ....
>>>>
>>>> Intermezzo: Calculating Huffman code sets
>>>> Creating Huffman codes for 471366 target phrase symbols
>>>> tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
>>>> tcmalloc: large alloc 27617640448 bytes == 0x3e86b0000 @
>>>> tcmalloc: large alloc 5187358422106112 bytes == (nil) @
>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>> what(): std::bad_alloc
>>>>
>>>>
>>>>
>>>>
>>>>> On Feb 2, 2016, at 10:21 AM, Jeremy Gwinnup <jeremy@gwinnup.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm having a problem using processPhraseTableMin to compress a phrase table with 7 scores - the program consistently coredumps at step 3 - command and relevant output below. Is there anything I'm doing glaringly wrong?
>>>>>
>>>>> Thanks!
>>>>> -Jeremy
>>>>>
>>>>> Command:
>>>>>
>>>>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7
>>>>>
>>>>> Once we get to step 3:
>>>>>
>>>>> Intermezzo: Calculating Huffman code sets
>>>>> Creating Huffman codes for 471366 target phrase symbols
>>>>> tcmalloc: large alloc 13983629312 bytes == 0xb14ce000 @
>>>>> tcmalloc: large alloc 27967250432 bytes == 0x3f3ca4000 @
>>>>> tcmalloc: large alloc 15681406635450368 bytes == (nil) @
>>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>>> what(): std::bad_alloc
>>>>>
>>>>> Top looked like this when the program ran into trouble:
>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>> 27416 jgwinnup 20 0 45.9g 30g 4.0g R 10.6 3.0 1589:17 processPhraseTa
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support [2]
>
> --
>
> Ulrich Germann
> Senior Researcher
> School of Informatics
>
> University of Edinburgh



Links:
------
[2] http://mailman.mit.edu/mailman/listinfo/moses-support

------------------------------

Message: 2
Date: Thu, 4 Feb 2016 10:37:21 -0500
From: Jeremy Gwinnup <jeremy@gwinnup.org>
Subject: Re: [Moses-support] Problem with processPhraseTableMin
To: moses-support@mit.edu
Message-ID: <BF4C0779-2811-42B4-9171-81190D52B7E2@gwinnup.org>
Content-Type: text/plain; charset=utf-8

Uli,

I sent the phrase-table to Marcin yesterday to test - he was able to binarize the table successfully. Here, we've been compiling moses with the Intel compiler. We built the same checkout with gcc and using processPhraseTableMin from that build we were able to successfully binarize the phrase table.

One thing I saw during testing these different configs was the intel-compiled version would output tcmalloc debug messages, but the gcc-compiled one would not. We're using tcmalloc-minimal for these builds. Should we be using the full version?
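
To put the tcmalloc messages from the crash log quoted earlier in this thread into perspective, here is a small sketch that extracts the requested sizes and flags any request larger than the machine's memory. The helper names are hypothetical, and the 1 TB figure is taken as 2**40 bytes, which is an assumption.

```python
import re

# Matches lines such as:
#   tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
LARGE_ALLOC = re.compile(r"tcmalloc: large alloc (\d+) bytes")

# The three messages reported in the thread.
LOG = """\
tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
tcmalloc: large alloc 27617640448 bytes == 0x3e86b0000 @
tcmalloc: large alloc 5187358422106112 bytes == (nil) @
"""


def parse_large_allocs(log_text):
    """Return the requested allocation sizes in bytes, in log order."""
    return [int(m.group(1)) for m in LARGE_ALLOC.finditer(log_text)]


def to_gib(size_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return size_bytes / 2**30


def exceeds_ram(requests, ram_bytes):
    """Return the requests that could never fit into ram_bytes."""
    return [size for size in requests if size > ram_bytes]


if __name__ == "__main__":
    ram = 2**40  # assumption: "1TB RAM" read as 1 TiB
    for size in parse_large_allocs(LOG):
        flag = "exceeds RAM" if size > ram else "fits"
        print(f"{size} bytes = {to_gib(size):.2f} GiB ({flag})")
```

The first two requests are in the tens of GiB, which the machine can serve. The third is thousands of times larger than the installed memory, so it looks like a corrupted size value rather than genuine memory pressure.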

Running moses --version on both builds shows Boost 1.54, Xmlrpc-c 1.33.17 and CMPH (version unknown) linked in. We compile static binaries on a RHEL 6-based distro (Scientific Linux 6.7)

-Jeremy


> Message: 2
> Date: Thu, 4 Feb 2016 15:03:02 +0000
> From: Ulrich Germann <ulrich.germann@gmail.com>
> Subject: Re: [Moses-support] Problem with processPhraseTableMin
> To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
> Cc: "moses-support@mit.edu" <moses-support@mit.edu>
> Message-ID:
> <CAHQSRUq_gtrCUBkzwMZpVMKYPORmyGsE4sW-4rYBS_jzML1tWA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I've had processPhraseTableMin crash when the phrase table contains
> duplicate entries (can't remember if there was an unreasonable memory
> allocation involved). Is Marcin using the exact same phrase table? Can you
> check if the phrase table has duplicate entries?
>
> To crash or not to crash could also depend on OS and libraries used. You
> can get the versions of libraries compiled into moses with
>
> moses --version
>
> I've had duplicate entries in the phrase table after running
> ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s
> significance filtering that I pulled in from his WIPO branch; compile with
> --with-mm --with-mm-extras to get it compiled.
>
> - Uli




------------------------------

Message: 3
Date: Thu, 04 Feb 2016 16:47:43 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Problem with processPhraseTableMin
To: Jeremy Gwinnup <jeremy@gwinnup.org>
Cc: moses-support@mit.edu
Message-ID: <3efec4f2b659ba2ebda8570436d9679e@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



I've been using it with and without tcmalloc with no problems, and the
part where it crashes is not multi-threaded anyway. I guess it's the
Intel compiler, no idea why though.

On 2016-02-04 16:37, Jeremy Gwinnup wrote:

> Uli,
>
> I sent the phrase-table to Marcin yesterday to test - He was able to binarize the table successfully. Here, we've been compiling moses with the Intel compiler. We built the same checkout with gcc and using processPhraseTableMin from that build we were able to successfully binarize the phrase table.
>
> One thing I saw during testing these different configs was the intel-compiled version would output tcmalloc debug messages, but the gcc-compiled one would not. We're using tcmalloc-minimal for these builds. Should we be using the full version?
>
> Running moses --version on both builds shows Boost 1.54, Xmlrpc-c 1.33.17 and CMPH (version unknown) linked in. We compile static binaries on a RHEL 6-based distro (Scientific Linux 6.7)
>
> -Jeremy
>
>> Message: 2
>> Date: Thu, 4 Feb 2016 15:03:02 +0000
>> From: Ulrich Germann <ulrich.germann@gmail.com>
>> Subject: Re: [Moses-support] Problem with processPhraseTableMin
>> To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
>> Cc: "moses-support@mit.edu" <moses-support@mit.edu>
>> Message-ID: <CAHQSRUq_gtrCUBkzwMZpVMKYPORmyGsE4sW-4rYBS_jzML1tWA@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I've had processPhraseTableMin crash when the phrase table contains
>> duplicate entries (can't remember if there was an unreasonable memory
>> allocation involved). Is Marcin using the exact same phrase table? Can you
>> check if the phrase table has duplicate entries?
>>
>> To crash or not to crash could also depend on OS and libraries used. You
>> can get the versions of libraries compiled into moses with
>>
>> moses --version
>>
>> I've had duplicate entries in the phrase table after running
>> ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s
>> significance filtering that I pulled in from his WIPO branch; compile with
>> --with-mm --with-mm-extras to get it compiled.
>>
>> - Uli
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]



Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 112, Issue 13
**********************************************
