Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Error in factored models, get-corpus crashed (Philipp Koehn)
2. Re: Problem with processPhraseTableMin (Ulrich Germann)
----------------------------------------------------------------------
Message: 1
Date: Wed, 3 Feb 2016 12:23:01 -0500
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] Error in factored models, get-corpus
crashed
To: Sunayana Gawde <sunayanagawde17@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDC4nKkqmZhZNOXS56uD4eNi4eUFr4b2br4A49g98M1c8g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
The "get-corpus" option for specifying a parallel corpus is useful if you
have a script that generates the corpus.
The script has to take three parameters:
- stem of the file names where the corpus will be stored
- input extension
- output extension
If this crashes, take a look at
/home/development/sunayana/POS-eng-kon/steps/3/CORPUS_train1_get-corpus.3.STDERR
to see what went wrong.
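
A minimal sketch of such a script, assuming the parallel data already exists as two line-aligned files. The paths data/train.en and data/train.kn and the function names are hypothetical placeholders, not anything prescribed by Moses. The sketch only follows the three-parameter convention described above: it writes STEM.INPUT_EXT and STEM.OUTPUT_EXT and fails if the two sides differ in length.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical locations of the existing parallel data; replace with your own.
SOURCE_SIDE = Path("data/train.en")
TARGET_SIDE = Path("data/train.kn")


def count_lines(path):
    """Count the lines of a file without loading it into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def write_corpus(stem, input_ext, output_ext,
                 source_side=SOURCE_SIDE, target_side=TARGET_SIDE):
    """Store the corpus as <stem>.<input_ext> and <stem>.<output_ext>."""
    if count_lines(source_side) != count_lines(target_side):
        raise ValueError("source and target sides differ in line count")
    src_out = Path(f"{stem}.{input_ext}")
    tgt_out = Path(f"{stem}.{output_ext}")
    src_out.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_side, src_out)
    shutil.copyfile(target_side, tgt_out)
    return src_out, tgt_out


def main(argv):
    """Expect exactly: STEM INPUT_EXT OUTPUT_EXT."""
    if len(argv) != 4:
        print("usage: get-corpus.py STEM INPUT_EXT OUTPUT_EXT", file=sys.stderr)
        return 1
    write_corpus(argv[1], argv[2], argv[3])
    return 0

# When saved as a script, end the file with: raise SystemExit(main(sys.argv))
```

A non-zero exit status or an uncaught exception from a script like this is the kind of failure that would show up in the STDERR file mentioned above.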
-phi
On Wed, Feb 3, 2016 at 3:32 AM, Sunayana Gawde <sunayanagawde17@gmail.com>
wrote:
> I am developing an MT system for English to Konkani, and my corpus has POS
> tags on each word. I am using the config file from the statmt.org
> website and, after making the necessary changes to it, I run this command:
>
> nohup nice /usr/local/bin/smt/mosesdecoder-3.0/scripts/ems/experiment.perl
> -config config.en-kn -exec &> log &
>
> But when I check the log file, I see this error:
>
> EXECUTE STEPS
> number of steps doable or running: 1 at Tue Feb 2 19:11:28 IST 2016
> doable: CORPUS:train1:get-corpus
> executing
> /home/development/sunayana/POS-eng-kon/steps/3/CORPUS_train1_get-corpus.3
> via sh (1 active)
> step CORPUS:train1:get-corpus crashed
> number of steps doable or running: 0 at Tue Feb 2 19:11:35 IST 2016
>
> Please tell me how to remove this error and run my system successfully.
>
> thanks
>
> --
> *Regards*
>
> Ms. Sunayana R. Gawde.
>
> DCST, Goa University.
> Please don't print this e-mail unless you really need to.
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
------------------------------
Message: 2
Date: Thu, 4 Feb 2016 15:03:02 +0000
From: Ulrich Germann <ulrich.germann@gmail.com>
Subject: Re: [Moses-support] Problem with processPhraseTableMin
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAHQSRUq_gtrCUBkzwMZpVMKYPORmyGsE4sW-4rYBS_jzML1tWA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I've had processPhraseTableMin crash when the phrase table contains
duplicate entries (can't remember if there was an unreasonable memory
allocation involved). Is Marcin using the exact same phrase table? Can you
check if the phrase table has duplicate entries?
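
One way to run that check, as a minimal sketch: it assumes the text phrase table uses the standard Moses " ||| " field separator with source and target phrase as the first two fields, and that entries with the same source phrase are adjacent (true for a sorted table). The function names are illustrative and not part of Moses.

```python
import gzip
from itertools import groupby


def split_entry(line):
    """Return (source, target) from one text phrase-table line."""
    fields = line.rstrip("\n").split(" ||| ")
    return fields[0], fields[1]


def find_duplicates(lines):
    """Yield each (source, target) pair that occurs more than once.

    Assumes lines sharing a source phrase are adjacent, so only the
    targets of one source phrase are held in memory at a time.
    """
    pairs = (split_entry(line) for line in lines if " ||| " in line)
    for source, group in groupby(pairs, key=lambda pair: pair[0]):
        seen = set()
        reported = set()
        for _, target in group:
            if target in seen and target not in reported:
                reported.add(target)
                yield source, target
            seen.add(target)


def find_duplicates_in_file(path):
    """Scan a gzipped phrase table such as phrase-table.1.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from find_duplicates(handle)
```

Printing the first few pairs from find_duplicates_in_file("phrase-table.1.gz") is enough to tell whether the table contains duplicates. If the table is not sorted, duplicates that are far apart will be missed.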
To crash or not to crash could also depend on OS and libraries used. You
can get the versions of libraries compiled into moses with
moses --version
I've had duplicate entries in the phrase table after running
ptable-sigtest-filter, which is Marcin's implementation of Johnson et al.'s
significance filtering that I pulled in from his WIPO branch; build Moses with
--with-mm --with-mm-extras to get it.
- Uli
On Wed, Feb 3, 2016 at 12:01 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
wrote:
> Weird.
>
> Jeremy, I binarized your phrase-table a couple of times with different
> commits (also the most recent one), and I cannot reproduce the error.
> Try maybe -threads 10 or 12.
> I can make the binarized versions available for download.
>
> On 02.02.2016 at 18:21, Marcin Junczys-Dowmunt wrote:
> > Looks fine, I had no problems running it with 18 and more domain
> > indicators. Your machine is certainly more than suitable. Just one
> > remark, using more than 8-12 threads usually slows things down, but
> > should not cause crashes. Any chance to have a look at that table?
> >
> > On 02.02.2016 at 18:16, Jeremy Gwinnup wrote:
> >> Marcin,
> >>
> >> I was able to use -T with processLexicalTableMin successfully. I also
> >> tried processPhraseTableMin using a local tmp dir with 200G free and it
> >> still crashed at step 3 with the huge malloc message. Phrase table is
> >> nothing fancy - just standard 4 scores and 3 domain indicator features.
> >> Here's a complete output with more info about the phrase table:
> >>
> >> Phrase table in question:
> >>
> >> -rw-rw-r-- 1 jgwinnup scream 2.2G Feb 1 23:58 phrase-table.1.gz
> >>
> >> Machine in question has 1TB RAM/32 cores - should be more than enough
> >> for the job
> >>
> >> Moses git-rev ends with: 80572b4 (Jan. 27)
> >>
> >> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7 -T /tmp_with_200G_free
> >> WARNING: You are using a nonstandard number of scores (7) with PREnc. Set the index of P(t|s) with -rankscore int if it is not 2.
> >> Used options:
> >> Text phrase table will be read from: phrase-table.1.gz
> >> Output phrase table will be written to: phrase-table.1.minphr
> >> Step size for source landmark phrases: 2^10=1024
> >> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> >> Selected target phrase encoding: Huffman + PREnc
> >> Maxiumum allowed rank for PREnc: 100
> >> Number of score components in phrase table: 7
> >> Single Huffman code set for score components: no
> >> Using score quantization: no
> >> Explicitly included alignment information: yes
> >> Running with 32 threads
> >>
> >> Pass 1/3: Creating hash function for rank assignment
> >> ..................................................[5000000]
> >> ..................................................[10000000]
> >> ..................................................[15000000]
> >> ..................................................[20000000]
> >> ..................................................[25000000]
> >> ..................................................[30000000]
> >> ..................................................[35000000]
> >> ..................................................[40000000]
> >> ..................................................[45000000]
> >> ....
> >>
> >> Pass 2/3: Creating source phrase index + Encoding target phrases
> >> ..................................................[5000000]
> >> ..................................................[10000000]
> >> ..................................................[15000000]
> >> ..................................................[20000000]
> >> ..................................................[25000000]
> >> ..................................................[30000000]
> >> ..................................................[35000000]
> >> ..................................................[40000000]
> >> ..................................................[45000000]
> >> ....
> >>
> >> Intermezzo: Calculating Huffman code sets
> >> Creating Huffman codes for 471366 target phrase symbols
> >> tcmalloc: large alloc 13808820224 bytes == 0xb0592000 @
> >> tcmalloc: large alloc 27617640448 bytes == 0x3e86b0000 @
> >> tcmalloc: large alloc 5187358422106112 bytes == (nil) @
> >> terminate called after throwing an instance of 'std::bad_alloc'
> >> what(): std::bad_alloc
> >>
> >>
> >>
> >>
> >>> On Feb 2, 2016, at 10:21 AM, Jeremy Gwinnup <jeremy@gwinnup.org> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm having a problem using processPhraseTableMin to compress a phrase
> >>> table with 7 scores - the program consistently coredumps at step 3 -
> >>> command and relevant output below. Is there anything I'm doing glaringly
> >>> wrong?
> >>>
> >>> Thanks!
> >>> -Jeremy
> >>>
> >>> Command:
> >>>
> >>> 1tqoct1:model> $MOSES/bin/processPhraseTableMin -in phrase-table.1.gz -out phrase-table.1 -threads all -nscores 7
> >>>
> >>> Once we get to step 3:
> >>>
> >>> Intermezzo: Calculating Huffman code sets
> >>> Creating Huffman codes for 471366 target phrase symbols
> >>> tcmalloc: large alloc 13983629312 bytes == 0xb14ce000 @
> >>> tcmalloc: large alloc 27967250432 bytes == 0x3f3ca4000 @
> >>> tcmalloc: large alloc 15681406635450368 bytes == (nil) @
> >>> terminate called after throwing an instance of 'std::bad_alloc'
> >>> what(): std::bad_alloc
> >>>
> >>> Top looked like this when the program ran into trouble:
> >>> PID   USER     PR NI VIRT  RES SHR  S %CPU %MEM TIME+   COMMAND
> >>> 27416 jgwinnup 20 0  45.9g 30g 4.0g R 10.6 3.0  1589:17 processPhraseTa
--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh
------------------------------
End of Moses-support Digest, Vol 112, Issue 12
**********************************************