Moses-support Digest, Vol 151, Issue 5

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. processPhraseTableMin Cannot encode numbers larger than
268435455 (He Shiming)
2. Re: processPhraseTableMin Cannot encode numbers larger than
268435455 (Marcin Junczys-Dowmunt)
3. Re: processPhraseTableMin Cannot encode numbers larger than
268435455 (He Shiming)


----------------------------------------------------------------------

Message: 1
Date: Fri, 10 May 2019 11:43:17 +0800
From: He Shiming <heshiming@gmail.com>
Subject: [Moses-support] processPhraseTableMin Cannot encode numbers
larger than 268435455
To: moses-support@mit.edu
Message-ID:
<CANBMWHjy3XkhbWkqhT1Lb8yPB6179hgByQ06evk4XTGYVo0j3Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I'm training a Chinese-to-English phrase-based model, using 33 million
sentence pairs. My phrase table is 90GB gzipped, and the reordering table
is 27GB gzipped. When running processPhraseTableMin, it dies in step 3
because of the following error:

Intermezzo: Calculating Huffman code sets
Creating Huffman codes for 1786817 target phrase symbols
Creating Huffman codes for 871265 scores
Creating Huffman codes for 18018117 scores
Creating Huffman codes for 827039 scores
Creating Huffman codes for 17861459 scores
Creating Huffman codes for 50 alignment points

Pass 3/3: Compressing target phrases
..................................................[5000000]
..................................................[345000000]
............................................terminate called after throwing
an instance of 'util::Exception'
what(): moses/TranslationModel/CompactPT/ListCoders.h:179 in static void
Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with InIt
= unsigned int*; Moses::Simple9::uint = unsigned int] threw util::Exception
because `*it > 268435455'.
You are trying to encode 436766721 with Simple9. Cannot encode numbers
larger than 268435455 (2^28-1)
Aborted (core dumped)
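
The limit quoted in the error follows from how Simple9 packs integers: each 32-bit word spends 4 bits on a layout selector and leaves 28 bits for data, so no single value can exceed 2^28-1. The sketch below only illustrates that arithmetic. It is not the Moses code in ListCoders.h, and the names in it (MAX_SIMPLE9_VALUE, bits_needed, fits_in_simple9) are made up for this example.

```python
# Illustration of the Simple9 value limit; not taken from Moses.
WORD_BITS = 32
SELECTOR_BITS = 4                       # chooses one of nine packing layouts
DATA_BITS = WORD_BITS - SELECTOR_BITS   # 28 bits left for payload
MAX_SIMPLE9_VALUE = (1 << DATA_BITS) - 1

# The nine layouts as (values per word, bits per value). Each fits in 28 bits.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def bits_needed(value: int) -> int:
    """Number of bits required to store a non-negative integer."""
    return max(1, value.bit_length())


def fits_in_simple9(value: int) -> bool:
    """True if the value can be stored in the widest (1 x 28 bit) layout."""
    return 0 <= value <= MAX_SIMPLE9_VALUE


if __name__ == "__main__":
    offending = 436766721  # the number reported in the error message
    print(MAX_SIMPLE9_VALUE, bits_needed(offending), fits_in_simple9(offending))
```

The number in the error message needs 29 bits, one more than the widest layout provides, which is why the encoder refuses it.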

Is my phrase table too big? Pruning seems to have only removed 0.1% of the
phrases. Is retraining using fewer pairs my only option?

--
Best regards,
He Shiming

------------------------------

Message: 2
Date: Thu, 9 May 2019 20:53:19 -0700
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] processPhraseTableMin Cannot encode
numbers larger than 268435455
To: He Shiming <heshiming@gmail.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <20190510035319.1AF5D674B0@pp.amu.edu.pl>
Content-Type: text/plain; charset="utf-8"

Hi,
Yes, a smaller phrase table should help. I wrote the table, but that was in 2012 and I cannot really remember what goes on in there. I think making sure that you do not have too many target phrases per source phrase should help.
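
One way to act on this suggestion is to cap the number of target phrases kept per source phrase before running processPhraseTableMin. The following is a minimal sketch, not a Moses tool. It assumes the text phrase table uses the usual " ||| " field separator, that lines for the same source phrase are adjacent (the table is sorted by source), and that the third score is the direct phrase translation probability p(e|f). The function name limit_targets and its parameters are hypothetical.

```python
import heapq
import itertools
from typing import Iterable, Iterator

DELIM = " ||| "


def limit_targets(lines: Iterable[str],
                  max_targets: int = 20,
                  score_index: int = 2) -> Iterator[str]:
    """Yield at most max_targets lines per source phrase.

    Assumes lines are grouped by source phrase and that the scores field
    (third field) holds whitespace-separated floats. score_index=2 is an
    assumption about where p(e|f) sits in the score list.
    """
    def source_of(line: str) -> str:
        return line.split(DELIM, 1)[0]

    def score_of(line: str) -> float:
        return float(line.split(DELIM)[2].split()[score_index])

    for _, group in itertools.groupby(lines, key=source_of):
        # nlargest keeps only max_targets entries in memory per source phrase
        # and returns them ordered from highest to lowest score.
        yield from heapq.nlargest(max_targets, group, key=score_of)
```

The lines can be streamed from the gzipped table with gzip.open(path, "rt") and written back the same way, so the 90GB table never has to fit in memory. Note that the surviving entries for each source phrase come out ordered by score rather than in their original order, and whether a given cap is enough to avoid the Simple9 error is not something this sketch can guarantee.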


------------------------------

Message: 3
Date: Fri, 10 May 2019 11:57:05 +0800
From: He Shiming <heshiming@gmail.com>
Subject: Re: [Moses-support] processPhraseTableMin Cannot encode
numbers larger than 268435455
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CANBMWHgTEsG8gSbYnJo-zpi1zwib0=SeLYbA+T1nihgcuDR3_A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Ok, thank you Marcin.


--
Best regards,
He Shiming

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 151, Issue 5
*********************************************
