Moses-support Digest, Vol 103, Issue 32

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Transliteration model is using processPhraseTable, which
is not found in Moses version 3.0 (Kenneth Heafield)


----------------------------------------------------------------------

Message: 1
Date: Sun, 10 May 2015 08:46:41 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Transliteration model is using
processPhraseTable, which is not found in Moses version 3.0
To: moses-support@mit.edu
Message-ID: <554F5331.8010706@kheafield.com>
Content-Type: text/plain; charset=windows-1252

KenLM doesn't care about the order field, so that behavior is correct.
As to why it produced a KenLM config and a SRILM binary file, that's a
bug.
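The KenLM exception in the quoted report comes from a file labelled KENLM in moses.ini whose first line is the SRILM binary header `SRILM_BINARY_NGRAM_002` instead of the ARPA `\data\` line. Until the EMS bug is fixed, one could check the header before choosing the feature type. The Python sketch below is a hypothetical helper, not part of Moses. Its header-to-feature mapping (ARPA to KENLM, SRILM binary to SRILM) is an assumption based only on this thread, and any other format, including KenLM's own binary format, is reported as UNKNOWN.

```python
import gzip

# Header markers taken from this thread: ARPA text files start with "\data\",
# and the failing file started with "SRILM_BINARY_NGRAM_002".
ARPA_MARKER = "\\data\\"
SRILM_BINARY_PREFIX = "SRILM_BINARY_NGRAM"
GZIP_MAGIC = b"\x1f\x8b"


def first_nonempty_line(path: str) -> str:
    """Return the first non-blank line of a (possibly gzipped) file."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as fh:
        for line in fh:
            # latin-1 never fails to decode, so binary files are safe to read here
            text = line.decode("latin-1").strip()
            if text:
                return text
    return ""


def guess_lm_feature(path: str) -> str:
    """Hypothetical helper: map an LM file's header to a moses.ini feature name.

    Returns "KENLM" for ARPA text (plain or gzipped), "SRILM" for SRILM
    binary files and "UNKNOWN" for anything else.
    """
    head = first_nonempty_line(path)
    if head == ARPA_MARKER:
        return "KENLM"
    if head.startswith(SRILM_BINARY_PREFIX):
        return "SRILM"
    return "UNKNOWN"
```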

On 05/10/15 06:15, Ergun Bicici wrote:
>
> The transliteration config file does not copy the LM order (order=5):
> evaluation/Transliteration-Module/test.transliterated.3/evaluation/moses.filtered.ini
>
> and it appends the following to the SRILM binary LM:
> KENLM lazyken=0
>
> which is giving the following:
> ****************************************************************************************************
> Exception: lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece
> &, std::vector<unsigned long, std::allocator<unsigned long>> &) threw
> FormatLoadException'.
> first non-empty line was "SRILM_BINARY_NGRAM_002" not \data\. Byte: 23
>
> I replaced this with SRILM and obtained the
> Transliteration-Module/test.transliterated.3 file.
>
>
>
> Best Regards,
> Ergun
>
> Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
> http://www.computing.dcu.ie/~ebicici/
>
>
> On Wed, May 6, 2015 at 1:45 PM, Ergun Bicici
> <Ergun.Bicici@computing.dcu.ie> wrote:
>
>
> Dear Nadir,
>
> Thank you very much for explaining transliteration. I have "yes" for
> both transliteration-module and post-decoding-transliteration in the
> EMS configuration file used for en-ru.
>
> Best Regards,
> Ergun
>
> Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
> http://www.computing.dcu.ie/~ebicici/
>
>
> ---------- Forwarded message ----------
> From: Nadir Durrani <nadir.durrani@nu.edu.pk>
> Date: Wed, May 6, 2015 at 11:17 AM
> Subject: Re: Transliteration model is using processPhraseTable,
> which is not found in Moses version 3.0
> To: Ergun Bicici <Ergun.Bicici@computing.dcu.ie>
>
>
> Hi Ergun,
>
> If you only set
>
> transliteration-module = "yes"
>
> Moses will train the transliteration system but will not do
> anything with it. You have to select whether you want to use
> post-decoding or in-decoding transliteration.
>
> In the post-decoding method, transliteration is done after decoding,
> i.e., the decoder has translated all the sentences and you now just
> need to replace OOV words with their best transliteration given
> the context. This is Method 2 as described in the following paper:
>
> http://aclweb.org/anthology//E/E14/E14-4029.pdf
>
> You can enable it with
>
> post-decoding-transliteration = "yes"
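A toy illustration of the post-decoding idea: each OOV token left in the decoder output is replaced by the candidate with the best combined transliteration and language-model score. The candidate lists, the bigram table, the -10.0 floor for unseen bigrams and the greedy, left-context-only scoring are all simplifying assumptions for this sketch. The actual Moses post-decoding step and the models it uses are described in the paper linked above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inputs: candidates[oov] = [(transliteration, log P(translit | oov)), ...]
Candidates = Dict[str, List[Tuple[str, float]]]
# Hypothetical bigram LM: log P(word | previous word); unseen pairs get a floor.
Bigram = Dict[Tuple[str, str], float]
UNSEEN_LOGPROB = -10.0


def lm_score(bigram: Bigram, prev: str, word: str) -> float:
    """Look up a bigram log-probability, falling back to a fixed floor."""
    return bigram.get((prev, word), UNSEEN_LOGPROB)


def replace_oovs(tokens: List[str], oovs: Set[str], candidates: Candidates,
                 bigram: Bigram) -> List[str]:
    """Greedy left-to-right replacement of OOV tokens (toy version of Method 2)."""
    output: List[str] = []
    for tok in tokens:
        prev = output[-1] if output else "<s>"
        options = candidates.get(tok, [])
        if tok in oovs and options:
            # combine transliteration score with the LM score given the left context
            best, _ = max(options, key=lambda c: c[1] + lm_score(bigram, prev, c[0]))
            output.append(best)
        else:
            output.append(tok)  # known word, or an OOV with no candidates: keep as-is
    return output
```

With a bigram entry favouring "город Дублин", `replace_oovs(["город", "Dublin"], {"Dublin"}, {"Dublin": [("Даблин", -0.7), ("Дублин", -0.9)]}, {("город", "Дублин"): -1.0})` picks "Дублин" even though its transliteration score alone is lower.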
>
>
> With the in-decoding method (Method 3 in the paper), transliteration
> is done inside the decoder on the fly. In theory, its advantage over
> Method 2 is that the OOV word can also be reordered and other features
> can be used. However, it does not give any clear-cut gains.
>
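The rule described above can be written as a small check over an EMS configuration. The sketch below only understands flat `key = "value"` lines and ignores sections and variable substitution. The keys `transliteration-module` and `post-decoding-transliteration` appear in this thread, while `in-decoding-transliteration` is an assumed name for the in-decoding switch (please confirm it on the Advanced.OOVs page).

```python
import re
from typing import Dict

# Matches lines such as: transliteration-module = "yes"   (optional trailing # comment)
_SETTING = re.compile(r'^\s*([A-Za-z0-9_-]+)\s*=\s*"?([^"#]*?)"?\s*(?:#.*)?$')


def parse_settings(config_text: str) -> Dict[str, str]:
    """Very small EMS-style parser: key = "value" lines only, sections ignored."""
    settings: Dict[str, str] = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith(("#", "[")):
            continue  # comment or [SECTION] header
        m = _SETTING.match(line)
        if m:
            settings[m.group(1)] = m.group(2).strip()
    return settings


def transliteration_mode(config_text: str) -> str:
    """Report which transliteration mode an EMS config would actually use."""
    s = parse_settings(config_text)

    def on(key: str) -> bool:
        return s.get(key, "").lower() == "yes"

    if not on("transliteration-module"):
        return "disabled"
    # "in-decoding-transliteration" is an assumed key name for Method 3.
    modes = [name for name, key in (("post-decoding", "post-decoding-transliteration"),
                                    ("in-decoding", "in-decoding-transliteration"))
             if on(key)]
    if not modes:
        return "trained but unused"
    return " + ".join(modes)
```

A config with only `transliteration-module = "yes"` yields "trained but unused"; adding `post-decoding-transliteration = "yes"`, as in the en-ru setup above, yields "post-decoding".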
> More details here:
>
> http://www.statmt.org/moses/?n=Advanced.OOVs
>
> Nadir
>
> >> On Tue, May 5, 2015 at 5:33 PM, Ergun Bicici
> >> <Ergun.Bicici@computing.dcu.ie> wrote:
> >> >
> >> > Hi Nadir,
> >> >
> >> > I am using Moses 3.0, and to get transliteration working I copied
> >> > scripts/Transliteration/ from the latest version into the Moses 3.0
> >> > path, re-ran, and obtained translation results.
> >> >
> >> >
> >> > Best Regards,
> >> > Ergun
> >> >
> >> > Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
> >> > http://www.computing.dcu.ie/~ebicici/
> >> >
> >> >
> >> > On Mon, May 4, 2015 at 7:32 AM, Nadir Durrani
> >> > <nadir.durrani@nu.edu.pk> wrote:
> >> >>
> >> >> Hi Ergun,
> >> >>
> >> >> processPhraseTable is no longer supported by Moses, but I see that
> >> >> Phil Williams has already fixed this problem in the transliteration
> >> >> module by changing
> >> >>
> >> >> `$MOSES_SRC/scripts/training/filter-model-given-input.pl
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file.filtered
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file -Binarizer
> >> >> "$MOSES_SRC/bin/processPhraseTable"`;
> >> >>
> >> >> to
> >> >>
> >> >> `$MOSES_SRC/scripts/training/filter-model-given-input.pl
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file.filtered
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini
> >> >> $TRANSLIT_MODEL/evaluation/$eval_file -Binarizer
> >> >> "$MOSES_SRC/bin/CreateOnDiskPt 1 1 4 100 2"`;
> >> >>
> >> >> in
> >> >>
> >> >> path-to-moses/scripts/Transliteration/in-decoding-transliteration.pl
> >> >>
> >> >> Here's the commit:
> >> >>
> >> >> https://github.com/moses-smt/mosesdecoder/commit/7e54e23fe234ac48f44beeee0e473d09a5b4d5f6
> >> >>
> >> >> Maybe you pulled an in-between version in which processPhraseTable
> >> >> had been removed but the transliteration scripts had not yet been fixed.
> >> >>
> >> >> Cheers,
> >> >> Nadir
> >> >>
> >> >>
> >> >> On Mon, May 4, 2015 at 7:46 AM, <moses-support-request@mit.edu> wrote:
> >> >> >
> >> >> > Today's Topics:
> >> >> >
> >> >> > 1. Re: 12-gram language model ARPA file for 16GB (liling tan)
> >> >> > 2. Transliteration model is using processPhraseTable, which is
> >> >> > not found in Moses version 3.0 (Ergun Bicici)
> >> >> > 3. Re: Transliteration model is using processPhraseTable, which
> >> >> > is not found in Moses version 3.0 (Hieu Hoang)
> >> >> > 4. Europarl monolingual corpus (Hieu Hoang)
> >> >> >
> >> >> >
> >> >> > ----------------------------------------------------------------------
> >> >> >
> >> >> > Message: 1
> >> >> > Date: Sun, 3 May 2015 19:44:12 +0200
> >> >> > From: liling tan <alvations@gmail.com>
> >> >> > Subject: Re: [Moses-support] 12-gram language model ARPA file for 16GB
> >> >> > To: moses-support <moses-support@mit.edu>
> >> >> > Message-ID:
> >> >> > <CAKzPaJJ7fY=9C89POact542vu32d+H3=0i_Dnaj=YfizbFA+cQ@mail.gmail.com>
> >> >> > Content-Type: text/plain; charset="utf-8"
> >> >> >
> >> >> > Dear Moses devs/users,
> >> >> >
> >> >> > For now, I only know that it takes more than 250GB. I have 250GB of
> >> >> > free space, and KenLM got "poisoned" by insufficient space...
> >> >> >
> >> >> > Does anyone have an idea how big a 12-gram language model ARPA file
> >> >> > trained on 16GB of text would become?
> >> >> >
> >> >> > STDERR:
> >> >> >
> >> >> > === 1/5 Counting and sorting n-grams ===
> >> >> > Reading /media/2tb/wmt15/corpus.truecase/train-lm.en
> >> >> > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> >> >> > tcmalloc: large alloc 7846035456 bytes == 0x10f4000 @
> >> >> > tcmalloc: large alloc 73229664256 bytes == 0x1d542e000 @
> >> >> > ****************************************************************************************************
> >> >> > Unigram tokens 3038737446 types 5924314
> >> >> > === 2/5 Calculating and sorting adjusted counts ===
> >> >> > Chain sizes: 1:71091768 2:804524736 3:1508483968 4:2413574144
> >> >> > 5:3519795968 6:4827148288 7:6335632384 8:8045247488 9:9955993600
> >> >> > 10:12067871744 11:14380880896 12:16895020032
> >> >> > tcmalloc: large alloc 16895025152 bytes == 0x1d542e000 @
> >> >> > tcmalloc: large alloc 2413576192 bytes == 0x8f2a0000 @
> >> >> > tcmalloc: large alloc 3519799296 bytes == 0x5c4488000 @
> >> >> > tcmalloc: large alloc 4827152384 bytes == 0x696146000 @
> >> >> > tcmalloc: large alloc 6335635456 bytes == 0x7b5cce000 @
> >> >> > tcmalloc: large alloc 8045248512 bytes == 0x92f6f0000 @
> >> >> > tcmalloc: large alloc 9955999744 bytes == 0xb0ef7c000 @
> >> >> > tcmalloc: large alloc 12067872768 bytes == 0xd60644000 @
> >> >> > tcmalloc: large alloc 14380883968 bytes == 0x12f616e000 @
> >> >> > Last input should have been poison.
> >> >> > Last input should have been poison.
> >> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
> >> >> > No space left on device in /tmp/PC2o3z (deleted) while writing 5301120368 bytes
> >> >> >
> >> >> > Last input should have been poison.
> >> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
> >> >> > No space left on device in /tmp/PftXeo (deleted) while writing 1941075872 bytes
> >> >> > Last input should have been poison.
> >> >> >
> >> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
> >> >> > No space left on device in /tmp/CuZcPM (deleted) while writing 2984722272 bytes
> >> >> >
> >> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
> >> >> > No space left on device in /tmp/F2bE8A (deleted) while writing 389439488 bytes
> >> >> >
> >> >> > Regards,
> >> >> > Liling
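To put the log above in perspective, the per-order chain sizes lmplz printed in step 2/5 can be summed and compared with the free space in the temporary directory, which is where the failed writes went (/tmp/...). The sketch assumes the `Chain sizes:` format shown above. Judging by the matching tcmalloc allocations, these are memory buffers, so the total is not an estimate of the final ARPA file size, a question this thread leaves open.

```python
import re
import shutil

# "order:bytes" pairs, as in "Chain sizes: 1:71091768 2:804524736 ..."
CHAIN_RE = re.compile(r"(\d+):(\d+)")


def parse_chain_sizes(log_line: str) -> dict:
    """Parse an lmplz 'Chain sizes:' line into {order: bytes}."""
    body = log_line.split("Chain sizes:", 1)[1]
    return {int(order): int(size) for order, size in CHAIN_RE.findall(body)}


def report(log_line: str, temp_dir: str = "/tmp") -> str:
    """Summarise total chain buffer size next to free space in temp_dir."""
    sizes = parse_chain_sizes(log_line)
    total = sum(sizes.values())
    free = shutil.disk_usage(temp_dir).free
    return (f"orders 1-{max(sizes)}: {total / 1e9:.1f} GB of chain buffers; "
            f"{free / 1e9:.1f} GB free in {temp_dir}")
```

For the line above, the twelve chains sum to 80,825,265,016 bytes (about 80.8 GB). If /tmp is the bottleneck, lmplz's `-T` option sets a different temporary-file prefix.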
> >> >> >
> >> >> > ------------------------------
> >> >> >
> >> >> > Message: 2
> >> >> > Date: Sun, 3 May 2015 22:42:22 +0100
> >> >> > From: Ergun Bicici <Ergun.Bicici@computing.dcu.ie>
> >> >> > Subject: [Moses-support] Transliteration model is using
> >> >> > processPhraseTable, which is not found in Moses version 3.0
> >> >> > To: moses-support <moses-support@mit.edu>
> >> >> > Message-ID:
> >> >> > <CAB2pGncpvc4roLXwLcFcXytZHKEqSZvzaX2L16Yfo=P-vq1jBA@mail.gmail.com>
> >> >> > Content-Type: text/plain; charset="utf-8"
> >> >> >
> >> >> > binarizing...gzip -cd en-ru_path/model/Transliteration.8/tuning/filtered/phrase-table.0-0.1.1.gz
> >> >> > | LC_ALL=C sort -T en-ru_path/model/Transliteration.8/tuning/filtered
> >> >> > | moses_3.0/mosesdecoder/bin/processPhraseTable -ttable 0 0 - -nscores 4
> >> >> > -out en-ru_path/model/Transliteration.8/tuning/filtered/phrase-table.0-0.1.1
> >> >> > sh: moses_3.0/mosesdecoder/bin/processPhraseTable: No such file or directory
> >> >> > sort: write failed: standard output: Broken pipe
> >> >> > sort: write error
> >> >> >
> >> >> > How can I have processPhraseTable built?
> >> >> >
> >> >> > Best Regards,
> >> >> > Ergun
> >> >> >
> >> >> > Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
> >> >> > http://www.computing.dcu.ie/~ebicici/
> >> >> >
> >> >> > ------------------------------
> >> >> >
> >> >> > Message: 3
> >> >> > Date: Mon, 04 May 2015 08:31:18 +0400
> >> >> > From: Hieu Hoang <hieuhoang@gmail.com>
> >> >> > Subject: Re: [Moses-support] Transliteration model is using
> >> >> > processPhraseTable, which is not found in Moses version 3.0
> >> >> > To: Ergun Bicici <Ergun.Bicici@computing.dcu.ie>, moses-support
> >> >> > <moses-support@mit.edu>
> >> >> > Message-ID: <5546F616.4000007@gmail.com>
> >> >> > Content-Type: text/plain; charset="windows-1252"
> >> >> >
> >> >> > Do you know where the processPhraseTable executable is being called from?
> >> >> >
> >> >> > That would help us make sure it uses something else instead.
> >> >> >
> >> >> > If you really want processPhraseTable back, uncomment 3 lines in
> >> >> > misc/Jamfile:
> >> >> >
> >> >> > +++ b/misc/Jamfile
> >> >> > @@ -1,8 +1,8 @@
> >> >> > -#exe processPhraseTable : GenerateTuples.cpp processPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> > +exe processPhraseTable : GenerateTuples.cpp processPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> >
> >> >> > exe processLexicalTable : processLexicalTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> >
> >> >> > -#exe queryPhraseTable : queryPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> > +exe queryPhraseTable : queryPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> >
> >> >> > exe queryLexicalTable : queryLexicalTable.cpp ..//boost_filesystem ../moses//moses ;
> >> >> >
> >> >> > @@ -46,6 +46,6 @@ $(TOP)//boost_iostreams
> >> >> > $(TOP)//boost_program_options
> >> >> > ;
> >> >> >
> >> >> > -alias programs : 1-1-Extraction TMining generateSequences processLexicalTable queryLexicalTable programsMin programsProbing merge-sorted prunePhraseTable ;
> >> >> > -#processPhraseTable queryPhraseTable
> >> >> > +alias programs : 1-1-Extraction TMining generateSequences processLexicalTable queryLexicalTable programsMin programsProbing merge-sorted prunePhraseTable processPhraseTable queryPhraseTable ;
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Hieu Hoang
> >> >> > Researcher
> >> >> > New York University, Abu Dhabi
> >> >> > http://www.hoang.co.uk/hieu
> >> >> >
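Hieu's question of where processPhraseTable is still being called can be answered with a recursive search of the scripts tree. The Python sketch below is the equivalent of a recursive grep. The root directory is whatever your checkout uses, for example `moses_3.0/mosesdecoder/scripts` from the log above.

```python
import os
from typing import List, Tuple


def find_callers(root: str, needle: str = "processPhraseTable") -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every line under root containing needle."""
    hits: List[Tuple[str, int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets binary or oddly encoded files be scanned too
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return hits
```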
> >> >> >
> >> >> > ------------------------------
> >> >> >
> >> >> > Message: 4
> >> >> > Date: Mon, 4 May 2015 08:46:15 +0400
> >> >> > From: Hieu Hoang <hieuhoang@gmail.com>
> >> >> > Subject: [Moses-support] Europarl monolingual corpus
> >> >> > To: moses-support <moses-support@mit.edu>
> >> >> > Message-ID:
> >> >> > <CAEKMkbiO64F_m20RwNXyDOj60FHEZ_oo+BY+hzkW3TBFukPfAQ@mail.gmail.com>
> >> >> > Content-Type: text/plain; charset="utf-8"
> >> >> >
> >> >> > What's the easiest way to get the single-language data from the
> >> >> > Europarl corpus, as described in the first table at:
> >> >> > http://statmt.org/europarl/
> >> >> >
> >> >> > I tried downloading the XML source
> >> >> > http://statmt.org/europarl/v7/europarl.tgz
> >> >> > then stripping the XML and running split-sentence.perl, but this takes
> >> >> > an unfathomably long time.
> >> >> >
> >> >> > Hieu Hoang
> >> >> > Researcher
> >> >> > New York University, Abu Dhabi
> >> >> > http://www.hoang.co.uk/hieu
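A streaming sketch of the XML-stripping step for the Europarl source. It assumes, without having been checked against every file in europarl.tgz, that markup such as `<CHAPTER ...>`, `<SPEAKER ...>` and `<P>` occupies whole lines. It does nothing about the cost of split-sentence.perl itself.

```python
import re
from typing import Iterable, Iterator

# Assumption: markup lines look like <TAG ...> and occupy a whole line.
TAG_LINE = re.compile(r"^\s*<[^>]+>\s*$")


def strip_markup(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty text lines, skipping lines that consist only of a tag."""
    for line in lines:
        if TAG_LINE.match(line):
            continue
        text = line.strip()
        if text:
            yield text
```

It accepts any iterable of lines, such as an open file handle, and processes them one at a time without loading the whole file.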
> >> >> >
> >> >> > ------------------------------
> >> >> >
> >> >> >
> >> >> >
> >> >> > End of Moses-support Digest, Vol 103, Issue 5
> >> >> > *********************************************
> >> >
> >> >
> >
> >
>
>
>
>
>


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 103, Issue 32
**********************************************
