Moses-support Digest, Vol 99, Issue 24

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Deprecating the Binary phrase-table (Hieu Hoang)
2. Floating point exception in processPhraseTableMin
(Kenneth Heafield)
3. Re: Floating point exception in processPhraseTableMin
(Marcin Junczys-Dowmunt)
4. Re: Feature score deltas in the chart decoder (Jun-ya NORIMATSU)

----------------------------------------------------------------------

Message: 1
Date: Mon, 12 Jan 2015 20:17:35 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: [Moses-support] Deprecating the Binary phrase-table
To: moses-support <moses-support@mit.edu>
Message-ID: <54B42BDF.6010105@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

The original binary phrase-table (PhraseDictionaryBinary) has been
with us for a long time. It is starting to show its age and is
getting in the way of further changes to the decoder.

Some of its shortcomings:
1. Isn't multi-threaded. We get around this by essentially
instantiating a new instance of it for every thread.
2. Doesn't support translation rule properties, where we can store
arbitrary information with each rule.
3. Doesn't support sparse features.
4. Can't change the API. The decoder has to keep around a jumble of
legacy code to support it. (grep for LEGACY, these functions are just
for the binary phrase-table)
5. Doesn't support hierarchical/syntax models.
6. Richard Zens (the original developer) joined the dark side many
moons ago so no-one really takes care of it anymore.

If people want binary phrase-tables, there's now a glut of choices:
1. Marcin's compact phrase-table is pretty awesome - it's fast and
small.
2. Nikolay's Probing PT, built on KenLM's data structures.
3. Uli's dynamic suffix array.
4. My OnDisk pt. Supports both phrase-based and syntax models.

With this in mind, we will deprecate the old binary pt. We can leave it
in the decoder for a while but remove the processPhraseTable tool,
so that new ones can't be created.

Please raise your voice if you object.

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 2
Date: Mon, 12 Jan 2015 22:20:40 -0500
From: Kenneth Heafield <moses@kheafield.com>
Subject: [Moses-support] Floating point exception in
processPhraseTableMin
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <54B48F08.5080306@kheafield.com>
Content-Type: text/plain; charset=utf-8

Dear Moses/Marcin,

I'm getting a Floating point exception in processPhraseTableMin from
Moses d0807c.

Arguments, minus the absolute paths, are:

processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4
-threads 16 -T /tmp -encoding None

The phrase table is rather large and it runs for several hours before
crashing. Log output is below.

Used options:
Text phrase table will be read from: phrase-table.gz
Output phrase table will be written to: phrase-table.minphr
Step size for source landmark phrases: 2^10=1024
Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
Selected target phrase encoding: Huffman
Number of score components in phrase table: 4
Single Huffman code set for score components: no
Using score quantization: no
Explicitly included alignment information: yes
Running with 16 threads

Pass 1/2: Creating source phrase index + Encoding target phrases
..................................................[5000000]
..................................................[10000000]
..................................................[15000000]
..................................................[20000000]
..................................................[25000000]
..................................................[30000000]
..................................................[35000000]
..................................................[40000000]
..................................................[45000000]
..................................................[50000000]
..................................................[55000000]
..................................................[60000000]
..................................................[65000000]
..................................................[70000000]
..................................................[75000000]
..................................................[80000000]
..................................................[85000000]
..................................................[90000000]
..................................................[95000000]
..................................................[100000000]
..................................................[105000000]
..................................................[110000000]
..................................................[115000000]
..................................................[120000000]
..................................................[125000000]
..................................................[130000000]
..................................................[135000000]
..................................................[140000000]
..................................................[145000000]
..................................................[150000000]
..................................................[155000000]
..................................................[160000000]
..................................................[165000000]
..................................................[170000000]
..................................................[175000000]
..................................................[180000000]
..............................................

Intermezzo: Calculating Huffman code sets
Creating Huffman codes for 624564 target phrase symbols
Creating Huffman codes for 551381 scores
Creating Huffman codes for 15296482 scores
Creating Huffman codes for 582875 scores
Creating Huffman codes for 15806633 scores
Creating Huffman codes for 50 alignment points

Pass 2/2: Compressing target phrases
..................................................[5000000]
..................................................[10000000]

Kenneth

------------------------------

Message: 3
Date: Tue, 13 Jan 2015 08:25:20 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Floating point exception in
processPhraseTableMin
To: moses-support@mit.edu
Message-ID: <54B4C860.8050808@amu.edu.pl>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Kenneth.
I've recently been running into more crashes, too. I guess
there are some heisenbugs in the binarization that show up, maybe due to
a new Boost version or something. The usual workaround is to use fewer
threads, only one or up to 4 (it's actually not much faster with 16
anyway). If it still crashes, try -encoding None. I am planning to write
a new binarization tool from scratch; this one is giving me too many
headaches.

On 13.01.2015 at 04:20, Kenneth Heafield wrote:
> Dear Moses/Marcin,
>
> I'm getting a Floating point exception in processPhraseTableMin from
> Moses d0807c.
>
> Arguments, minus the absolute paths, are:
>
> processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4
> -threads 16 -T /tmp -encoding None
>
> The phrase table is rather large and it runs for several hours before
> crashing. Log output is below.
>
> Used options:
> Text phrase table will be read from: phrase-table.gz
> Output phrase table will be written to: phrase-table.minphr
> Step size for source landmark phrases: 2^10=1024
> Source phrase fingerprint size: 16 bits / P(fp)=1.52588e-05
> Selected target phrase encoding: Huffman
> Number of score components in phrase table: 4
> Single Huffman code set for score components: no
> Using score quantization: no
> Explicitly included alignment information: yes
> Running with 16 threads
>
> Pass 1/2: Creating source phrase index + Encoding target phrases
> ..................................................[5000000]
> ..................................................[10000000]
> ..................................................[15000000]
> ..................................................[20000000]
> ..................................................[25000000]
> ..................................................[30000000]
> ..................................................[35000000]
> ..................................................[40000000]
> ..................................................[45000000]
> ..................................................[50000000]
> ..................................................[55000000]
> ..................................................[60000000]
> ..................................................[65000000]
> ..................................................[70000000]
> ..................................................[75000000]
> ..................................................[80000000]
> ..................................................[85000000]
> ..................................................[90000000]
> ..................................................[95000000]
> ..................................................[100000000]
> ..................................................[105000000]
> ..................................................[110000000]
> ..................................................[115000000]
> ..................................................[120000000]
> ..................................................[125000000]
> ..................................................[130000000]
> ..................................................[135000000]
> ..................................................[140000000]
> ..................................................[145000000]
> ..................................................[150000000]
> ..................................................[155000000]
> ..................................................[160000000]
> ..................................................[165000000]
> ..................................................[170000000]
> ..................................................[175000000]
> ..................................................[180000000]
> ..............................................
>
> Intermezzo: Calculating Huffman code sets
> Creating Huffman codes for 624564 target phrase symbols
> Creating Huffman codes for 551381 scores
> Creating Huffman codes for 15296482 scores
> Creating Huffman codes for 582875 scores
> Creating Huffman codes for 15806633 scores
> Creating Huffman codes for 50 alignment points
>
> Pass 2/2: Compressing target phrases
> ..................................................[5000000]
> ..................................................[10000000]
>
> Kenneth
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

------------------------------

Message: 4
Date: Tue, 13 Jan 2015 21:15:30 +0900
From: Jun-ya NORIMATSU <info@jnory.com>
Subject: Re: [Moses-support] Feature score deltas in the chart decoder
To: moses-support@mit.edu
Message-ID: <54B50C62.90406@jnory.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,

> You can download the regression tests and the data needed from
> https://github.com/moses-smt/moses-regression-tests
> Add your test to the 'tests' directory. Add your LM to the 'LM' directory.
>
> I would make a copy of an existing test, eg. phrase.basic-surface-only,
> and name it
> phrase.basic-surface-only.withDALM
> and change it to use DALM instead.
>
> You can test only your new test by running
> ./bjam --with-DALM=... --with-regtest=....
> *phrase.basic-surface-only.withDALM.passed*
Thanks! I'll try it.

--
Jun-ya NORIMATSU

On 2015/01/12 20:48, Hieu Hoang wrote:
> thanks!
>
> On 12/01/15 04:49, Jun-ya NORIMATSU wrote:
>> Hi,
>>
>> > In the Moses master branch I found one other feature function that
>> > requires modifications:
>> >
>> > moses/LM/DALMWrapper.cpp
>> >
>> > This feature is currently not covered by a regression test, and I don't
>> > have any setup with this feature myself. I would not be able to test any
>> > modifications in that code and therefore would like to request that the
>> > authors apply the necessary updates themselves.
>>
>> I've just finished checking and updating the code.
>> https://github.com/moses-smt/mosesdecoder/commit/39799188a0478eda822167202fb7d404b35fbaad
>> https://github.com/moses-smt/mosesdecoder/commit/832b725c595d586f8802be12588fce9b495d36b8
>>
>> Please let me know if you find any problems.
>>
>> By the way, I'd like to add a regression test in the near future.
>> Could you tell me where (which directory or repository) the test
>> code should go?
> You can download the regression tests and the data needed from
> https://github.com/moses-smt/moses-regression-tests
> Add your test to the 'tests' directory. Add your LM to the 'LM' directory.
>
> I would make a copy of an existing test, eg. phrase.basic-surface-only,
> and name it
> phrase.basic-surface-only.withDALM
> and change it to use DALM instead.
>
> You can test only your new test by running
> ./bjam --with-DALM=... --with-regtest=....
> *phrase.basic-surface-only.withDALM.passed*
>
>>
>> Thanks,
>> --
>> Jun-ya NORIMATSU
>>
>>> Subject: [Moses-support] Feature score deltas in the chart decoder
>>> Date: 2015-01-08 04:17
>>> From: Matthias Huck <mhuck@inf.ed.ac.uk>
>>> To: Moses-support <moses-support@mit.edu>
>>>
>>> Hi,
>>>
>>> I've just pushed a commit to Moses that brings about a slight change
>>> wrt. the way the chart decoder deals with feature scores.
>>>
>>> The chart decoder now stores deltas of individual feature scores instead
>>> of constantly summing everything up. This behaviour is similar to what
>>> we have been doing in the phrase-based decoder for a long time.
>>> The main purpose of this modification is to improve efficiency
>>> with sparse features a bit.
>>>
>>> https://github.com/moses-smt/mosesdecoder/commit/465b47566424efb707bdc063d0bff52b0650eb0a
>>>
>>>
>>>
>>> The modification may however break existing feature function
>>> implementations.
>>>
>>> As a rule of thumb, any feature function that calls
>>>
>>> ScoreComponentCollection::Assign()
>>> in
>>> EvaluateWhenApplied(const ChartHypothesis&, ...)
>>>
>>> is affected and needs to be adapted to the new behaviour.
>>>
>>> Basically, the ScoreComponentCollection variable passed to
>>> EvaluateWhenApplied() now accumulates the delta score of the current
>>> rule application only, whereas it was previously accumulating the
>>> overall score of the partial hypothesis.
>>> I.e., calling Assign() in EvaluateWhenApplied() now does not replace the
>>> overall score any more, but has the same effect as calling PlusEquals().
>>>
>>> If you are the author of a feature function that implements
>>> EvaluateWhenApplied(const ChartHypothesis&, ...) and calls Assign()
>>> within that method, or if you are using such a feature function in your
>>> experiments, please update your implementation. The feature function
>>> should call PlusEquals() instead and add a score delta.
>>>
>>> I've already updated moses/LM/Ken.cpp and moses/LM/Implementation.cpp
>>> and Rico has updated moses/LM/BilingualLM.cpp .
>>> In the Moses master branch I found one other feature function that
>>> requires modifications:
>>>
>>> moses/LM/DALMWrapper.cpp
>>>
>>> This feature is currently not covered by a regression test, and I don't
>>> have any setup with this feature myself. I would not be able to test any
>>> modifications in that code and therefore would like to request that the
>>> authors apply the necessary updates themselves.
>>>
>>> Please let me know in case you notice any issues or if you need any
>>> further information or advice regarding this modification.
>>>
>>> Cheers,
>>> Matthias
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
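
The Assign() versus PlusEquals() change that Matthias describes in the
quoted message can be illustrated with a small toy model. This is only a
sketch under stated assumptions: ToyScores and ToyHypothesis are
hypothetical stand-ins invented for illustration, not the Moses
ScoreComponentCollection or ChartHypothesis classes, and the real method
signatures are not reproduced here. It only shows why a feature that
keeps writing the overall score into an accumulator that now holds a
per-rule delta would count the child scores twice.

```cpp
// Toy model only. ToyScores and ToyHypothesis are hypothetical types,
// not the Moses ScoreComponentCollection or ChartHypothesis classes.
#include <utility>
#include <vector>

// One rule application: the score of this rule alone, the sub-hypotheses
// it was built on, and the delta stored for it after evaluation.
struct ToyHypothesis {
  float ruleScore;
  std::vector<const ToyHypothesis*> children;
  float delta;
  ToyHypothesis(float score, std::vector<const ToyHypothesis*> kids)
      : ruleScore(score), children(std::move(kids)), delta(0.0f) {}
};

// Minimal accumulator with the two operations discussed in the thread.
struct ToyScores {
  float value = 0.0f;
  void Assign(float v) { value = v; }       // overwrite the stored value
  void PlusEquals(float v) { value += v; }  // add to the stored value
};

// Overall score of a hypothesis: its own delta plus its children's totals.
float TotalScore(const ToyHypothesis& hypo) {
  float total = hypo.delta;
  for (const ToyHypothesis* child : hypo.children) {
    total += TotalScore(*child);
  }
  return total;
}

// Delta-style evaluation: add only the score of the current rule.
void EvaluateDelta(ToyHypothesis& hypo) {
  ToyScores accumulator;
  accumulator.PlusEquals(hypo.ruleScore);
  hypo.delta = accumulator.value;
}

// Legacy-style evaluation: write the overall score (rule plus children)
// into the accumulator. Because the accumulator is now treated as a
// delta, the children end up being counted twice in TotalScore().
void EvaluateLegacyStyle(ToyHypothesis& hypo) {
  float overall = hypo.ruleScore;
  for (const ToyHypothesis* child : hypo.children) {
    overall += TotalScore(*child);
  }
  ToyScores accumulator;
  accumulator.Assign(overall);
  hypo.delta = accumulator.value;
}
```

In this toy model, two leaf hypotheses scoring -1 and -2 under a parent
rule scoring -0.5 total -3.5 with EvaluateDelta(), but -6.5 if the parent
is evaluated with EvaluateLegacyStyle(), which is the kind of
double counting the update to PlusEquals() is meant to avoid.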

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 99, Issue 24
*********************************************
