Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Sparse phrase table, is still supported? (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Fri, 17 Jul 2015 09:08:54 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Sparse phrase table, is still supported?
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>, jian zhang
<zhangj@computing.dcu.ie>
Message-ID:
<CAEKMkbg30kkEcD6hwhGY5U1xxCS3AetYCRz9wB0YQ2mtFE12zw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
The OnDisk pt can do everything - sparse features, properties, hiero
models. It's just slow and big.
I think the old Binary pt did sparse features but not properties; the
Compact pt does neither.
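For orientation, an OnDisk phrase table is declared in the [feature] section of moses.ini roughly along these lines. The name, path, and feature count below are placeholder values for illustration, not taken from this thread:

```ini
; moses.ini sketch -- placeholder name, path, and values
[feature]
PhraseDictionaryOnDisk name=TranslationModel0 num-features=4 path=/path/to/bin-ttable input-factor=0 output-factor=0 table-limit=20
```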
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
On 17 July 2015 at 06:25, Philipp Koehn <phi@jhu.edu> wrote:
> Hi,
>
> I do not have a clear picture of which phrase table implementations support
> sparse features. Until recently PhraseTableBin did, but PhraseTableCompact
> did not. I am not sure if things have changed either way.
>
> -phi
>
> On Thu, Jul 16, 2015 at 9:42 PM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:
> > Hi,
> >
> > You're right, I claimed in the previous mail that "in order to produce
> > sparse features, you need to write a feature function anyway" and this
> > is of course not true if you get the sparse phrase table features to
> > work.
> >
> > When I tried those sparse domain indicators recently, they didn't work
> > out of the box, and I also don't know where to find the relevant code.
> > My guess is that this functionality was broken in the course of the
> > Moses refactoring, but it may well still be there, waiting to be
> > activated in the moses.ini. What I did was just switch to dense
> > domain indicators.
> >
> > Maybe Hieu can help?
> >
> > Cheers,
> > Matthias
> >
> >
> > On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
> >> Hi Matthias,
> >>
> >>
> >> Thanks for the information.
> >>
> >>
> >> I tested on Moses 3.0, and adding sparse features to the phrase table
> >> seems to be working.
> >>
> >>
> >> However, I did not add any flag to the ini file, as suggested: "If a phrase
> >> table contains sparse features, then this needs to be flagged in the
> >> configuration file by adding the word sparse after the phrase table
> >> file name." Did I miss anything?
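As a point of reference, the flag the quoted documentation describes would presumably look something like the sketch below with the old-style phrase table specification. The path and the factor/score settings are placeholders, and whether current Moses versions still honour the keyword is exactly what is in question in this thread:

```ini
; old-style moses.ini phrase table line (placeholder path and settings),
; with the word "sparse" appended after the file name as the docs suggest
[ttable-file]
0 0 0 5 /path/to/phrase-table sparse
```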
> >>
> >>
> >> Regards,
> >>
> >>
> >> Jian
> >>
> >> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:
> >> Hi Jian,
> >>
> >> That depends on the nature of the features you're planning to implement.
> >>
> >> In order to produce sparse features, you need to write a feature function anyway.
> >>
> >> But if it's only a handful of scores and they can be calculated at extraction time, then go for dense features and add the scores directly to the phrase table.
> >>
> >> If the scores cannot be precalculated, for instance because you need non-local information that is only available during decoding, then a feature function implementation becomes necessary.
> >>
> >> When you write a feature function that calculates scores at decoding time, it can produce dense scores, sparse scores, or both. That's up to you.
> >>
> >> If it's plenty of scores which fire rarely, then sparse is the right choice. And you certainly need a sparse feature function implementation if you are not aware in advance of the overall number of feature scores it can produce.
> >>
> >> If you need information from phrase extraction in order to calculate scores at decoding time, then we have something called "phrase properties". Phrase properties give you a means of storing arbitrary additional information in the phrase table. You have to extend the extraction pipeline to retrieve and store the phrase properties you require. The decoder can later read this information from the phrase table, and your feature function can utilize it in some way.
> >>
> >> A large number of sparse feature scores can somewhat slow down decoding and tuning. Also, you have to use MIRA or PRO for tuning, not MERT.
> >>
> >> Cheers,
> >> Matthias
> >>
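The dense/sparse distinction above can be sketched in a few lines. This is a conceptual illustration only, not Moses code: dense features live in a fixed-size score vector, while sparse features are a name-to-value map that only stores the features that actually fire, so the total inventory of feature names need not be known in advance (which is also why MERT, with its small fixed dimensionality, is out).

```python
# Conceptual sketch (not the Moses API): dense vs. sparse feature scores.

def dense_scores(phrase_pair):
    # Fixed number of scores, known in advance (e.g. 4 phrase-table scores).
    return [0.8, 0.5, 0.8, 0.5]

def sparse_scores(phrase_pair, domain):
    # Open-ended: a feature is stored only when it fires, so the total
    # number of distinct feature names is not known in advance.
    return {"dom_" + domain: 1.0}  # e.g. the dom_europarl indicator

def model_score(dense, sparse, dense_weights, sparse_weights):
    # Final score: dot product over the dense vector, plus the sparse
    # features that fired (weights for unseen names default to 0).
    score = sum(w * s for w, s in zip(dense_weights, dense))
    score += sum(sparse_weights.get(name, 0.0) * v for name, v in sparse.items())
    return score

d = dense_scores(("das Haus", "the house"))
s = sparse_scores(("das Haus", "the house"), "europarl")
print(model_score(d, s, [0.2, 0.2, 0.2, 0.2], {"dom_europarl": 0.1}))
```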
> >>
> >> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> >> > Hi Matthias,
> >> >
> >> > Not for a domain feature.
> >> >
> >> > I want to implement some sparse features, so there are two options:
> >> > 1. add them to the phrase table, if that is supported
> >> > 2. implement sparse feature functions
> >> >
> >> > I'd like to know whether there are any differences between these two options, for example in tuning or in computing sentence translation scores.
> >> >
> >> > Regards,
> >> >
> >> > Jian
> >> >
> >> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:
> >> > Hi,
> >> >
> >> > Are you planning to use binary domain indicator features? I'm not sure whether a sparse feature function for this is currently implemented. If you're working with a small set of domains, you can employ dense indicators instead (domain-features = "indicator" in EMS). You'll have to re-extract the phrase table, though. Or process it with a script to add dense indicator values to the scores field.
> >> >
> >> > I believe that there might also be some bug in the extraction pipeline when both domain-features = "sparse indicator" and score-settings = "--GoodTuring" are active in EMS. At least it caused me trouble a couple of weeks ago. However, I must admit that I didn't investigate it further at that point.
> >> >
> >> > Anyway, the bottom line is that I recommend re-extracting with dense indicators.
> >> >
> >> > But let me know what you find regarding a sparse implementation.
> >> >
> >> > Cheers,
> >> > Matthias
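In EMS terms, the dense-indicator recommendation above amounts to a one-line setting in the experiment configuration. A sketch, with the caveat that the exact section placement should be checked against your own config file:

```ini
; EMS experiment configuration sketch (placement may vary)
[TRAINING]
domain-features = "indicator"
; the combination reported as potentially buggy above:
; domain-features = "sparse indicator"
; score-settings = "--GoodTuring"
```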
> >> >
> >> >
> >> > On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> >> > > Hi,
> >> > >
> >> > >
> >> > > Are the sparse features in the phrase table, like
> >> > >
> >> > > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1
> >> > >
> >> > > still supported? If yes, what should I set in the ini file based on the example above?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jian
> >> > >
> >> > >
> >> > > --
> >> > > Jian Zhang
> >> > > Centre for Next Generation Localisation (CNGL)
> >> > > Dublin City University
> >> >
> >> > > _______________________________________________
> >> > > Moses-support mailing list
> >> > > Moses-support@mit.edu
> >> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >> >
> >> >
> >> >
> >> > --
> >> > The University of Edinburgh is a charitable body,
> >> registered
> >> > in
> >> > Scotland, with registration number SC005336.
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
>
------------------------------
End of Moses-support Digest, Vol 105, Issue 35
**********************************************