Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Getting started - Baseline basic question (Kenneth Heafield)
2. Re: Sparse phrase table, is still supported? (Matthias Huck)
----------------------------------------------------------------------
Message: 1
Date: Thu, 16 Jul 2015 06:50:26 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Getting started - Baseline basic question
To: moses-support@mit.edu
Message-ID: <55A78C72.7030302@kheafield.com>
Content-Type: text/plain; charset=windows-1252
--interpolate_unigrams 0 is there if people want to emulate SRI's weird
behavior with regard to <unk>. It shouldn't be used in most cases as it
produces a large p(<unk>).
On 07/16/15 01:40, Hieu Hoang wrote:
> see the example under 'Comparison With Other Toolkits'
>
> if you have a corpus file called 'text' and you want to create a
> 5-gram language model, the command is
>     lmplz -o 5 --interpolate_unigrams 0 < text > text.arpa
>
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
> On 15 July 2015 at 23:50, Vincent Nguyen <vnguyen@neuf.fr
> <mailto:vnguyen@neuf.fr>> wrote:
>
> you need to create a language model. You can do it with kenlm
> using the program lmplz. The program is described here:
>
> https://kheafield.com/code/kenlm/estimation/
> We haven't updated the tutorial with more info about lmplz yet,
> but we should at some point.
>
>
>
> I have read this page, but honestly I have no clue about the
> sequence of steps to follow to create the LM and run the training
> once I have prepared the corpus.
>
------------------------------
Message: 2
Date: Thu, 16 Jul 2015 15:42:19 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Sparse phrase table, is still supported?
To: jian zhang <zhangj@computing.dcu.ie>
Cc: moses-support@mit.edu
Message-ID: <1437057739.2035.23.camel@inf.ed.ac.uk>
Content-Type: text/plain; charset="UTF-8"
Hi,
You're right. I claimed in the previous mail that "in order to produce
sparse features, you need to write a feature function anyway", and that
is of course not true if you can get the sparse phrase table features
to work.

When I tried those sparse domain indicators recently, they didn't work
out of the box, and I also don't know where to find the relevant code.
My guess is that this functionality was broken in the course of the
Moses refactoring, but it may just as well still be there, waiting to
be activated in the moses.ini. What I did was simply switch to dense
domain indicators.
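For reference, the documented fix quoted later in this thread is to
flag a sparse phrase table by adding the word "sparse" after the phrase
table file name in the configuration file. Assuming the old-style
[ttable-file] block (the path and the number of scores here are
invented for illustration), that would presumably look something like:

```ini
[ttable-file]
0 0 0 5 /path/to/phrase-table.gz sparse
```

Whether this still works after the refactoring is exactly the open
question here.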
Maybe Hieu can help?
Cheers,
Matthias
On Thu, 2015-07-16 at 10:03 +0100, jian zhang wrote:
> Hi Matthias,
>
>
> Thanks for the information.
>
>
> I tested on Moses 3.0, and adding sparse features to the phrase
> table seems to be working.
>
>
> However, I did not add any flag to the ini file, as suggested by "If
> a phrase table contains sparse features, then this needs to be
> flagged in the configuration file by adding the word sparse after
> the phrase table file name." Did I miss anything?
>
>
> Regards,
>
>
> Jian
>
> On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck <mhuck@inf.ed.ac.uk>
> wrote:
> Hi Jian,
>
> That depends on the nature of the features you're planning to
> implement.
>
> In order to produce sparse features, you need to write a feature
> function anyway.
>
> But if it's only a handful of scores and they can be calculated at
> extraction time, then go for dense features and add the scores
> directly to the phrase table.
>
> If the scores cannot be precalculated, for instance because you need
> non-local information that is only available during decoding, then a
> feature function implementation becomes necessary.
>
> When you write a feature function that calculates scores at decoding
> time, it can produce dense scores, sparse scores, or both. That's up
> to you.
>
> If there are many scores, each fired only rarely, then sparse is the
> right choice. And you certainly need a sparse feature function
> implementation if you do not know in advance the overall number of
> feature scores it can produce.
>
> If you need information from phrase extraction in order to calculate
> scores at decoding time, then we have something called "phrase
> properties". Phrase properties give you a means of storing arbitrary
> additional information in the phrase table. You have to extend the
> extraction pipeline to retrieve and store the phrase properties you
> require. The decoder can later read this information from the phrase
> table, and your feature function can utilize it in some way.
>
> A large number of sparse feature scores can somewhat slow down
> decoding and tuning. Also, you have to use MIRA or PRO for tuning,
> not MERT.
>
> Cheers,
> Matthias
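As an aside, a phrase table entry carrying one of the "phrase
properties" mentioned above might look roughly like this (the property
name and value are invented, and the exact field layout depends on the
Moses version and extraction options used):

```
das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| {{Domain europarl}}
```

The decoder would parse the {{...}} field and hand its contents to the
feature function at decoding time.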
>
>
> On Thu, 2015-07-16 at 02:18 +0100, jian zhang wrote:
> > Hi Matthias,
> >
> >
> > Not for domain feature.
> >
> >
> > I want to implement some sparse features, so there are two options:
> > 1. add them to the phrase table, if that is supported
> > 2. implement sparse feature functions
> >
> >
> > I'd like to know whether there are any differences between these
> > two options, for example in tuning or in computing sentence
> > translation scores.
> >
> >
> > Regards,
> >
> >
> >
> > Jian
> >
> >
> >
> > On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck
> <mhuck@inf.ed.ac.uk>
> > wrote:
> > Hi,
> >
> > Are you planning to use binary domain indicator features? I'm not
> > sure whether a sparse feature function for this is currently
> > implemented. If you're working with a small set of domains, you
> > can employ dense indicators instead (domain-features = "indicator"
> > in EMS). You'll have to re-extract the phrase table, though. Or
> > process it with a script to add dense indicator values to the
> > scores field.
> >
> > I believe that there might also be some bug in the extraction
> > pipeline when both domain-features = "sparse indicator" and
> > score-settings = "--GoodTuring" are active in EMS. At least it
> > caused me trouble a couple of weeks ago. However, I must admit
> > that I didn't investigate it further at that point.
> >
> > Anyway, the bottom line is that I recommend re-extracting with
> > dense indicators.
> >
> > But let me know what you find regarding a sparse implementation.
> >
> > Cheers,
> > Matthias
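The "process it with a script" route could be sketched like this in
Python (a minimal illustration, not part of Moses; the assumption that
the scores live in the third "|||"-separated field, and the choice of
2.718/1.0 as indicator values, are mine):

```python
# Append a dense domain-indicator value to the scores field of a
# phrase table line. Fields are separated by " ||| "; the scores live
# in the third field (index 2). In-domain entries get 2.718 (= e,
# i.e. a log-score of 1); out-of-domain entries get 1.0 (log-score 0).

def add_domain_indicator(line, in_domain):
    fields = line.rstrip("\n").split(" ||| ")
    fields[2] += " 2.718" if in_domain else " 1.0"
    return " ||| ".join(fields)
```

Each per-domain phrase table would be run through something like this
(with the appropriate in_domain flag) before the tables are merged.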
> >
> >
> > On Thu, 2015-07-16 at 00:48 +0100, jian zhang wrote:
> > > Hi,
> > >
> > > Are the sparse features in the phrase table, like
> > >
> > > das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 ||| 5000 5000 2500 ||| dom_europarl 1
> > >
> > > still supported? If yes, what should I set in the ini file based
> > > on the example above?
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Jian
> > >
> > >
> > > --
> > > Jian Zhang
> > > Centre for Next Generation Localisation (CNGL)
> > > Dublin City University
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 105, Issue 33
**********************************************