Moses-support Digest, Vol 84, Issue 12

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: lattices with EPSILON (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Oct 2013 16:56:43 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] lattices with EPSILON
To: Yulia Tsvetkov <yulia.tsvetkov@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgF9eZQqiHiVeSPetqm4-QS4yp3J9gFeAVnrysLFKztMw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

OK, I've limited the maximum length of the input paths that get created.
The limit is set with the argument
-max-phrase-length ???
By default this is 20, which will still consume a fair bit of memory but
should stay under 15GB.

In fact, you can set this to
-max-phrase-length 7
since, by default, the maximum EXTRACTED phrase length is also 7, so longer
input paths cannot match any phrase-table entry anyway.
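To illustrate why bounding the path length matters, here is a toy Python model (not Moses code) of the "one object per path" cost described in this thread. It assumes a lattice stored as a hypothetical adjacency list, builds one shaped like a confusion network (the worst case Ondrej describes below), and counts every sub-path of 1 to `max_len` edges, i.e. roughly how many path objects would exist if one were built per path. The names `confusion_network` and `count_paths` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: node -> list of (word, next_node).
# Nodes are numbered 0..num_nodes-1 in topological order.
Lattice = Dict[int, List[Tuple[str, int]]]


def confusion_network(columns: List[List[str]]) -> Lattice:
    """Build a lattice shaped like a confusion network: column i becomes
    one edge per alternative word from node i to node i + 1."""
    lattice: Lattice = defaultdict(list)
    for i, column in enumerate(columns):
        for word in column:
            lattice[i].append((word, i + 1))
    return lattice


def count_paths(lattice: Lattice, num_nodes: int, max_len: int) -> int:
    """Count all sub-paths of 1..max_len edges, starting from every node.

    Paths are counted by dynamic programming rather than enumerated, so
    this stays cheap even when the count itself is huge.
    """
    total = 0
    for start in range(num_nodes):
        # frontier[node] = number of distinct paths of the current length
        # that begin at `start` and end at `node`
        frontier: Dict[int, int] = {start: 1}
        for _ in range(max_len):
            extended: Dict[int, int] = defaultdict(int)
            for node, n_paths in frontier.items():
                for _word, succ in lattice.get(node, []):
                    extended[succ] += n_paths
            if not extended:
                break  # no path from `start` can be extended any further
            total += sum(extended.values())
            frontier = extended
    return total


if __name__ == "__main__":
    cn = confusion_network([["w1", "w2", "w3"]] * 30)  # 30 columns, 3 words each
    print(count_paths(cn, 31, 7))   # 80325
    print(count_paths(cn, 31, 20))
```

For a confusion network with c columns of w words each, the count is the sum over L = 1..k of (c - L + 1) * w^L, so it grows exponentially with the length bound k. With k = 20 the example above gives tens of billions of paths, while k = 7 gives 80325.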

code is here

https://github.com/moses-smt/mosesdecoder/commit/8b9d4d1c7dac2f2a53e9a4d5949e10a4511aeb0c

On 7 October 2013 11:54, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> @ondrej - Yes, Yulia's lattices look like confusion networks in disguise,
> so there will be a large number of paths through the lattice.
>
> The memory explosion is due to my code creating an object for every path.
> I did this mainly for the reason mentioned previously, i.e.:
>
> I want to give each feature function the opportunity to score with full
> knowledge of the path.
>
> However, the old binary phrase table doesn't need these objects to do
> the lookups. Therefore, to enable Yulia and anyone else to decode large
> lattices, my path-creation code will not run when
> 1. decoding lattices/confusion networks, AND
> 2. using the old binary phrase table.
>
> @Liang - thanks for the suggestions. I'm not sure how our lattices were
> created. Lexi knows.
>
> Thanks to all who responded, it was very useful.
>
>
>
> On 4 October 2013 22:20, Ondrej Bojar <bojar@ufal.mff.cuni.cz> wrote:
>
>> Hi,
>>
>> while you can always run rmepsilon from OpenFst or another toolkit, epsilon
>> edges will probably be particularly useful if one wants to use different
>> semirings for different components of the score vector. With generic
>> toolkits, all the components of the score vector are processed in the same
>> way. Provided that Moses features do the "plus" of their respective scores
>> on their own, each feature can use its own semiring.
>>
>> Probably the (in some sense) maximal explosion in the number of paths is
>> reached when the lattice has the form of a confusion network (no
>> epsilons). You get the full Cartesian product of the choices for the first
>> token, the second token, etc.
>>
>> Cheers, Ondrej.
>>
>> "Hieu Hoang" <Hieu.Hoang@ed.ac.uk> wrote:
>>
>> >@nicola - I didn't see a reason either, but some lattices from a speech
>> >recognizer contain them, so I was just curious. I think Chris has a point:
>> >they may be easier to create.
>> >
>> >I think they may also be more efficient to decode. In a non-deterministic
>> >lattice, you might have 2 edges with the same symbol coming out of 1
>> >node. Each would have to be decoded separately.
>> >
>> >However, it's a pain to decode epsilons and there might be weird edge
>> >cases, e.g. consecutive, beginning and end epsilons, or lattices made
>> >entirely of epsilons.
>> >
>> >@chris - cheers for the explanation. I might use Victor's code and see how
>> >it goes.
>> >
>> >Do you have an example (large) lattice that blows up memory that you can
>> >share?
>> >
>> >Yes - I've changed the code to extract all possible paths. In fact, I
>> >extract all paths from beginning to end of sentence, without limit, for 2
>> >reasons:
>> > 1. I also decoupled path creation from the phrase-table lookup. In the
>> >general case there are multiple phrase tables, so it's difficult to keep
>> >track of the tries. Also, the intertwining of the binary phrase-table
>> >lookup with lattices made the code difficult to read.
>> > 2. I want to give each feature function the opportunity to score with
>> >full knowledge of the path.
>> >
>> >This may have to be altered if the memory explosion is too drastic.
>> >
>> >
>> >
>> >
>> >On 4 October 2013 17:49, Chris Dyer <cdyer@cs.cmu.edu> wrote:
>> >
>> >> It's useful to have epsilons since they simplify the creation of
>> >> lattices in some cases. Yes, you can convert them to a deterministic
>> >> equivalent, but that involves implementing FSA determinization (or
>> >> using a tool like https://pypi.python.org/pypi/pyfst), which may not
>> >> be convenient.
>> >>
>> >> Btw, I've also noticed that memory usage with lattices/CNs explodes
>> >> with non-binarized phrase tables (maybe also with binarized PTs?).
>> >> This is independent of the size of the phrase table and only seems to
>> >> be a function of the lattice structure. I'm not sure what's going on
>> >> (the code has changed substantially since I last looked at it), but
>> >> you should always match paths in the lattice against paths in the phrase
>> >> table trie. Maybe Moses is now trying to extract all possible paths in
>> >> the lattice up to max-phrase-size or something?
>> >>
>> >> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <bertoldi@fbk.eu> wrote:
>> >> > I don't see any reason why a lattice should contain an EPSILON edge.
>> >> >
>> >> > In a confusion network, EPSILONs are needed to represent inputs of
>> >> > different lengths.
>> >> > The sausage structure of the CN imposes the same number of source words
>> >> > on every path, and the EPSILONs overcome this constraint.
>> >> >
>> >> > This is not the case for lattices, because you can have any number of
>> >> > edges/words in a complete source path.
>> >> >
>> >> >
>> >> > cheers,
>> >> > Nicola
>> >> >
>> >> >
>> >> >
>> >> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
>> >> >
>> >> > I'm just looking at lattice decoding, as implemented in Moses.
>> >> >
>> >> > For confusion networks, it makes sense to have EPSILON words (which
>> >> > represent blank words). However, I don't see the point of them in lattices.
>> >> >
>> >> > Anyone have an opinion? How is it implemented in cdec & Joshua?
>> >> >
>> >> > --
>> >> > Hieu Hoang
>> >> > Research Associate
>> >> > University of Edinburgh
>> >> > http://www.hoang.co.uk/hieu
>> >> >
>> >> > _______________________________________________
>> >> > Moses-support mailing list
>> >> > Moses-support@mit.edu
>> >> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >> >
>> >>
>> >
>> >
>> >
>> >--
>> >Hieu Hoang
>> >Research Associate
>> >University of Edinburgh
>> >http://www.hoang.co.uk/hieu
>>
>> --
>> Ondrej Bojar
>> http://www.cuni.cz/~obo
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
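Ondrej's rmepsilon suggestion and the edge cases Hieu lists (leading, trailing and consecutive epsilons) can be made concrete with a small epsilon-removal sketch. This is a toy Python illustration, not Moses or OpenFst code: the edge-list format, the `<eps>` label and the function names are all hypothetical. Weights are deliberately left out, because merging the scores of alternative epsilon paths requires choosing a semiring, which is exactly Ondrej's point.

```python
from typing import List, Set, Tuple

EPS = "<eps>"  # hypothetical epsilon label; Moses's input format may use another token

Edge = Tuple[int, str, int]  # (source node, word, target node)


def eps_closure(edges: List[Edge], node: int) -> Set[int]:
    """All nodes reachable from `node` through epsilon edges only (incl. itself)."""
    closure = {node}
    stack = [node]
    while stack:
        u = stack.pop()
        for src, word, dst in edges:
            if src == u and word == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def remove_epsilons(edges: List[Edge], start: int, final: int) -> Tuple[List[Edge], Set[int]]:
    """Return an epsilon-free edge list plus its set of final nodes.

    Each node u gets a copy of every word edge leaving any node in its
    epsilon closure; u becomes final if its closure contains `final`.
    """
    nodes = {start, final} | {s for s, _, _ in edges} | {d for _, _, d in edges}
    closures = {u: eps_closure(edges, u) for u in nodes}

    new_edges = set()
    for u in nodes:
        for v in closures[u]:
            for src, word, dst in edges:
                if src == v and word != EPS:
                    new_edges.add((u, word, dst))

    # Keep only the nodes still reachable from the start node.
    reachable = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for src, _word, dst in new_edges:
            if src == u and dst not in reachable:
                reachable.add(dst)
                stack.append(dst)

    kept = sorted(e for e in new_edges if e[0] in reachable)
    finals = {u for u in nodes if final in closures[u]} & reachable
    return kept, finals


if __name__ == "__main__":
    lattice = [(0, "a", 1), (1, EPS, 2), (0, "b", 2), (2, "c", 3), (1, "d", 3)]
    print(remove_epsilons(lattice, start=0, final=3))
    # ([(0, 'a', 1), (0, 'b', 2), (1, 'c', 3), (1, 'd', 3), (2, 'c', 3)], {3})
```

Note that a trailing epsilon turns its source node into an extra final node, so the output can have several final states; that is one of the awkward cases a decoder reading such lattices would have to handle.
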

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

End of Moses-support Digest, Vol 84, Issue 12
*********************************************
