Moses-support Digest, Vol 84, Issue 8

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: lattices with EPSILON (Chris Dyer)
2. Re: lattices with EPSILON (Hieu Hoang)
3. Re: lattices with EPSILON (Yulia Tsvetkov)


----------------------------------------------------------------------

Message: 1
Date: Fri, 4 Oct 2013 12:49:31 -0400
From: Chris Dyer <cdyer@cs.cmu.edu>
Subject: Re: [Moses-support] lattices with EPSILON
To: Nicola Bertoldi <bertoldi@fbk.eu>
Cc: Hieu Hoang <hieu.hoang@ed.ac.uk>, moses-support
<moses-support@mit.edu>, "<joshua_technical@googlegroups.com>"
<joshua_technical@googlegroups.com>, "<cdec-users@googlegroups.com>"
<cdec-users@googlegroups.com>
Message-ID:
<CAEHEvxP-FMzKE17JFamBJkTK7mou3tCheSUxXK1TjrLt4+kkSQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

It's useful to have epsilons since it simplifies the creation of
lattices in some cases. Yes, you can convert them to a deterministic
equivalent, but that involves implementing FSA determinization (or
using a tool like https://pypi.python.org/pypi/pyfst), which may not
be convenient.
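
For illustration only (this is not code from Moses, cdec, or pyfst): a
minimal Python sketch of plain epsilon removal on an acyclic, unweighted
lattice. It only removes epsilon edges; determinization would be a separate
step. The dict-of-edges layout, the "*EPS*" label, and all function names
are assumptions made for this example.

```python
EPS = "*EPS*"  # assumed epsilon label for this sketch


def eps_closure(edges, node):
    """Nodes reachable from `node` via epsilon edges only (including itself)."""
    seen, stack = {node}, [node]
    while stack:
        cur = stack.pop()
        for label, nxt in edges.get(cur, []):
            if label == EPS and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def remove_epsilons(edges, start, finals):
    """Return (edges, finals) of an epsilon-free lattice with the same word sequences."""
    finals = set(finals)
    nodes = set(edges) | {nxt for outs in edges.values() for _, nxt in outs}
    new_edges, new_finals = {}, set()
    for node in nodes:
        closure = eps_closure(edges, node)
        # A node becomes final if it can reach a final node by epsilons alone.
        if closure & finals:
            new_finals.add(node)
        # Copy every non-epsilon edge leaving the closure onto `node` itself.
        outs = {(label, nxt) for q in closure
                for label, nxt in edges.get(q, []) if label != EPS}
        if outs:
            new_edges[node] = sorted(outs)
    # Drop nodes that are no longer reachable from the start node.
    reachable, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for _, nxt in new_edges.get(cur, []):
            if nxt not in reachable:
                reachable.add(nxt)
                stack.append(nxt)
    pruned = {n: outs for n, outs in new_edges.items() if n in reachable}
    return pruned, new_finals & reachable


def word_sequences(edges, start, finals):
    """All word sequences spelled by start-to-final paths (acyclic input only)."""
    results = set()

    def walk(node, words):
        if node in finals:
            results.add(tuple(words))
        for label, nxt in edges.get(node, []):
            walk(nxt, words if label == EPS else words + [label])

    walk(start, [])
    return results


# Toy lattice: "b" is optional because an epsilon edge runs parallel to it.
lattice = {0: [("a", 1)], 1: [(EPS, 2), ("b", 2)], 2: [("c", 3)]}
clean_edges, clean_finals = remove_epsilons(lattice, 0, {3})
print(clean_edges, clean_finals)
```

The sketch handles consecutive and trailing epsilons through the closure
step, but it ignores edge weights, which a real lattice would need to
combine along the removed epsilon edges.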

Btw, I've also noticed that memory usage with lattices/CNs explodes
with non-binarized phrase tables (maybe also with binarized PTs?).
This is independent of the size of the phrase table and only seems to
be a function of the lattice structure. I'm not sure what's going on
(the code has changed substantially since I last looked at it). But,
you should always match paths in the lattice with paths in the phrase
table trie; maybe Moses is now trying to extract all possible paths in
the lattice up to max-phrase-size or something?
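
To make the trie-matching idea concrete, here is a small Python sketch. It
is an illustration under assumptions, not the Moses implementation: the
nested-dict trie, the lattice layout, and the names build_trie and
match_from are all hypothetical. The point it shows is that a lattice edge
is followed only when its word continues some phrase in the trie, so paths
that cannot match the phrase table are never materialized.

```python
def build_trie(phrases):
    """Build a nested-dict trie; the key None marks the end of a phrase."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[None] = True
    return root


def match_from(edges, trie, start):
    """Return (end_node, words) for every phrase-table phrase starting at `start`.

    `edges` maps a lattice node to a list of (word, next_node) pairs.
    """
    matches = []
    stack = [(start, trie, ())]
    while stack:
        node, trie_node, words = stack.pop()
        if words and None in trie_node:
            matches.append((node, words))
        for word, nxt in edges.get(node, []):
            child = trie_node.get(word)
            # Follow the lattice edge only if the trie has a continuation.
            if child is not None:
                stack.append((nxt, child, words + (word,)))
    return matches


# Toy data: two alternatives per position, but only some are in the table.
lattice = {0: [("a", 1), ("x", 1)], 1: [("b", 2), ("y", 2)], 2: [("c", 3)]}
trie = build_trie(["a", "a b", "x y c", "q"])
for end_node, words in sorted(match_from(lattice, trie, 0)):
    print(end_node, " ".join(words))
```

With this approach the work per start node is bounded by the part of the
trie that the lattice can actually reach, rather than by the number of
paths in the lattice.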

On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <bertoldi@fbk.eu> wrote:
> I don't see any reason why a lattice should contain an EPSILON edge.
>
> In a confusion network, EPSILONs are needed to allow the translation of inputs of different lengths.
> The sausage structure of the CN imposes the same number of source words,
> and the EPSILONs overcome this constraint.
>
> This is not the case for a lattice, because you can have any number of edges/words in a complete source path.
>
>
> cheers,
> Nicola
>
>
>
> On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
>
> I'm just looking at the lattices decoding, as implemented in moses.
>
> for confusion networks, it's fair to have EPSILON words (that represent blank words). However, I don't see the point of them in lattices.
>
> Anyone have an opinion? How is it implemented in cdec & joshua?
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


------------------------------

Message: 2
Date: Fri, 4 Oct 2013 21:02:42 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] lattices with EPSILON
To: cdec-users@googlegroups.com
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbh9N=TEXwRGU1mBT+V2Z7aEvxseX_ZBh3DVgDPqLMgTjA@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

@nicola - I didn't see a reason either, but some lattices from a speech
recognizer contain them, so I was just curious. I think Chris has a point:
they may be easier to create.

I think they may also be more efficient to decode. In a non-deterministic
lattice, you might have two edges with the same symbol coming out of one
node. Each would have to be decoded separately.

However, it's a pain to decode epsilons, and there might be weird edge cases,
e.g. consecutive epsilons, epsilons at the beginning or end, or input
consisting entirely of epsilons.

@chris - cheers for the explanation. i might use victor's code and see how
it goes.

Do you have an example (large) lattice that blows up memory that you can
share?

Yes - I've changed the code to extract all possible paths. In fact, I
extract all paths from beginning to end of sentence, without limit. There
are 2 reasons for this:
1. I also divorced path creation from the phrase-table
lookup. In the general case there are multiple phrase tables, so it's
difficult to keep track of the tries. Also, the intertwining of the binary
PT lookup with lattices made the code difficult to read.
2. I want to give each feature function the opportunity to score with
full knowledge of the path.

This may have to be altered if the memory explosion is too drastic.
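
As a rough illustration of why extracting every path can exhaust memory
(a sketch with made-up data, not a measurement of Moses): the number of
complete paths in a lattice can be counted by dynamic programming without
enumerating them. The function names and the synthetic "sausage" lattice
below are assumptions for the example.

```python
from functools import lru_cache


def count_paths(edges, start, finals):
    """Count start-to-final paths in an acyclic lattice without listing them.

    `edges` maps a node to a list of (word, next_node) pairs.
    """
    finals = frozenset(finals)

    @lru_cache(maxsize=None)
    def count(node):
        total = 1 if node in finals else 0
        for _word, nxt in edges.get(node, ()):
            total += count(nxt)
        return total

    return count(start)


def sausage(n_slots, k_alternatives):
    """Synthetic lattice: n_slots positions, each with k_alternatives words."""
    return {
        i: [(f"w{i}_{j}", i + 1) for j in range(k_alternatives)]
        for i in range(n_slots)
    }


# 20 positions with 3 alternatives each: only 60 edges in total,
# yet the number of distinct full paths is 3**20.
lattice = sausage(20, 3)
print(sum(len(outs) for outs in lattice.values()))  # 60 edges
print(count_paths(lattice, 0, {20}))                # 3**20 = 3486784401
```

The edge count grows linearly with sentence length while the path count
grows exponentially, which is consistent with memory use depending on the
lattice structure rather than on the size of the phrase table.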







--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Fri, 4 Oct 2013 16:18:40 -0400
From: Yulia Tsvetkov <yulia.tsvetkov@gmail.com>
Subject: Re: [Moses-support] lattices with EPSILON
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>, cdec-users@googlegroups.com
Message-ID:
<CA+Drf0vMCRwW_Ga6zpn0idh4Mzz=FMApKnG_OetQAfxOj-Y+eQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

>
> Do you have an example (large) lattice that blows up memory that you can
> share?
>

I attach (pruned) lattices from dev2010 IWSLT corpus that I couldn't decode
with 15G memory allocation.

Cheers
Yulia


-------------- next part --------------
A non-text attachment was scrubbed...
Name: dev2010.removed
Type: application/octet-stream
Size: 67594 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20131004/b1bf7bcf/attachment.obj

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 84, Issue 8
********************************************
