Moses-support Digest, Vol 93, Issue 4

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Using word embeddings in Moses (Hubert Soyer)
2. Re: Using word embeddings in Moses (Philipp Koehn)
3. Re: representation of coverage vector (bitset operations)?
(Hieu Hoang)
4. Re: Using word embeddings in Moses (Hubert Soyer)


----------------------------------------------------------------------

Message: 1
Date: Thu, 3 Jul 2014 00:23:42 +0900
From: Hubert Soyer <hubert.soyer@googlemail.com>
Subject: Re: [Moses-support] Using word embeddings in Moses
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID:
<CAM7TO-hx=TawzQOXnQ+rzL26jeBJxQ-J6tQRWJHYdairQM09wg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Yes, I am thinking of a new feature function based on word vectors.
Thank you for your suggestion about the generation step; I'll look into it
and maybe I'll find a way.

I will also try to create a feature function directly.

Thanks again!

Best,

Hubert
On Jul 2, 2014 11:02 PM, "Philipp Koehn" <pkoehn@inf.ed.ac.uk> wrote:

> Hi,
>
> it would be better to include a word vector obtained by word2vec or other
> means as a single factor, and generate it with a generation step to avoid
> filling up the phrase table with redundant information. Unfortunately, there
> is no source-side generation step, which may be a useful addition to the
> factored model.
>
> Of course, the question is what to do with these vectors. I assume that
> you have
> a new feature function in mind.
>
> -phi
>
> On Wed, Jul 2, 2014 at 5:04 AM, Hubert Soyer
> <hubert.soyer@googlemail.com> wrote:
> > Hello,
> >
> > I have checked the mailing list archive for this question but couldn't
> > find anything. I'd be surprised if this question has not been asked yet;
> > if it has, I'd be happy if you could point me to the corresponding mails.
> >
> > Recently, word representations induced by neural networks have gained
> > a lot of momentum.
> > Particularly often cited in this context is:
> > http://code.google.com/p/word2vec/
> >
> > These word representations carry some semantic meaning, i.e. semantically
> > similar words have similar vectors (small distances between them).
> >
> > I have been wondering about the best way to incorporate them in Moses.
> >
> > One solution would be to incorporate them as factors in a factored model:
> >
> > http://www.statmt.org/moses/?n=Moses.FactoredTutorial
> >
> > It seems to me that I would have to treat each dimension of each word
> > vector as a separate factor, which would lead to a lot of factors.
> > Usual dimensionalities of these word vectors are 200 or more.
> >
> > Is treating each dimension as a factor the best way to incorporate
> > those vectors or is there anything better I can do?
> > I don't have to stick to factors, if there is another way.
> >
> > Thank you in advance!
> >
> > Best,
> >
> > Hubert
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140703/e5afc74e/attachment-0001.htm
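
For reference, a minimal C++ sketch of the kind of word-vector lookup table
discussed above, assuming word2vec's plain-text output format (an optional
"<vocab_size> <dim>" header line followed by one "word v1 v2 ... vN" entry
per line); the type and function names here are illustrative, not part of
Moses:

  #include <fstream>
  #include <sstream>
  #include <string>
  #include <unordered_map>
  #include <vector>

  typedef std::unordered_map<std::string, std::vector<float> > EmbeddingTable;

  // Load a word2vec-style plain-text table: one "word v1 v2 ... vN" per line.
  EmbeddingTable LoadEmbeddings(const std::string &path) {
    EmbeddingTable table;
    std::ifstream in(path.c_str());
    std::string line;
    bool first_line = true;
    while (std::getline(in, line)) {
      std::istringstream iss(line);
      std::string word;
      iss >> word;
      std::vector<float> vec;
      float value;
      while (iss >> value) vec.push_back(value);
      // word2vec's text output starts with a "<vocab_size> <dim>" header;
      // real entries have many more than one dimension, so skip that line.
      if (first_line && vec.size() <= 1) { first_line = false; continue; }
      first_line = false;
      if (!word.empty() && !vec.empty()) table[word] = vec;
    }
    return table;
  }

A feature function could load such a table once at start-up and then look up
vectors per word during decoding.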

------------------------------

Message: 2
Date: Wed, 2 Jul 2014 13:45:51 -0400
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Using word embeddings in Moses
To: Hubert Soyer <hubert.soyer@googlemail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDAxGVxryst3GbfSF9C1+VAkA2TJupKv83f46_m35ry81Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

you can also have your feature function read in the word vector mapping table.

-phi

On Wed, Jul 2, 2014 at 11:23 AM, Hubert Soyer
<hubert.soyer@googlemail.com> wrote:
> Yes, I am thinking of a new feature function based on word vectors.
> Thank you for your suggestion about the generation step, I'll look into it,
> maybe I'll find a way.
>
> I will also try to create a feature function directly.
>
> Thanks again!
>
> Best,
>
> Hubert
>
> On Jul 2, 2014 11:02 PM, "Philipp Koehn" <pkoehn@inf.ed.ac.uk> wrote:
>>
>> Hi,
>>
>> it would be better to include a word vector obtained by word2vec or other
>> means
>> as a single factor, and generate them with a generation step to avoid
>> filling
>> up the phrase table with redundant information. Unfortunately, there is no
>> source side generation step, which may be a useful addition to the
>> factored
>> model.
>>
>> Of course, the question is what to do with these vectors. I assume that
>> you have
>> a new feature function in mind.
>>
>> -phi
>>
>> On Wed, Jul 2, 2014 at 5:04 AM, Hubert Soyer
>> <hubert.soyer@googlemail.com> wrote:
>> > Hello,
>> >
>> > I have checked the mailing list archive for this question but couldn't
>> > find anything.
>> > I'd be surprised if this question has not been asked yet, if it has,
>> > I'd be happy if you could point me to the corresponding mails.
>> >
>> > Recently, word representations induced by neural networks have gained
>> > a lot of momentum.
>> > Particularly often cited in this context is:
>> > http://code.google.com/p/word2vec/
>> >
>> > Those vector word representations are vectors that carry some semantic
>> > meaning in them, i.e. semantically similar words have similar vectors
>> > (small distances to each other).
>> >
>> > I have been wondering about the best way to incorporate them in Moses.
>> >
>> > One solution would be to incorporate them as factors in a factored
>> > model:
>> >
>> > http://www.statmt.org/moses/?n=Moses.FactoredTutorial
>> >
>> > It seems to me that I would have to treat each dimension of each word
>> > vector as a separate factor which would lead to a lot of factors.
>> > Usual dimensionalities of those word vectors are 200 or more.
>> >
>> > Is treating each dimension as a factor the best way to incorporate
>> > those vectors or is there anything better I can do?
>> > I don't have to stick to factors, if there is another way.
>> >
>> > Thank you in advance!
>> >
>> > Best,
>> >
>> > Hubert
>> > _______________________________________________
>> > Moses-support mailing list
>> > Moses-support@mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
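
As a rough sketch of the score such a feature function might compute once the
mapping table is loaded: cosine similarity between the vectors of an aligned
source/target word pair. This assumes the two languages share one embedding
space (e.g. a bilingual embedding model); vectors from two independently
trained monolingual word2vec models are not directly comparable. The helper
names are illustrative, and the Moses FeatureFunction plumbing (registration,
evaluate methods) is omitted:

  #include <cmath>
  #include <string>
  #include <unordered_map>
  #include <vector>

  typedef std::unordered_map<std::string, std::vector<float> > EmbeddingTable;

  // Cosine similarity of two vectors; 0 if either has zero norm.
  float CosineSimilarity(const std::vector<float> &a,
                         const std::vector<float> &b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
      dot += a[i] * b[i];
      na  += a[i] * a[i];
      nb  += b[i] * b[i];
    }
    return (na > 0.0f && nb > 0.0f)
        ? dot / (std::sqrt(na) * std::sqrt(nb)) : 0.0f;
  }

  // Score for one aligned word pair; unknown words get a neutral 0.
  float WordPairScore(const EmbeddingTable &table,
                      const std::string &src, const std::string &tgt) {
    EmbeddingTable::const_iterator s = table.find(src);
    EmbeddingTable::const_iterator t = table.find(tgt);
    if (s == table.end() || t == table.end()) return 0.0f;
    return CosineSimilarity(s->second, t->second);
  }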


------------------------------

Message: 3
Date: Wed, 02 Jul 2014 19:11:07 -0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] representation of coverage vector (bitset
operations)?
To: Adam Teichert <adamteichert@gmail.com>, moses-support@mit.edu
Message-ID: <53B4918B.7010701@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I believe we tested the speed of different implementations when Moses first
started, and the current implementation seemed to be the fastest. We haven't
tested it since.

If there's a faster/cleaner implementation, please let us know and submit a
git pull request, and we'll gladly pull it.

On 02/07/14 10:57, Adam Teichert wrote:
> This is probably a very naive question, but I'll go for it anyway: does
> anyone have any comments on why WordsBitmap doesn't use the STL or
> Boost bit-vector / bit-set data structures (for example
> std::vector<bool>, std::bitset, or the dynamically sized Boost version)?
>
> This is just coming up because I'm interested in doing some bit-vector
> operations with the coverage vector, and I'd like them to behave more
> like constant-time operations than operations that are linear in the
> length of the source sentence.
>
> Any comments?
>
> https://github.com/moses-smt/mosesdecoder/blob/73513c182dd3e12abf4a1df82a7e7808e3dfab83/moses/WordsBitmap.h
>
> Thanks!
>
> --Adam
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140702/d418d6e6/attachment-0001.htm
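
For context, a small sketch of the coverage operations Adam describes, written
against boost::dynamic_bitset. The bitset packs bits into machine words, so
overlap and population-count queries scan roughly n/64 words instead of n
individual bits; that is word-parallel rather than strictly constant time.
This is illustrative only, not how Moses' WordsBitmap is implemented:

  #include <boost/dynamic_bitset.hpp>
  #include <cstddef>

  typedef boost::dynamic_bitset<> Coverage;

  // Is the span [start, end] still completely untranslated?
  bool RangeIsFree(const Coverage &covered, std::size_t start, std::size_t end) {
    Coverage range(covered.size());
    for (std::size_t i = start; i <= end; ++i) range.set(i);
    return !covered.intersects(range);   // word-parallel overlap test
  }

  // Mark the span [start, end] as covered; return how many words remain open.
  std::size_t Cover(Coverage &covered, std::size_t start, std::size_t end) {
    for (std::size_t i = start; i <= end; ++i) covered.set(i);
    return covered.size() - covered.count();  // count() uses per-word popcount
  }

A decoder hypothesis would carry one such Coverage of source-sentence length,
calling RangeIsFree before applying a translation option and Cover when
extending the hypothesis.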

------------------------------

Message: 4
Date: Thu, 3 Jul 2014 09:49:02 +0900
From: Hubert Soyer <hubert.soyer@googlemail.com>
Subject: Re: [Moses-support] Using word embeddings in Moses
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAM7TO-j70L_wM0xMGw9t7Df9g=iQycss-7nq_MKpaMpbyPM5eg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

I think that's the easiest way to go for now.
I'll give that a try!

Thank you very much!

Best,

Hubert

On Thu, Jul 3, 2014 at 2:45 AM, Philipp Koehn <pkoehn@inf.ed.ac.uk> wrote:
> Hi,
>
> you can also have your feature function read in the word vector mapping table.
>
> -phi
>
> On Wed, Jul 2, 2014 at 11:23 AM, Hubert Soyer
> <hubert.soyer@googlemail.com> wrote:
>> Yes, I am thinking of a new feature function based on word vectors.
>> Thank you for your suggestion about the generation step, I'll look into it,
>> maybe I'll find a way.
>>
>> I will also try to create a feature function directly.
>>
>> Thanks again!
>>
>> Best,
>>
>> Hubert
>>
>> On Jul 2, 2014 11:02 PM, "Philipp Koehn" <pkoehn@inf.ed.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> it would be better to include a word vector obtained by word2vec or other
>>> means
>>> as a single factor, and generate them with a generation step to avoid
>>> filling
>>> up the phrase table with redundant information. Unfortunately, there is no
>>> source side generation step, which may be a useful addition to the
>>> factored
>>> model.
>>>
>>> Of course, the question is what to do with these vectors. I assume that
>>> you have
>>> a new feature function in mind.
>>>
>>> -phi
>>>
>>> On Wed, Jul 2, 2014 at 5:04 AM, Hubert Soyer
>>> <hubert.soyer@googlemail.com> wrote:
>>> > Hello,
>>> >
>>> > I have checked the mailing list archive for this question but couldn't
>>> > find anything.
>>> > I'd be surprised if this question has not been asked yet, if it has,
>>> > I'd be happy if you could point me to the corresponding mails.
>>> >
>>> > Recently, word representations induced by neural networks have gained
>>> > a lot of momentum.
>>> > Particularly often cited in this context is:
>>> > http://code.google.com/p/word2vec/
>>> >
>>> > Those vector word representations are vectors that carry some semantic
>>> > meaning in them, i.e. semantically similar words have similar vectors
>>> > (small distances to each other).
>>> >
>>> > I have been wondering about the best way to incorporate them in Moses.
>>> >
>>> > One solution would be to incorporate them as factors in a factored
>>> > model:
>>> >
>>> > http://www.statmt.org/moses/?n=Moses.FactoredTutorial
>>> >
>>> > It seems to me that I would have to treat each dimension of each word
>>> > vector as a separate factor which would lead to a lot of factors.
>>> > Usual dimensionalities of those word vectors are 200 or more.
>>> >
>>> > Is treating each dimension as a factor the best way to incorporate
>>> > those vectors or is there anything better I can do?
>>> > I don't have to stick to factors, if there is another way.
>>> >
>>> > Thank you in advance!
>>> >
>>> > Best,
>>> >
>>> > Hubert
>>> > _______________________________________________
>>> > Moses-support mailing list
>>> > Moses-support@mit.edu
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 93, Issue 4
********************************************
