Moses-support Digest, Vol 105, Issue 18

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. EMS and Factors (Marco Damonte)
2. Re: Getting counts in Moses instead of probabilities (Hieu Hoang)
3. Re: EMS and Factors (Philipp Koehn)
4. Re: EMS and Factors (Marco Damonte)
5. Re: Getting counts in Moses instead of probabilities
(Harshit Gupta)


----------------------------------------------------------------------

Message: 1
Date: Wed, 08 Jul 2015 16:27:14 +0000
From: Marco Damonte <mdtux89@gmail.com>
Subject: [Moses-support] EMS and Factors
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAD2JQQM73KAfSFk45q00y7irKQCfPTC_JsSSeTrZX4hP9Fckvw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all,

I was wondering what is the best way to re-use the factorization files of
previous experiments. It seems to me that as soon as you change some of the
factors you are using (either in the input or in the output) the
CORPUS_nameofthecorpus_factorized.#EXP script will call again the scripts
to generate the factors and combine them. Should I write my scripts in a
way that they can figure out if the computation has already been done? Or
is there a way to accomplish this automatically?

Marco

------------------------------

Message: 2
Date: Thu, 9 Jul 2015 11:59:14 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Getting counts in Moses instead of
probabilities
To: Harshit Gupta <harshitgupta165@gmail.com>, moses-support@mit.edu
Message-ID: <559E29D2.2010603@gmail.com>
Content-Type: text/plain; charset="utf-8"

The counts are written in the 5th column in the phrase table.
http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
This is for debugging purposes only; they don't influence decoding in
any way.
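
A minimal sketch of reading that 5th column, assuming the usual
"|||"-separated phrase-table layout (source, target, scores, alignment,
counts). The function name and the sample line are made up for
illustration, and the meaning of the individual numbers in the counts
field is not asserted here; check the ScorePhrases page for that.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line on '|||' and return selected fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    # Assumption: the counts sit in the 5th field (index 4) when present.
    counts = [float(x) for x in fields[4].split()] if len(fields) > 4 else []
    return source, target, scores, counts


# Hypothetical line, invented for this example (not real training output).
example = "das Haus ||| the house ||| 0.8 0.5 0.7 0.4 ||| 0-0 1-1 ||| 10 12 8"
source, target, scores, counts = parse_phrase_table_line(example)
print(source, "->", target, "counts:", counts)
```

For a gzipped phrase table, the same function can be applied to each
line read through gzip.open(path, "rt", encoding="utf-8").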

If you want to know more about how it works - the counts are stored in
the file extract.*.sorted.gz and extract.*.inv.sorted.gz. The counts are
summed and the probability is calculated by the score program. The
source code for the score program is in
phrase-extract/score-main.cpp
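
As a rough illustration of "the counts are summed and the probability is
calculated", here is a simplified relative-frequency sketch. It is not
the code in score-main.cpp, which has more options; the function name
and the toy phrase pairs are assumptions made for this example. It
relies on the input being sorted by source phrase, as the extract file
is.

```python
from collections import Counter
from itertools import groupby


def relative_frequencies(sorted_pairs):
    """Yield (f, e, joint_count, source_count, prob) for pairs sorted by f."""
    for f, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        # Sum the occurrences of each target phrase e for this source phrase f.
        joint = Counter(e for _, e in group)
        total = sum(joint.values())
        for e, count in sorted(joint.items()):
            # Relative frequency: count(f, e) / count(f).
            yield f, e, count, total, count / total


# Toy (source, target) phrase pairs, invented for illustration.
pairs = sorted([("haus", "house"), ("haus", "house"),
                ("haus", "home"), ("hund", "dog")])
rows = list(relative_frequencies(pairs))
for row in rows:
    print(row)
```

Because all pairs for one source phrase are adjacent, only one group has
to be held in memory at a time.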


On 08/07/2015 18:05, Harshit Gupta wrote:
> Hi, I am currently working on Moses platform and in the phrase tables,
> I am interested in the counts of phrases instead of phrase translation
> probabilities. Can I get these counts?
> In the Moses manual, it is mentioned that in training process in
> calculating phrase scores that
> "To estimate the phrase translation probability φ(e|f) we proceed as
> follows: First, the extract file is sorted. This ensures that all
> English phrase translations for an foreign phrase are next to each
> other in the file. Thus, we can process the file, one foreign phrase
> at a time, *collect counts* and compute φ(e|f) for that foreign phrase f."
>
> Where are these counts collected ? Where can I get these counts ?
>
> Regards
> Harshit
>
> --
> Harshit Gupta
> Third Year Undergraduate
> Electrical Engineering
> IIT Madras
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu


------------------------------

Message: 3
Date: Thu, 9 Jul 2015 15:45:04 +0700
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] EMS and Factors
To: Marco Damonte <mdtux89@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDE8efjshmdsPua+07Z_S-iQmaNyC2LMjXqODOJh=URNw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

unfortunately there is currently no support to use some of the
factored data files, when the factors change. You have to do
this outside of EMS.
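
One way to handle this outside of EMS, as a minimal sketch: wrap each
factor-generation step so that it is skipped when its output file
already exists. Everything named here (run_if_missing, the fake step,
the file name) is hypothetical and not part of EMS or Moses.

```python
import os
import tempfile


def run_if_missing(output_path, compute):
    """Call compute(tmp_path) only when output_path does not exist yet."""
    if os.path.exists(output_path):
        return False  # already computed by an earlier run, reuse it
    tmp_path = output_path + ".tmp"
    compute(tmp_path)
    # Rename only after success so a crashed run leaves no partial output.
    os.replace(tmp_path, output_path)
    return True


calls = []


def fake_factor_step(path):
    """Stand-in for a real factor-generation script (hypothetical)."""
    calls.append(path)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("house|NN\n")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "corpus.pos.factor")
first = run_if_missing(target, fake_factor_step)
second = run_if_missing(target, fake_factor_step)
print(first, second, len(calls))
```

Keying the cache on the output path alone assumes the input corpus and
the factor script do not change between experiments.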

-phi

On Wed, Jul 8, 2015 at 11:27 PM, Marco Damonte <mdtux89@gmail.com> wrote:
> Hi all,
>
> I was wondering what is the best way to re-use the factorization files of
> previous experiments. It seems to me that as soon as you change some of the
> factors you are using (either in the input or in the output) the
> CORPUS_nameofthecorpus_factorized.#EXP script will call again the scripts to
> generate the factors and combine them. Should I write my scripts in a way
> that they can figure out if the computation has already been done? Or there
> is a way to accomplish this automatically?
>
> Marco
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


------------------------------

Message: 4
Date: Thu, 09 Jul 2015 08:46:16 +0000
From: Marco Damonte <mdtux89@gmail.com>
Subject: Re: [Moses-support] EMS and Factors
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAD2JQQNC4c9jHONfVjWC0j+pF-J4nH23LjfsrrftarY3WPosAg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I understand..

Thank you.
Marco

On Thu, 9 Jul 2015 9:45 am Philipp Koehn <phi@jhu.edu> wrote:

> Hi,
>
> unfortunately there is currently no support to use some of the
> factored data files, when the factors change. You have to do
> this outside of EMS.
>
> -phi
>
> On Wed, Jul 8, 2015 at 11:27 PM, Marco Damonte <mdtux89@gmail.com> wrote:
> > Hi all,
> >
> > I was wondering what is the best way to re-use the factorization files of
> > previous experiments. It seems to me that as soon as you change some of
> the
> > factors you are using (either in the input or in the output) the
> > CORPUS_nameofthecorpus_factorized.#EXP script will call again the
> scripts to
> > generate the factors and combine them. Should I write my scripts in a way
> > that they can figure out if the computation has already been done? Or
> there
> > is a way to accomplish this automatically?
> >
> > Marco
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>

------------------------------

Message: 5
Date: Thu, 9 Jul 2015 15:49:43 +0530
From: Harshit Gupta <harshitgupta165@gmail.com>
Subject: Re: [Moses-support] Getting counts in Moses instead of
probabilities
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support@mit.edu
Message-ID:
<CAHgj_vuNVfcdtL+yrDgfCj3jwutBQC86YTpKTBF1w9U0mPj=3A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Hieu, thanks for the reply. However, I have some further questions about
this. By the count of a phrase, I mean how many times the phrase occurs in
the corpora. Can I get these counts from the cpp source file you
mentioned?
Also, in the phrase tables, the first four columns are for lexical
weighting and phrase translation probabilities and then there are
alignments between the source and target language. Here also, is it
possible to get the counts of the phrases ?

Regards
Harshit

On Thu, Jul 9, 2015 at 1:29 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:

> The counts are written in the 5th column in the phrase table.
> http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
> This is for debugging purposes only; they don't influence decoding in
> any way.
>
> If you want to know more about how it works - the counts are stored in the
> file extract.*.sorted.gz and extract.*.inv.sorted.gz. The counts are summed
> and the probability is calculated by the score program. The source code for
> the score program is in
> phrase-extract/score-main.cpp
>
>
> On 08/07/2015 18:05, Harshit Gupta wrote:
>
> Hi, I am currently working on Moses platform and in the phrase tables,
> I am interested in the counts of phrases instead of phrase translation
> probabilities. Can I get these counts?
> In the Moses manual, it is mentioned that in training process in
> calculating phrase scores that
> "To estimate the phrase translation probability φ(e|f) we proceed as
> follows: First, the extract file is sorted. This ensures that all English
> phrase translations for an foreign phrase are next to each other in the
> file. Thus, we can process the file, one foreign phrase at a time, *collect
> counts* and compute φ(e|f) for that foreign phrase f."
>
> Where are these counts collected ? Where can I get these counts ?
>
> Regards
> Harshit
>
> --
> Harshit Gupta
> Third Year Undergraduate
> Electrical Engineering
> IIT Madras
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
>


--
Harshit Gupta
Third Year Undergraduate
Electrical Engineering
IIT Madras
-------------- next part --------------
A non-text attachment was scrubbed...
Name: phrase-table
Type: application/octet-stream
Size: 176375 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20150709/5df443bb/attachment.obj

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 105, Issue 18
**********************************************
