Moses-support Digest, Vol 104, Issue 63

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: please help me with the code - getting word index
(amir haghighi)
2. Re: Major bug found in Moses (Adam Lopez)
3. Re: please help me with the code - getting word index
(Rico Sennrich)

----------------------------------------------------------------------

Message: 1
Date: Sat, 20 Jun 2015 18:05:14 +0430
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] please help me with the code - getting
word index
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CA+UVbEiSZrb1Yxw92oOcqEFjk4DzbdSek7NKMr41yPArmuk_9g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thanks, Matthias.
ChartHypothesis::GetCurrSourceRange() returns the source span that all
terminals and non-terminals in the current hypothesis cover in the source
sentence. I'd like to know which source word index each terminal (or
non-terminal) corresponds to. Could you guide me on how to obtain that?

Thanks again

On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:

> Hi,
>
> You can calculate absolute positions in the source sentence based on the
> words range of the current hypothesis and those of the direct
> predecessors (in case of right-hand side non-terminals).
>
> Take a look at these methods:
>
> InputPath::GetWordsRange()
> ChartHypothesis::GetCurrSourceRange()
> ChartCellLabel::GetCoverage()
>
> Cheers,
> Matthias
>
>
> On Thu, 2015-06-18 at 20:23 +0430, amir haghighi wrote:
> > Hi everybody
> >
> >
> > I wrote the following code to get an ordered list of the source words
> > inside a hypothesis. It returns the words in their translation order, but
> > I need not only the word strings but also the index of each word in the
> > original sentence.
> >
> > Could you please help me get the index of each word in srcPhrase in
> > the sentence?
> >
> >
> > void Amir::GetSourcePhrase2(const ChartHypothesis& cur_hypo,
> >                             Phrase& srcPhrase) const
> > {
> >   const TargetPhrase& targetPh = cur_hypo.GetCurrTargetPhrase();
> >   const Phrase* sourcePh = targetPh.GetRuleSource();
> >   const size_t targetWordsNum = targetPh.GetSize();
> >
> >   // For each target position, the set of aligned source positions
> >   // (indexes into the rule's source side).
> >   std::vector<std::set<size_t> > sourcePosSets;
> >   for (size_t targetP = 0; targetP < targetWordsNum; ++targetP) {
> >     sourcePosSets.push_back(
> >         targetPh.GetAlignTerm().GetAlignmentsForTarget(targetP));
> >   }
> >
> >   // If a source position is aligned to several target words, keep it
> >   // only for the rightmost one and remove it from the others.
> >   for (size_t ii = targetWordsNum; ii-- > 0; ) {
> >     const std::set<size_t> cur_srcPosSet = sourcePosSets[ii];
> >     for (std::set<size_t>::const_iterator alignment = cur_srcPosSet.begin();
> >          alignment != cur_srcPosSet.end(); ++alignment) {
> >       for (size_t index = 0; index < ii; ++index) {
> >         sourcePosSets[index].erase(*alignment);
> >       }
> >     }
> >   }
> >
> >   for (size_t posT = 0; posT < targetWordsNum; ++posT) {
> >     const Word& word = targetPh.GetWord(posT);
> >     if (word.IsNonTerminal()) {
> >       // Non-terminal: fill in from the predecessor hypothesis.
> >       size_t nonTermInd =
> >           targetPh.GetAlignNonTerm().GetNonTermIndexMap()[posT];
> >       const ChartHypothesis* prevHypo = cur_hypo.GetPrevHypo(nonTermInd);
> >       GetSourcePhrase2(*prevHypo, srcPhrase);
> >     } else {
> >       // Terminal: append the source words aligned to it.
> >       for (std::set<size_t>::const_iterator it = sourcePosSets[posT].begin();
> >            it != sourcePosSets[posT].end(); ++it) {
> >         srcPhrase.AddWord(sourcePh->GetWord(*it));
> >       }
> >     }
> >   }
> > }
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>

------------------------------

Message: 2
Date: Sat, 20 Jun 2015 13:42:55 +0000
From: Adam Lopez <alopez@inf.ed.ac.uk>
Subject: Re: [Moses-support] Major bug found in Moses
To: Rico Sennrich <rico.sennrich@gmx.ch>, moses-support@mit.edu
Message-ID:
<CAE-ScgtQ0ARcE4XxGuTurYPaLmQudkiNoaQdv+2aZtxR1tCz3w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

> Can and should we make a wider effort to facilitate the reproduction of systems
> by disseminating settings or configuration files? This dissemination is
> partially done by system description papers, but they cannot cover all
> settings [this would make for a very boring paper]. I put some effort
> into documenting my WMT submission by releasing EMS configuration files
> ( https://github.com/rsennrich/wmt2014-scripts/tree/master/example ),
> and I would be happy to see this done more often.
>

Compare with speech recognition, where the major open source toolkit is
Kaldi. One of its stated goals is to collect a set of recipes for
reproducing state-of-the-art results.
http://kaldi.sourceforge.net/about.html

I don't know how well they've succeeded at this. But it's an admirable goal.

------------------------------

Message: 3
Date: Sat, 20 Jun 2015 14:44:15 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] please help me with the code - getting
word index
To: moses-support@mit.edu
Message-ID: <55856E2F.4000004@gmx.ch>
Content-Type: text/plain; charset="windows-1252"

Hi Amir,

There is currently no method that returns this, but BilingualLM
(moses/LM/BilingualLM) calculates and uses the absolute source position
of each terminal - search for absolute_source_position.

best wishes,
Rico
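
The approach suggested in this thread (walk the source side of the rule, starting from the hypothesis's source span and skipping over the spans of the predecessor hypotheses) can be sketched as follows. This is only an illustrative sketch, not the BilingualLM code and not Moses API: Span, RuleSymbol and AbsoluteSourcePositions are hypothetical stand-ins, and it assumes that child spans are given in source order, one per non-terminal, and that all spans are inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for this sketch; these are NOT Moses classes.
struct Span {
  std::size_t start;  // first covered source position (inclusive)
  std::size_t end;    // last covered source position (inclusive)
};

struct RuleSymbol {
  bool isNonTerminal;
};

// For each symbol on the rule's source side, returns the absolute source
// position of a terminal, or the first position covered by a non-terminal.
// Assumption: childSpans holds one span per non-terminal, in source order.
std::vector<std::size_t> AbsoluteSourcePositions(
    const Span& ruleSpan,
    const std::vector<RuleSymbol>& sourceSide,
    const std::vector<Span>& childSpans)
{
  std::vector<std::size_t> positions;
  positions.reserve(sourceSide.size());

  std::size_t next = ruleSpan.start;  // absolute position of the next symbol
  std::size_t child = 0;              // index of the next non-terminal
  for (std::size_t i = 0; i < sourceSide.size(); ++i) {
    positions.push_back(next);
    if (sourceSide[i].isNonTerminal) {
      assert(child < childSpans.size());
      // Skip everything the predecessor hypothesis covers.
      next = childSpans[child].end + 1;
      ++child;
    } else {
      // A terminal covers exactly one source word.
      ++next;
    }
  }
  return positions;
}
```

For a rule covering source positions 2 to 6 with source side "a X b", where X covers positions 3 to 5, this yields 2, 3 and 6. If the terminal alignments index into the rule's source side, as the original code assumes when it calls sourcePh->GetWord(*it), the same indexes could presumably be used to look up absolute positions in the returned vector.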



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 104, Issue 63
**********************************************
