Moses-support Digest, Vol 97, Issue 94

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: WG: Unknown single words that are part of phrases
(Matthias Huck)
2. Re: WG: Unknown single words that are part of phrases
(Barry Haddow)
3. Re: WG: Unknown single words that are part of phrases
(Matthias Huck)


----------------------------------------------------------------------

Message: 1
Date: Thu, 27 Nov 2014 16:15:02 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] WG: Unknown single words that are part of
phrases
To: "Vera Aleksic, Linguatec GmbH" <v.aleksic@linguatec.de>
Cc: "moses-support \(moses-support@mit.edu\)" <moses-support@mit.edu>
Message-ID: <1417104902.2175.56.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi Vera,

It's odd that the lexical translation model contains such an entry if
the pair is always unaligned. Maybe you used a different word alignment
when you extracted the lexicon model?

You should inspect your word alignment manually to check whether its
quality is reasonable. There's a visualization tool called "Picaro" in
Moses:

$ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and -f model/aligned.1.0.de -e model/aligned.1.0.en

In order to find out whether the symmetrization heuristic is an issue
for you, you can compare the standard and inverse GIZA alignments with
the symmetrized alignment.
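A minimal sketch of such a comparison for one sentence pair, assuming the standard and inverse GIZA alignments have already been converted to the same "i-j" point format that Moses uses for the symmetrized file (that conversion is not shown, and the function names are hypothetical). It reports the links both directions agree on and the links the heuristic added or dropped:

```python
from __future__ import annotations


def parse_alignment(line: str) -> set[tuple[int, int]]:
    """Parse a Moses-style alignment line such as '0-0 1-0 0-1'."""
    points = set()
    for token in line.split():
        i, j = token.split("-")
        points.add((int(i), int(j)))
    return points


def compare(direct: str, inverse: str, symmetrized: str) -> dict[str, set[tuple[int, int]]]:
    """Compare one sentence pair's standard, inverse and symmetrized alignments."""
    d = parse_alignment(direct)
    inv = parse_alignment(inverse)
    sym = parse_alignment(symmetrized)
    agreed = d & inv
    proposed = d | inv
    return {
        "intersection": agreed,                  # links both GIZA directions agree on
        "union": proposed,                       # links proposed by either direction
        "added_by_heuristic": sym - agreed,      # links the heuristic grew into
        "dropped_by_heuristic": proposed - sym,  # proposed links it rejected
    }


if __name__ == "__main__":
    # Hypothetical alignments for a single two-word sentence pair.
    report = compare("0-0 1-0", "0-0 0-1", "0-0 1-0 0-1")
    for name, points in report.items():
        print(name, sorted(points))
```

For these hypothetical inputs, the intersection is {(0, 0)} and the heuristic added (1, 0) and (0, 1) while dropping nothing.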

Ways to experiment with word alignment quality include, for instance:

- Choosing a different symmetrization heuristic
- Modifying the GIZA settings, e.g. by training with a different number
of EM iterations or a different sequence of IBM/HMM models
- Using some other method for training word alignments, e.g. a
discriminative word aligner
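To see what the first option changes, here is a sketch of grow-diag-final-and written from the commonly published pseudocode, not copied from Moses. The Moses implementation may iterate in a different order and can therefore differ on edge cases. Alignment points are (source index, target index) pairs, and the function name is hypothetical:

```python
from __future__ import annotations

Point = tuple[int, int]  # (source index, target index)

# "diag" neighbourhood: horizontal, vertical and diagonal neighbours.
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(direct: set[Point], inverse: set[Point],
                        src_len: int, tgt_len: int) -> set[Point]:
    """Symmetrize two directional alignments (sketch of grow-diag-final-and)."""
    union = direct | inverse
    alignment = set(direct & inverse)  # start from the high-precision intersection

    def src_aligned(i: int) -> bool:
        return any(a == i for a, _ in alignment)

    def tgt_aligned(j: int) -> bool:
        return any(b == j for _, b in alignment)

    # GROW-DIAG: add union points next to existing points, but only if they
    # cover a source or target word that is still unaligned; repeat until stable.
    grew = True
    while grew:
        grew = False
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) not in alignment:
                    continue
                for di, dj in NEIGHBOURS:
                    cand = (i + di, j + dj)
                    if (cand in union and cand not in alignment
                            and (not src_aligned(cand[0]) or not tgt_aligned(cand[1]))):
                        alignment.add(cand)
                        grew = True

    # FINAL-AND: add leftover directional points only if BOTH words are unaligned.
    for direction in (direct, inverse):
        for i, j in sorted(direction):
            if not src_aligned(i) and not tgt_aligned(j):
                alignment.add((i, j))
    return alignment


if __name__ == "__main__":
    print(sorted(grow_diag_final_and({(0, 0), (1, 0)}, {(0, 0), (0, 1)}, 2, 2)))
```

With the direct alignment {(0, 0), (1, 0)} and the inverse alignment {(0, 0), (0, 1)} of a two-word sentence pair, this returns {(0, 0), (0, 1), (1, 0)}. The growing step keeps both non-intersection links because each one covers a word that was still unaligned.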

Also, if the amount of parallel training data is small, you shouldn't be
surprised if you are not able to train reliable models.

Cheers,
Matthias


On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
> Hi,
>
> I have one more question:
> In the lex.e2f file there is a translation Gitarre->guitar:
>
> Gitarre guitar 0.4000000
> Gitarre using 0.0000284
> Gitarre ; 0.0000017
>
> Why has it not become part of the phrase table?
>
> Thanks again!
> Vera
>
> -----Original Message-----
> From: Vera Aleksic, Linguatec GmbH
> Sent: Thursday, 27 November 2014 09:42
> To: 'Matthias Huck'; Raj Dabre
> Subject: RE: [Moses-support] Unknown single words that are part of phrases
>
> Hi,
> Thank you for your answers.
> @Raj, one-word translations do not exist; I have searched for them. If the grow-diag method is probably what causes such phenomena, are there any better alternatives?
> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I do not really understand why. Why is "guitar" in the example below aligned to "Musikinstrument Gitarre" and not to "Gitarre" only? I assume decomposing "Musik + Instrument" would help? How else could I improve the word alignment quality?
> Thanks!
> Best,
> Vera
>
> für ein Musikinstrument wie eine elektrische Gitarre ,
> NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
>
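In this GIZA++ A3 output, each English word is followed by the 1-based positions of the German words it is linked to, so "guitar ({ 3 7 })" links "guitar" to both "Musikinstrument" (position 3) and "Gitarre" (position 7). The sketch below assumes exactly the token layout shown above; the regular expression and function name are illustrative, not part of GIZA++ or Moses:

```python
from __future__ import annotations

import re

# One A3 token: a word followed by "({ ... })" holding 1-based positions
# of the words it is linked to in the other sentence (empty if none).
A3_TOKEN = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3(other_sentence: str, aligned_line: str) -> list[tuple[str, list[str]]]:
    """Return (word, linked words) pairs for one A3 alignment line."""
    other = other_sentence.split()
    links = []
    for word, positions in A3_TOKEN.findall(aligned_line):
        links.append((word, [other[int(p) - 1] for p in positions.split()]))
    return links


if __name__ == "__main__":
    german = "für ein Musikinstrument wie eine elektrische Gitarre ,"
    line = ("NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) "
            "such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })")
    for word, linked in parse_a3(german, line):
        print(f"{word:10} -> {' '.join(linked) or '(unaligned)'}")
```

Among other lines, this prints a line mapping "guitar" to "Musikinstrument Gitarre". In this direction alone, a one-word pair "Gitarre ||| guitar" would therefore be inconsistent, because "guitar" is also linked to a word outside it. The phrase table, however, is extracted from the symmetrized alignment, so that is the alignment to check as well.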
> -----Original Message-----
> From: Matthias Huck [mailto:mhuck@inf.ed.ac.uk]
> Sent: Wednesday, 26 November 2014 17:54
> To: Raj Dabre
> Cc: Vera Aleksic, Linguatec GmbH; moses-support
> Subject: Re: [Moses-support] Unknown single words that are part of phrases
>
> Hi,
>
> Presumably your phrase table does not contain an entry "Gitarre ||| guitar" because this word pair is always unaligned in your training data. You could try to improve your word alignment quality.
>
> Alternatively, you could implement a procedure in the manner of the "forced single word heuristic" as described in:
> D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin of Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, April 2011.
> http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
> (see Fig. 1c).
>
> But the latter would be more of a workaround.
>
> Cheers,
> Matthias
>
>
> On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> > Hello,
> >
> >
> > If I am not mistaken, this is most likely due to the grow(-diag) heuristic applied to the word alignments of both directions before phrase extraction.
> >
> > Furthermore, one-word translations should exist (though not always), so search for them.
> >
> >
> >
> > Regards.
> >
> >
> > On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH <v.aleksic@linguatec.de> wrote:
> > Hi,
> >
> > I have observed many times that some words have no single-word translations in the phrase table, although they occur in the training corpus and in multiword phrases.
> > An example:
> > The German-English translation of "Gitarre" is unknown, i.e. there is no single-word entry for "Gitarre" in the phrase table, although some other phrases containing this word exist (see below).
> > How is this possible?
> > Thanks and best regards,
> > Vera
> >
> >
> > Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> > Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> > Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> > Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> > Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> > Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> > Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> > eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> > einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> > einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> > einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> > elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> > wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
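Following Raj's suggestion to search for one-word entries, here is a small sketch that scans phrase-table lines in the "source ||| target ||| scores ||| alignment ||| counts" layout shown above. It separates entries whose source side is exactly the word from entries that merely contain it. The function names are hypothetical, lines are assumed not to be wrapped, and a real phrase table would usually be read from a (possibly gzipped) file rather than a list:

```python
from __future__ import annotations


def split_entry(line: str) -> tuple[str, str, list[float]] | None:
    """Split 'src ||| tgt ||| scores ||| ...' into (src, tgt, scores)."""
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 3 or not fields[2]:
        return None  # malformed or wrapped line
    return fields[0], fields[1], [float(x) for x in fields[2].split()]


def lookup(lines: list[str], word: str) -> tuple[list[tuple[str, list[float]]], list[str]]:
    """Return (single-word entries for `word`, sources of longer phrases containing it)."""
    exact, containing = [], []
    for line in lines:
        entry = split_entry(line)
        if entry is None:
            continue
        src, tgt, scores = entry
        if src == word:
            exact.append((tgt, scores))
        elif word in src.split():
            containing.append(src)
    return exact, containing


if __name__ == "__main__":
    table = [
        "Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1",
        "Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1",
        "eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1",
    ]
    exact, containing = lookup(table, "Gitarre")
    print("single-word entries:", exact)
    print("phrases containing it:", len(containing))
```

On these three sample entries, it reports no single-word entry and three phrases containing "Gitarre".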
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> > --
> > Raj Dabre.
> > Research Student,
> >
> > Graduate School of Informatics,
> > Kyoto University.
> > CSE MTech, IITB., 2011-2014
> >
> >
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
>
>






------------------------------

Message: 2
Date: Thu, 27 Nov 2014 16:51:38 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] WG: Unknown single words that are part of
phrases
To: Matthias Huck <mhuck@inf.ed.ac.uk>, "Vera Aleksic, Linguatec GmbH"
<v.aleksic@linguatec.de>
Cc: "moses-support \(moses-support@mit.edu\)" <moses-support@mit.edu>
Message-ID: <5477569A.7010903@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Vera

I think the situation you describe could happen even without unaligned
words. Suppose that you have a two-word sentence on each side, and the
alignment points are (0,0), (0,1) and (1,0); I think this is possible
with the usual symmetrisation algorithm. Then you would extract the
phrase pair consisting of the two 2-word phrases, but no phrase pairs
containing 1-word phrases (see below for an example).

You still get lexical weights for the translation of word-0 to word-0,
though, since there is an alignment point there.
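The consistency rule behind this can be written down directly. A phrase pair is extracted only if it contains at least one alignment point and no alignment point links a word inside the pair to a word outside it. The following is a brute-force sketch. Unlike Moses' extract, it does not extend phrases over unaligned boundary words. The function name is hypothetical, and max_len presumably plays the role of the maximum phrase length (the 5 passed to extract below):

```python
from __future__ import annotations

Point = tuple[int, int]  # (source index, target index)
Span = tuple[int, int]   # inclusive (start, end)


def extract_phrases(alignment: set[Point], src_len: int, tgt_len: int,
                    max_len: int = 5) -> set[tuple[Span, Span]]:
    """Brute-force extraction of all phrase pairs consistent with `alignment`."""
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(tgt_len, t1 + max_len)):
                    # At least one link must fall inside the box.
                    has_point = any(s1 <= i <= s2 and t1 <= j <= t2 for i, j in alignment)
                    # Consistent: every link is either fully inside or fully outside.
                    consistent = all((s1 <= i <= s2) == (t1 <= j <= t2) for i, j in alignment)
                    if has_point and consistent:
                        pairs.add(((s1, s2), (t1, t2)))
    return pairs


if __name__ == "__main__":
    # Alignment points from the c.align example: 0-0 1-0 0-1.
    print(extract_phrases({(0, 0), (1, 0), (0, 1)}, 2, 2))
```

For the points (0,0), (0,1) and (1,0) on a two-by-two sentence pair, it returns only ((0, 1), (0, 1)), which is the pair of 2-word phrases. No 1-word span pair passes the consistency check.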

cheers - Barry


[hyperion]bhaddow: cat c.en
a b
[hyperion]bhaddow: cat c.fr
A B
[hyperion]bhaddow: cat c.align
0-0 1-0 0-1
[hyperion]bhaddow: ~/moses.new/bin/extract c.en c.fr c.align e 5
PhraseExtract v1.4, written by Philipp Koehn
phrase extraction from an aligned parallel corpus
[hyperion]bhaddow: cat e
A B ||| a b ||| 0-0 1-0 0-1
[hyperion]bhaddow: ~/moses.new/scripts/training/get-lexical.perl c.en c.fr c.align c
(c.en,c.fr,c)
FILE: c.fr
FILE: c.en
FILE: c.align
!
Saved: c.f2e and c.e2f
[hyperion]bhaddow: cat c.e2f
a A 0.5000000
a B 1.0000000
b A 0.5000000
[hyperion]bhaddow: cat c.f2e
A a 0.5000000
B a 0.5000000
A b 1.0000000
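The lexical weights above are relative frequencies over alignment points. Each value is the number of times the two words are linked, divided by the total number of links of the conditioning word. The sketch below reproduces the c.e2f and c.f2e numbers of this toy example. It assumes every word is aligned: unaligned words are simply ignored here, which may differ from how get-lexical.perl treats them. The function name is hypothetical:

```python
from __future__ import annotations

from collections import Counter


def lexical_tables(corpus_a: list[str], corpus_b: list[str], alignments: list[str]):
    """Relative-frequency lexical tables from word-aligned sentence pairs.

    Returns (p_a_given_b, p_b_given_a), both keyed by (word_a, word_b).
    Alignment tokens are 'i-j' with i indexing corpus_a and j indexing corpus_b.
    Unaligned words are ignored in this sketch.
    """
    links = Counter()
    count_a = Counter()
    count_b = Counter()
    for sent_a, sent_b, align in zip(corpus_a, corpus_b, alignments):
        words_a, words_b = sent_a.split(), sent_b.split()
        for token in align.split():
            i, j = (int(x) for x in token.split("-"))
            wa, wb = words_a[i], words_b[j]
            links[(wa, wb)] += 1
            count_a[wa] += 1
            count_b[wb] += 1
    p_a_given_b = {pair: n / count_b[pair[1]] for pair, n in links.items()}
    p_b_given_a = {pair: n / count_a[pair[0]] for pair, n in links.items()}
    return p_a_given_b, p_b_given_a


if __name__ == "__main__":
    a_given_b, b_given_a = lexical_tables(["a b"], ["A B"], ["0-0 1-0 0-1"])
    for (wa, wb), p in sorted(a_given_b.items()):
        print(wa, wb, f"{p:.7f}")  # same layout as c.e2f
```

The first table, printed with seven decimals, gives exactly the three c.e2f lines above. Under this counting, an entry such as "Gitarre guitar 0.4000000" only requires the two words to be linked somewhere in the alignment. A one-word phrase pair additionally requires that link to be consistent.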





------------------------------

Message: 3
Date: Thu, 27 Nov 2014 17:19:33 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] WG: Unknown single words that are part of
phrases
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: "moses-support \(moses-support@mit.edu\)" <moses-support@mit.edu>
Message-ID: <1417108773.2175.71.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Yes, that's right. That's the situation illustrated in Fig. 1b of
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
and a "single word heuristic" as proposed in that paper can be a remedy.
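Here is a rough sketch of that remedy as understood here: on top of the consistent phrase pairs, add a one-word pair for every alignment point, even when it violates the consistency constraint. This is an illustrative approximation, not the exact procedure from the Jane paper, and the function names are hypothetical. Spans are inclusive (start, end) index pairs:

```python
from __future__ import annotations

Point = tuple[int, int]  # (source index, target index)
Span = tuple[int, int]   # inclusive (start, end)


def add_single_word_pairs(alignment: set[Point],
                          extracted: set[tuple[Span, Span]]) -> set[tuple[Span, Span]]:
    """Add a one-word phrase pair for every alignment point (may be inconsistent)."""
    result = set(extracted)
    for i, j in alignment:
        result.add(((i, i), (j, j)))
    return result


def render(pairs: set[tuple[Span, Span]], src: list[str], tgt: list[str]) -> list[str]:
    """Show span pairs as 'source ||| target' strings."""
    return sorted(" ".join(src[s1:s2 + 1]) + " ||| " + " ".join(tgt[t1:t2 + 1])
                  for (s1, s2), (t1, t2) in pairs)


if __name__ == "__main__":
    alignment = {(0, 0), (1, 0), (0, 1)}  # points from Barry's c.align
    extracted = {((0, 1), (0, 1))}        # the only consistent pair
    for entry in render(add_single_word_pairs(alignment, extracted), ["a", "b"], ["A", "B"]):
        print(entry)
```

Barry's toy example can be used here, treating the first index of c.align as the position in c.en. In that case, the sketch adds "a ||| A", "a ||| B" and "b ||| A" to the single extracted pair "a b ||| A B". How these extra pairs would then be scored is not modelled.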





------------------------------



End of Moses-support Digest, Vol 97, Issue 94
*********************************************
