Moses-support Digest, Vol 97, Issue 89

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Unknown single words that are part of phrases (Matthias Huck)


----------------------------------------------------------------------

Message: 1
Date: Wed, 26 Nov 2014 16:53:51 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Unknown single words that are part of
phrases
To: Raj Dabre <prajdabre@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <1417020831.2175.34.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi,

Presumably your phrase table does not contain an entry "Gitarre |||
guitar" because this word pair is always unaligned (or never aligned
consistently with the surrounding words) in your training data. You
could try to improve the quality of your word alignment.
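A single-word entry can be missing even when longer phrases containing the word were extracted. This follows from the consistency criterion used in standard phrase extraction: a phrase pair is kept only if it contains at least one alignment link and no word inside it is linked to a word outside it. The Python sketch below is a simplified extractor. Unlike Moses' extractor, it does not extend phrases over unaligned boundary words. The function name and the word alignment for "eine elektrische Gitarre ," / "an electric guitar ;" are hypothetical, chosen so that "Gitarre" links to both "guitar" and ";".

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """Hypothetical, simplified phrase extraction (consistency criterion only).

    alignment: set of (i, j) links, i indexing src and j indexing tgt.
    Does NOT extend phrases over unaligned boundary words.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # a phrase pair needs at least one alignment link
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Consistency: no target word in [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Hypothetical alignment: "Gitarre" -> {"guitar", ";"}, "," -> ";".
src = ["eine", "elektrische", "Gitarre", ","]
tgt = ["an", "electric", "guitar", ";"]
links = {(0, 0), (1, 1), (2, 2), (2, 3), (3, 3)}
for s, t in sorted(extract_phrase_pairs(src, tgt, links)):
    print(s, "|||", t)
```

With this alignment, "Gitarre , ||| guitar ;" and "elektrische Gitarre , ||| electric guitar ;" are extracted, but "Gitarre ||| guitar" is not. This mirrors the pattern in the phrase table quoted further down.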

Alternatively, you could implement a procedure in the manner of the
"forced single word heuristic" as described in:
D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide
to Jane, an Open Source Hierarchical Translation Toolkit. The Prague
Bulletin of Mathematical Linguistics, number 95, pages 5-18, Prague,
Czech Republic, April 2011.
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
(see Fig. 1c).

But the latter would be more of a workaround.
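A workaround in the spirit of the forced single word heuristic can be sketched as follows. This is a simplified interpretation, not Jane's actual implementation; Fig. 1c of the paper gives the exact definition. After standard extraction, every source word that still lacks a one-word phrase pair receives one for each of its alignment links, even if that pair violates the consistency criterion. The function `add_forced_single_words` and its inputs are hypothetical.

```python
def add_forced_single_words(src, tgt, alignment, pairs):
    """Hypothetical sketch of a forced single-word workaround.

    pairs: set of (source phrase, target phrase) from standard extraction.
    For each source word without a one-word source phrase in `pairs`, add
    (src[i], tgt[j]) for every alignment link (i, j), consistent or not.
    """
    result = set(pairs)
    # Source words that already have a single-word entry.
    covered = {s for (s, _) in pairs if " " not in s}
    for i, word in enumerate(src):
        if word in covered:
            continue
        for (ai, j) in sorted(alignment):
            if ai == i:
                result.add((word, tgt[j]))
    return result


src = ["eine", "elektrische", "Gitarre", ","]
tgt = ["an", "electric", "guitar", ";"]
links = {(0, 0), (1, 1), (2, 2), (2, 3), (3, 3)}
extracted = {("eine", "an"), ("Gitarre ,", "guitar ;")}  # hypothetical input
for s, t in sorted(add_forced_single_words(src, tgt, links, extracted)):
    print(s, "|||", t)
```

This also produces noisy pairs such as "Gitarre ||| ;" when the alignment links a word to punctuation. That is one reason to treat it as a workaround rather than a fix for the alignment itself.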

Cheers,
Matthias


On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> Hello,
>
> If I am not mistaken, this is most likely due to the grow(-diag) symmetrization heuristic that is applied to the word-aligned data (both directions) before phrase extraction.
>
> Furthermore, one-word translations should usually exist (though not always), so search for them.
>
> Regards.
>
> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH <v.aleksic@linguatec.de> wrote:
> Hi,
>
> I have often observed that some words do not exist as single-word translations in the phrase table, although they occur in the training corpus and in multi-word phrases.
> An example:
> The German-English translation of "Gitarre" is unknown, i.e. there is no single-word entry for "Gitarre" in the phrase table, although some other phrases containing this word exist (see below).
> How is this possible?
> Thanks and best regards,
> Vera
>
>
> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> --
> Raj Dabre.
> Research Student,
>
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
>
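Raj's point about the grow(-diag) step can be made concrete with a sketch of the grow-diag symmetrization heuristic as it is commonly described for Moses. It starts from the intersection of the two directional alignments. It then repeatedly adds links from their union that neighbour an existing link (including diagonally), provided the source or the target word is still unaligned. The "-final"/"-final-and" steps are omitted, and the function name and toy alignments are hypothetical.

```python
# Horizontal, vertical and diagonal neighbours of an alignment point.
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag(src_to_tgt, tgt_to_src):
    """Hypothetical sketch of grow-diag symmetrization (no final step).

    Both inputs are sets of (i, j) links: i = source, j = target position.
    """
    union = src_to_tgt | tgt_to_src
    alignment = src_to_tgt & tgt_to_src  # start from the intersection
    added = True
    while added:  # iterate until no new points are added
        added = False
        for (i, j) in sorted(alignment):  # snapshot of the current links
            for (di, dj) in NEIGHBOURS:
                ni, nj = i + di, j + dj
                if (ni, nj) not in union or (ni, nj) in alignment:
                    continue
                src_aligned = any(a == ni for (a, _) in alignment)
                tgt_aligned = any(b == nj for (_, b) in alignment)
                # Add only if it aligns a so-far unaligned word.
                if not src_aligned or not tgt_aligned:
                    alignment.add((ni, nj))
                    added = True
    return alignment


# Toy example: source "Gitarre ,", target "guitar ;".
s2t = {(0, 0), (1, 1)}  # Gitarre-guitar, ,-;
t2s = {(0, 0), (0, 1)}  # guitar->Gitarre, ;->Gitarre
print(sorted(grow_diag(s2t, t2s)))
```

In the toy example, grow-diag keeps the extra "Gitarre"-";" link from the target-to-source direction. The resulting alignment links "Gitarre" to both "guitar" and ";", which is exactly the kind of alignment that prevents "Gitarre ||| guitar" from being extracted as a single-word pair.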






------------------------------



End of Moses-support Digest, Vol 97, Issue 89
*********************************************
