Moses-support Digest, Vol 136, Issue 10

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. giza++ - retrieving specific phrase-aligned bi-sentences
(noam ordan)

----------------------------------------------------------------------

Message: 1
Date: Fri, 23 Feb 2018 14:16:12 +0200
From: noam ordan <noam.ordan@gmail.com>
Subject: [Moses-support] giza++ - retrieving specific phrase-aligned
bi-sentences
To: moses-support@mit.edu
Message-ID:
<CAF_8zKapxbAyk871bTFqdpZEZ8omfDLDQbQct8AdVrx4gEMBCg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear friends,

After giza++ is done iterating I normally use the phrase tables for various
purposes. I am currently working on a small translation project, and I'd like
to retrieve parallel sentences for translations with low probabilities. So
I look up in the phrase table translation equivalents with low - but still
valid - probability, and my aim is to retrieve the sentences which
include the source-language phrase, together with the respective
target-language sentences which contain the lower-probability translation
equivalents.

As an example, consider the preposition 'fii' in Arabic, which
normally translates to 'in', then to NULL, then to 'on', 'at', 'by', 'into'
and more (trained on the UN corpus). How would you retrieve lines where the
source sentences include 'fii', and the target ones NULL or 'at' but not
'in'? Since 'in' is very common generally, searching for lines which *do
not* include 'in', but do include 'at' or NULL, yields results with very low
recall. It is also not necessarily precise, since these low-probability
equivalents can just as well be translations of something else (and they
normally are).
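
To make the goal concrete, here is a minimal sketch of retrieval by word
alignment rather than by plain string search. It is only an illustration
under assumptions: it supposes a Moses-style symmetrized alignment file
(one line per sentence pair, made of 'sourceIndex-targetIndex' links with
0-based positions) next to the tokenized source and target files. The
function names and the toy sentences are made up, and treating a source
word that has no link as a NULL translation is itself an assumption.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse a line such as '0-0 1-2' into (source_index, target_index) pairs."""
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def retrieve_pairs(
    source_lines: Iterable[str],
    target_lines: Iterable[str],
    alignment_lines: Iterable[str],
    source_word: str,
    wanted_targets: Set[str],
    include_unaligned: bool = False,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, source, target) for sentence pairs in which some
    occurrence of source_word is linked to a word in wanted_targets, or
    (optionally) has no link at all, which is read here as NULL."""
    triples = zip(source_lines, target_lines, alignment_lines)
    for line_no, (src, tgt, ali) in enumerate(triples, start=1):
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        links = parse_alignment(ali)
        for i, token in enumerate(src_tokens):
            if token != source_word:
                continue
            # Target words linked to this particular occurrence.
            linked = {tgt_tokens[j] for s, j in links
                      if s == i and j < len(tgt_tokens)}
            matched = bool(linked & wanted_targets)
            if not linked and include_unaligned:
                matched = True
            if matched:
                yield line_no, src.strip(), tgt.strip()
                break  # report each sentence pair once


# Toy data (invented): 'fii' is linked to 'in', to 'at', and to nothing.
source = ["x fii y", "x fii y", "fii x"]
target = ["a in b", "a at b", "c d"]
alignment = ["0-0 1-1 2-2", "0-0 1-1 2-2", "1-0"]

hits = list(retrieve_pairs(source, target, alignment, "fii", {"at"},
                           include_unaligned=True))
for line_no, src, tgt in hits:
    print(line_no, src, "|||", tgt)
```

Because the test is made on the link of each occurrence of 'fii', a
target sentence that happens to contain 'at' as the translation of
something else is not returned, which is the precision problem described
above.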

I was wondering:
(a) whether in one of the d-output files there is information about the
most recent probability update and which sentences these updates were taken
from?
(b) whether using this method to retrieve the desired sentences is viable?
(c) are there better ways to go about it?
(d) is anyone interested in collaborating on this project, which I can
elaborate on in personal communication?

Thanks a lot,
Noam
------------------------------------------------------
https://sites.google.com/site/noamordan/

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 136, Issue 10
**********************************************
