Moses-support Digest, Vol 147, Issue 9

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. GIZA-format to Pharaoh format (Bram Vanroy)


----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Jan 2019 14:58:26 +0000
From: Bram Vanroy <Bram.Vanroy@UGent.be>
Subject: [Moses-support] GIZA-format to Pharaoh format
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <a4e78c1101154427b6f33e1423962630@xmail103.UGent.be>
Content-Type: text/plain; charset="iso-8859-1"

Dear support list

I am a PhD student in machine translation and, as part of my research, I would like to compare some alignment tools such as MGIZA++, fast_align, and efmaral. The easiest format for me to use would be the Pharaoh format, i.e. the format where the source and target token indices are separated by a dash. For instance:

0-0 1-1 2-2 2-3 3-4 4-5
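
To make the format concrete, here is a minimal Python sketch that reads one such line into index pairs. The function name parse_pharaoh is only illustrative and is not part of any of the tools mentioned.

```python
from typing import List, Tuple


def parse_pharaoh(line: str) -> List[Tuple[int, int]]:
    """Turn a line such as '0-0 1-1 2-3' into (source, target) index pairs."""
    pairs = []
    for item in line.split():
        # Each item is "<source index>-<target index>", both 0-based.
        source_index, target_index = item.split("-")
        pairs.append((int(source_index), int(target_index)))
    return pairs


if __name__ == "__main__":
    print(parse_pharaoh("0-0 1-1 2-2 2-3 3-4 4-5"))
```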

However, after running (M)GIZA++, I can only find the extended format in the A3 tables, which look like this:

# Sentence pair (1236078) source length 12 target length 17 alignment score : 1.07364e-29
zorg ervoor dat het voer of het extraatje dat het geneesmiddel bevat , volledig wordt opgenomen .
NULL ({ 3 7 12 13 }) ensure ({ 1 2 }) the ({ 4 }) food ({ 5 }) or ({ 6 }) treat ({ 8 }) containing ({ 9 }) the ({ 10 }) medication ({ 11 }) is ({ 15 }) completely ({ 14 }) consumed ({ 16 }) . ({ 17 })

Here the first line is a comment with some info, the second is the target sentence, and the third lists the source tokens, each followed by the 1-based positions of the target tokens it is aligned to.
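
As a rough illustration of the conversion I am after, the following Python sketch turns such three-line records into dash-separated pairs. It is not the Moses script I am looking for. It assumes that every record starts with a "# Sentence pair" comment, that the first group on the third line is always NULL, and that no token itself contains the "({" or "})" delimiters. The names a3_line_to_pairs and a3_to_pharaoh are made up for this example.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# One "token ({ i j k })" group on the third line of an A3 record.
_GROUP = re.compile(r"(\S+) \(\{ ((?:\d+ )*)\}\)")


def a3_line_to_pairs(alignment_line: str) -> List[Tuple[int, int]]:
    """Return 0-based (third-line token, second-line token) index pairs."""
    pairs = []
    for position, match in enumerate(_GROUP.finditer(alignment_line)):
        if position == 0:
            continue  # assumed to be the NULL group: unaligned tokens, skipped
        for index in match.group(2).split():
            # position counts NULL as 0, so the first real token becomes 0;
            # the indices inside ({ }) are 1-based, so subtract 1 as well.
            pairs.append((position - 1, int(index) - 1))
    return pairs


def a3_to_pharaoh(lines: Iterable[str]) -> Iterator[str]:
    """Yield one line of dash-separated pairs per A3 record."""
    iterator = iter(lines)
    for header in iterator:
        if not header.startswith("# Sentence pair"):
            continue
        next(iterator, "")  # the sentence itself is not needed for the indices
        alignment_line = next(iterator, "")
        pairs = a3_line_to_pairs(alignment_line)
        yield " ".join(f"{left}-{right}" for left, right in pairs)


if __name__ == "__main__":
    record = [
        "# Sentence pair (1) source length 2 target length 3 alignment score : 0.1",
        "het kleine huis",
        "NULL ({ 1 }) small ({ 2 }) house ({ 3 })",
    ]
    for line in a3_to_pharaoh(record):
        print(line)
```

Which side ends up on the left of the dash depends on the direction in which the aligner was run, so the two indices may need to be swapped before comparing against the output of other tools.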

If I'm not mistaken, a full Moses training does produce the desired format so I suspect that there is a script somewhere in Moses that converts the format - but I cannot seem to find it. Could you perhaps nudge me in the right direction?


Kind regards

Bram Vanroy

Doctoral Researcher at Ghent University
Language and Translation Technology Team (LT³)
Bram.Vanroy@UGent.be

https://research.flw.ugent.be/en/bram.vanroy
https://www.linkedin.com/in/bramvanroy/
https://github.com/BramVanroy/

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 147, Issue 9
*********************************************
