Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. GIZA-format to Pharaoh format (Bram Vanroy)
----------------------------------------------------------------------
Message: 1
Date: Wed, 23 Jan 2019 14:58:26 +0000
From: Bram Vanroy <Bram.Vanroy@UGent.be>
Subject: [Moses-support] GIZA-format to Pharaoh format
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <a4e78c1101154427b6f33e1423962630@xmail103.UGent.be>
Content-Type: text/plain; charset="iso-8859-1"
Dear support list
I am a PhD student in machine translation, and as part of my research I would like to compare some alignment tools such as MGIZA++, fast_align, and efmaral. The easiest format for me to use would be the Pharaoh format, i.e. the format where each source and target token index pair is joined by a dash and pairs are separated by spaces. For instance:
0-0 1-1 2-2 2-3 3-4 4-5
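To make the target format concrete, here is a minimal Python sketch that reads one such line into index pairs. The function name parse_pharaoh is only illustrative, and the sketch assumes zero-based indices and well-formed input.

```python
from typing import List, Tuple


def parse_pharaoh(line: str) -> List[Tuple[int, int]]:
    """Turn a Pharaoh-format line into (source_index, target_index) pairs."""
    pairs = []
    for item in line.split():
        # Each item looks like "2-3": source index, dash, target index.
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


print(parse_pharaoh("0-0 1-1 2-2 2-3 3-4 4-5"))
```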
However, after running (M)GIZA++, I can only find the extended format in the A3 files, which looks like this:
# Sentence pair (1236078) source length 12 target length 17 alignment score : 1.07364e-29
zorg ervoor dat het voer of het extraatje dat het geneesmiddel bevat , volledig wordt opgenomen .
NULL ({ 3 7 12 13 }) ensure ({ 1 2 }) the ({ 4 }) food ({ 5 }) or ({ 6 }) treat ({ 8 }) containing ({ 9 }) the ({ 10 }) medication ({ 11 }) is ({ 15 }) completely ({ 14 }) consumed ({ 16 }) . ({ 17 })
Here the first line is a comment with some information about the sentence pair, the second is the target sentence, and the third lists the source tokens, each followed by the (one-based) positions of the target tokens it is aligned to.
If I'm not mistaken, a full Moses training does produce the desired format so I suspect that there is a script somewhere in Moses that converts the format - but I cannot seem to find it. Could you perhaps nudge me in the right direction?
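Until I find that script, the conversion I have in mind can be sketched roughly as follows in Python. This is only a sketch under assumptions of mine, not the Moses implementation: the function names are hypothetical, every record is assumed to consist of exactly three lines as in the example above, alignments to NULL are dropped, and the one-based positions are shifted to zero-based indices. The output is a single directional alignment, without any symmetrization.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# One "token ({ i j ... })" group on the third line of an A3 record.
_GROUP = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def a3_line_to_pairs(alignment_line: str) -> List[Tuple[int, int]]:
    """Return zero-based (source, target) pairs for one A3 alignment line."""
    pairs = []
    for pos, match in enumerate(_GROUP.finditer(alignment_line)):
        if pos == 0:
            # Assumption: the first group is NULL; unaligned words are skipped.
            continue
        for idx in match.group(2).split():
            # pos - 1 skips the NULL slot; idx - 1 converts to zero-based.
            pairs.append((pos - 1, int(idx) - 1))
    return sorted(pairs)


def a3_to_pharaoh(lines: Iterable[str]) -> Iterator[str]:
    """Yield one Pharaoh-format line per three-line A3 record."""
    it = iter(lines)
    for header in it:
        if not header.startswith("# Sentence pair"):
            continue
        next(it, "")  # the plain sentence line is not needed here
        alignment_line = next(it, "")
        pairs = a3_line_to_pairs(alignment_line)
        yield " ".join(f"{s}-{t}" for s, t in pairs)


record = [
    "# Sentence pair (1236078) source length 12 target length 17 alignment score : 1.07364e-29",
    "zorg ervoor dat het voer of het extraatje dat het geneesmiddel bevat , volledig wordt opgenomen .",
    "NULL ({ 3 7 12 13 }) ensure ({ 1 2 }) the ({ 4 }) food ({ 5 }) or ({ 6 }) treat ({ 8 }) "
    "containing ({ 9 }) the ({ 10 }) medication ({ 11 }) is ({ 15 }) completely ({ 14 }) "
    "consumed ({ 16 }) . ({ 17 })",
]
for line in a3_to_pharaoh(record):
    print(line)
```

Records are emitted in file order; if the A3 output is split over several part files, the sentence number in the comment line would be needed to restore the original corpus order.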
Kind regards
Bram Vanroy
Doctoral Researcher at Ghent University
Language and Translation Technology Team (LT³)
Bram.Vanroy@UGent.be
https://research.flw.ugent.be/en/bram.vanroy
https://www.linkedin.com/in/bramvanroy/
https://github.com/BramVanroy/
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 147, Issue 9
*********************************************