Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Unicode Issues when Using Compact Phrase Table, Binaries vs.
Own Build (Венцислав Жечев (Ventsislav Zhechev))
----------------------------------------------------------------------
Message: 1
Date: Mon, 30 Mar 2015 10:53:39 +0200
From: "Венцислав Жечев (Ventsislav Zhechev)"
<contact@VentsislavZhechev.eu>
Subject: [Moses-support] Unicode Issues when Using Compact Phrase
Table, Binaries vs. Own Build
To: moses-support@mit.edu
Message-ID:
<AB42EB15-5221-4D23-849A-0510F26174DD@VentsislavZhechev.eu>
Content-Type: text/plain; charset="utf-8"
Hi all,
I'm having this really weird Unicode issue when using compact phrase tables that could be related to endianness somehow, but I've no idea how.
I compiled the training tools from v3 on my Mac and built a few models using compact phrase (and reordering) tables and KenLM, including (for simplicity) a recasing model for DE (download it from https://autodesk.box.com/DE-Recaser). Things become strange when I try to use the models, though:
1. All works fine when I use the decoder binary I compiled myself on the Mac (10.10.2, self-built Boost 1.57)
2. Unicode input is not recognised when I use the binary from http://www.statmt.org/moses/RELEASE-3.0/binaries/macosx-yosemite/, i.e. words like "für" or "ausführlich" are marked as UNK.
3. Unicode input is not recognised when I use a binary I compiled myself on Ubuntu 12.04.5 (self-built Boost 1.57)
4. All works fine when I use the binary from http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
I tested the above with the queryPhraseTableMin tool (rather than the decoder) and got the same results, which is what makes me think this could be somehow related to a binary incompatibility in the way the phrase table is compacted. I haven't investigated deeper than that, though.
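For anyone trying to reproduce this, it may help to rule out the input side first. The following is only a minimal Python sketch, not part of Moses: the function name `describe_non_ascii_tokens` is made up for illustration, and the sample words are the two from point 2. It lists each token containing non-ASCII characters together with its UTF-8 bytes and whether it is in NFC or NFD form, so the exact same bytes can be confirmed on every machine before comparing binaries.

```python
import unicodedata


def describe_non_ascii_tokens(line):
    """Return one record per whitespace-separated token that contains
    at least one non-ASCII character (hypothetical helper, not a Moses API)."""
    records = []
    for token in line.split():
        # Skip tokens made up purely of ASCII characters.
        if all(ord(ch) < 128 for ch in token):
            continue
        utf8_bytes = token.encode("utf-8")
        records.append({
            "token": token,
            # Space-separated hex dump of the UTF-8 encoding.
            "utf8_hex": " ".join("{:02x}".format(b) for b in utf8_bytes),
            # Precomposed (NFC) and decomposed (NFD) forms look identical
            # on screen but are different byte sequences.
            "is_nfc": unicodedata.normalize("NFC", token) == token,
            "is_nfd": unicodedata.normalize("NFD", token) == token,
        })
    return records


if __name__ == "__main__":
    # Sample input using the two words reported as UNK above.
    sample = "für ausführlich test"
    for record in describe_non_ascii_tokens(sample):
        print(record)
```

If the bytes and normalisation form are identical everywhere, the input can be excluded as the cause and the difference has to lie in the binaries or in how the table was built.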
Any clues?
One would say, just use the Linux binary then on Linux... However, I have a number of CentOS/RHEL 5 and 6 boxes, where the pre-compiled binary doesn't work, as the system glibc is too old. So there I need to compile Moses myself, but then Unicode isn't recognised...
Cheers,
Ventzi
Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster®
Platform Architecture and Technologies
Localisation Services
MAIN +41 32 723 91 22
FAX +41 32 723 93 99
http://VentsislavZhechev.eu
Autodesk, Inc.
Rue de Puits-Godet 6
2000 Neuchâtel, Switzerland
www.autodesk.com
------------------------------
End of Moses-support Digest, Vol 101, Issue 79
**********************************************