Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Unicode Issues when Using Compact Phrase Table, Binaries
vs. Own Build (Nikolay Bogoychev)
----------------------------------------------------------------------
Message: 1
Date: Mon, 30 Mar 2015 12:17:22 +0100
From: Nikolay Bogoychev <nheart@gmail.com>
Subject: Re: [Moses-support] Unicode Issues when Using Compact Phrase
Table, Binaries vs. Own Build
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAJzPUEw0K+2r-FxceL=WewxFjdWcHsqKqmCTqnW4RoZS5w9eJw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hey Ventzi,
Did you by any chance binarize your phrase tables from a raw text file
rather than from a gzipped one (or any other supported compressed text
format)? I recently ran into similar issues with my phrase table (ProbingPT)
when the input phrase table had not been compressed during binary creation.
I wasn't able to trace the issue; I just make sure I gzip any phrase table
before binarizing.
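
A minimal sketch of that workaround, assuming the phrase table is a plain
UTF-8 text file. The helper names `gzip_phrase_table` and `same_bytes` and
the file name are illustrative and not part of Moses. The sketch only
compresses the file and checks that the round trip is byte-identical; the
binarization command itself is left out.

```python
import gzip
import shutil
from pathlib import Path


def gzip_phrase_table(src: Path) -> Path:
    """Write a gzip-compressed copy of a plain-text phrase table next to it."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        # Byte-for-byte copy: the text is never decoded or re-encoded.
        shutil.copyfileobj(fin, fout)
    return dst


def same_bytes(plain: Path, gz: Path) -> bool:
    """Check that decompressing the .gz yields exactly the original bytes.

    Reads both files fully into memory, so use it on small samples only.
    """
    with plain.open("rb") as a, gzip.open(gz, "rb") as b:
        return a.read() == b.read()
```

Calling `gzip_phrase_table(Path("phrase-table"))` would leave
`phrase-table.gz` next to the original, which can then be given to the
binarizer in place of the raw text file.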
Cheers,
Nick
On Mon, Mar 30, 2015 at 10:11 AM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
> Forgot to add that we use the compact phrase table and Moses on older
> and newer Ubuntu versions with Arabic, Chinese, Korean, Japanese and Russian
> in both directions without any problems. Those puny German umlauts should
> not be a challenge. :)
>
> On 30.03.2015 at 11:08, Marcin Junczys-Dowmunt wrote:
>
> Hi,
> the phrase table and, as far as I know, Moses in general are
> Unicode-agnostic, as long as you use UTF-8. Input is handled as raw byte
> sequences; most of the time there are only numeric identifiers.
> This sounds more like a couple of messed-up systems on your side, especially
> the part where some self-compiled systems work and others don't. I cannot
> give you much more insight, unfortunately.
> Best,
> Marcin
>
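
To illustrate what "handled as raw byte sequences" implies, here is a small
sketch. The dictionary `table` is a hypothetical stand-in for a byte-level
phrase lookup, not Moses code, and nothing here is claimed to be the cause of
the reported problem. It only shows that the same visible word can arrive as
different byte sequences, and that a byte-level lookup matches only the exact
bytes it was built from.

```python
import unicodedata

word = "f\u00fcr"  # "für", written with an escape so the source is unambiguous

# Three byte-level spellings of the same visible word.
nfc = unicodedata.normalize("NFC", word).encode("utf-8")  # precomposed ü
nfd = unicodedata.normalize("NFD", word).encode("utf-8")  # u + combining diaeresis
latin1 = word.encode("latin-1")                           # not UTF-8 at all

# Toy table keyed on raw bytes (hypothetical, for illustration only).
table = {nfc: "for"}


def lookup(token: bytes) -> str:
    """Return the stored entry for the exact byte sequence, else UNK."""
    return table.get(token, "UNK")


for name, token in [("NFC", nfc), ("NFD", nfd), ("Latin-1", latin1)]:
    print(name, token, lookup(token))
```

Only the NFC bytes hit the toy table; the other two spellings fall through to
UNK even though all three render as the same word. Dumping the bytes of a
failing input token in this way is a cheap first check.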
> On 30.03.2015 at 10:53, "Венцислав Жечев (Ventsislav Zhechev)" wrote:
>
> Hi all,
>
> I’m having this really weird Unicode issue when using compact phrase
> tables that could be related to endianness somehow, but I’ve no idea how.
> I compiled the training tools from v3 on my Mac and built a few models
> using compact phrase (and reordering) tables and KenLM, including (for
> simplicity) a recasing model for DE (download it from
> https://autodesk.box.com/DE-Recaser). Things become strange when I try to
> use the models, though:
> 1. All works fine when I use the decoder binary I compiled myself on the
> Mac (10.10.2, self-built Boost 1.57)
> 2. Unicode input is not recognised when I use the binary from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/macosx-yosemite/ i.e.
> words like “für” or “ausführlich” are marked as UNK.
> 3. Unicode input is not recognised when I use a binary I compiled myself
> on Ubuntu 12.04.5 (self-built Boost 1.57)
> 4. All works fine when I use the binary from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>
> I tested the above with the queryPhraseTableMin tool (rather than the
> decoder) and got the same results, which is what makes me think this could
> be somehow related to binary incompatibility with the way the phrase table
> is compacted. Haven’t investigated deeper than that, though.
>
>
> Any clues?
> One would say, just use the Linux binary then on Linux... However, I have
> a number of CentOS/RHEL 5 and 6 boxes, where the pre-compiled binary
> doesn’t work, as the system glibc is too old. So there I need to compile
> Moses myself, but then Unicode isn’t recognised...
>
>
>
> Cheers,
>
> Ventzi
>
> *Dr. Ventsislav Zhechev*
> Computational Linguist, Certified ScrumMaster®
> Platform Architecture and Technologies
> Localisation Services
>
> *MAIN* +41 32 723 91 22
> *FAX* +41 32 723 93 99
>
> *http://VentsislavZhechev.eu <http://VentsislavZhechev.eu>*
>
> *Autodesk, Inc.*
> Rue de Puits-Godet 6
> 2000 Neuchâtel, Switzerland
> *www.autodesk.com <http://www.autodesk.com/>*
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 101, Issue 82
**********************************************