Moses-support Digest, Vol 100, Issue 21


Today's Topics:

1. Re: Using boost for prefix/suffix checks (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Thu, 5 Feb 2015 16:25:39 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Using boost for prefix/suffix checks
To: Jeroen Vermeulen <jtv@precisiontranslationtools.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbh97GzJ05CBEyuQdMNTd1N5KvxyTKUhWMZo2EeHrxxYkw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

great, thanks. committed

https://github.com/moses-smt/mosesdecoder/commit/70e8eb54ce75feb0a7d4ed00d275e56652c0a914
I ran the regression tests before committing; they're more extensive than the unit tests.

Hieu Hoang
Research Associate (until March 2015)
** searching for interesting commercial MT position **
University of Edinburgh
http://www.hoang.co.uk/hieu


On 5 February 2015 at 15:49, Jeroen Vermeulen <jtv@precisiontranslationtools.com> wrote:

> Here's a minor patch in case it's of use - but feel free to tell me to
> shut up if it isn't.
>
> Looking at some of the file-handling code I noticed that a lot of places
> check a string for a particular prefix or suffix with this kind of pattern:
>
>     if ((text.size() >= suffix.size()) &&
>         (text.substr(text.size()-suffix.size()) == suffix)) {
>
> It's a bit hard to read, and could lead to strange crashes if you forget
> the length check. For example, checking for...
>
> filename.substr(filename.size()-3) == ".gz"
>
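> ...would throw std::out_of_range (and crash if nothing catches it) if
> filename was less than 3 characters long, because the unsigned
> subtraction wraps around.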
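
A minimal sketch of the failure described above, using only the standard library. The helper names unchecked_is_gz and checked_is_gz are hypothetical and do not come from the Moses code base; they only contrast the unguarded pattern with the guarded one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: the unguarded pattern quoted in the message.
// size() is unsigned, so size() - 3 wraps to a huge value for names
// shorter than 3 characters, and substr() throws std::out_of_range
// because the requested position lies past the end of the string.
bool unchecked_is_gz(const std::string &filename) {
  return filename.substr(filename.size() - 3) == ".gz";
}

// Hypothetical helper: the same check with the length guard in place.
bool checked_is_gz(const std::string &filename) {
  const std::string suffix = ".gz";
  return filename.size() >= suffix.size() &&
         filename.substr(filename.size() - suffix.size()) == suffix;
}
```

Under these assumptions, checked_is_gz("a") simply returns false, while unchecked_is_gz("a") throws; an uncaught exception of that kind terminates the program, which matches the "strange crashes" mentioned in the message.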
>
> If anyone's interested, I'm attaching a patch that replaces all
> prefix/suffix checks that I could find with Boost's starts_with() and
> ends_with(). It's a little safer and easier to follow, and doesn't make
> you count the characters in a fixed-length suffix:
>
> if (ends_with(text, suffix)) {
> if (ends_with(filename, ".gz")) {
> if (starts_with(item, "[") && ends_with(item, "]")) {
>
> None of these cases looked particularly performance-sensitive, but I
> checked just in case. If anything, the Boost code looks more
> optimizer-friendly. It compares characters in-place (so no need to copy
> a substring) and seems to optimize for the known length of string
> constants (so it knows at compile time that ".gz" is 3 characters long).
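
To illustrate the two points in the paragraph above (in-place comparison and a suffix length known at compile time), here is a rough standard-library stand-in. It is not Boost's implementation, and the names has_prefix and has_suffix are hypothetical; the sketch only shows that neither property requires copying a substring.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a suffix check. std::string::compare works
// on a range of the existing string, so no temporary substring is built.
inline bool has_suffix(const std::string &text, const std::string &suffix) {
  return text.size() >= suffix.size() &&
         text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical stand-in for a prefix check, also compared in place.
inline bool has_prefix(const std::string &text, const std::string &prefix) {
  return text.size() >= prefix.size() &&
         text.compare(0, prefix.size(), prefix) == 0;
}

// Overload for string literals: the array size N is a template
// parameter, so the suffix length is a compile-time constant.
template <std::size_t N>
inline bool has_suffix(const std::string &text, const char (&suffix)[N]) {
  const std::size_t len = N - 1;  // exclude the terminating '\0'
  return text.size() >= len &&
         text.compare(text.size() - len, len, suffix, len) == 0;
}
```

With these definitions, has_suffix(filename, ".gz") selects the literal overload, and has_prefix(item, "[") && has_suffix(item, "]") mirrors the bracket check from the patch. Whether a given compiler actually exploits the constant length is not something this sketch demonstrates.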
>
> I haven't done any manual testing, but the unit tests pass. Is that
> considered a reasonable guarantee?
>
>
> Jeroen
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------



End of Moses-support Digest, Vol 100, Issue 21
**********************************************