Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Legacy tokenizer.perl functionality. (Tom Hoar)
----------------------------------------------------------------------
Message: 1
Date: Fri, 16 Jan 2015 23:31:03 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Legacy tokenizer.perl functionality.
To: moses-support@mit.edu
Message-ID: <54B93CC7.3050807@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"
"It was a pain..." See, aren't you glad you didn't rely on my Perl skills?
I'm almost sorry I raised the controversy. I forgot Moses started with
and got away from versioned script folders. Maybe the responsibility for
version tracking is best left to those who critically need it.
On 01/16/2015 11:16 PM, Hieu Hoang wrote:
>
>
> On 16 January 2015 at 16:13, Barry Haddow <bhaddow@staffmail.ed.ac.uk> wrote:
>
> Hi
>
> Yes, the EMS (experiment management system) included with Moses will
> also deal with this by checking timestamps on the tokeniser scripts.
>
> If you use your models outside the EMS (or Eman etc) however then
> there's no easy way to ensure compatibility between tokeniser and
> model.
> I agree that the tokeniser shouldn't be doing text normalisation, but it
> was, and fixing it could cause more pain than leaving things as they are,
>
> Oh, I just moved the normalisation to normalize-punctuation.perl. It
> was a pain:
> https://github.com/moses-smt/mosesdecoder/commit/19d7c44aad1d1b06b884833cc3a7b0e14d2a6c36
> https://github.com/moses-smt/mosesdecoder/commit/30e31d4a95713cd5340941b15d566f09b3b1a2d7
> Let's see what happens!
>
>
> cheers - Barry
>
> On 16/01/15 15:07, Ondrej Bojar wrote:
> > Hi, Christian,
> >
> > when the scripts directory of moses was first created back in 2006,
> > we had the same issues with versioning. At that point, I created the
> > (ugly) need to 'install' the scripts, mainly to provide all of them
> > with a version number. Fortunately, we now got rid of this and the
> > scripts are meant to be used right away after checkout.
> >
> > I'm saying this just to point out that there is probably no ideal way
> > of keeping up to date and yet ensuring compatibility for existing
> > models with toolkits as complex as moses is.
> >
> > For this, I use my eman, an experiment manager where even the moses
> > toolkit itself is something timestamped. So I have a couple of moses
> > checkouts, timestamped, and my models depend on one of them. Moving to
> > a fresher moses checkout is easy (a new timestamped directory gets
> > created), but requires redoing all the models (well, eman does this
> > for me, so it's just a waste of computer space and time, not mine).
> >
> > Cheers, O.
> >
> > ----- Original Message -----
> >> From: "Christian Hardmeier" <ch@rax.ch>
> >> To: "Hieu Hoang" <hieuhoang@gmail.com>
> >> Cc: "Tom Hoar" <tahoar@precisiontranslationtools.com>, "moses-support support" <moses-support@mit.edu>
> >> Sent: Friday, 16 January, 2015 15:26:15
> >> Subject: Re: [Moses-support] Legacy tokenizer.perl functionality.
> >> On Jan 16, 2015, at 12:46 PM, Hieu Hoang wrote:
> >>
> >>> i think it's too difficult to police.
> >> You'd probably need a regression test that checks if the tokenised
> >> output is still the same so changes don't go unnoticed. But of course
> >> it's still some extra work.
> >>
> >>> Another idea is to get the script to md5 its own source code, and the
> >>> non-prefix files it uses.
> >> That would definitely be better than nothing, even though it would
> >> raise false alarms from time to time.
> >>
> >>> On 16/01/15 11:12, Christian Hardmeier wrote:
> >>>> On Jan 16, 2015, at 11:51 AM, Tom Hoar wrote:
> >>>>
> >>>>> I agree with versioning. Could be added to the command line.
> >>>>>
> >>>>> Also agree that this proposed change qualifies as a version change.
> >>>>>
> >>>>> How do you propose managing the issue of output changes due to
> >>>>> command-line switches, like -no-escape?
> >>>> Very good question. To be consistent, you'd probably have to increment
> >>>> the version number even if the change only applies when you use a
> >>>> certain command-line switch. But not if it doesn't affect the input,
> >>>> and maybe not if you just add a new command-line switch that is off by
> >>>> default. What do you think?
> >>>>
> >>>>
> >>>>
> >>>>> On 01/16/2015 05:36 PM, Christian Hardmeier wrote:
> >>>>>> I'd like to suggest that there should be a version number in the
> >>>>>> tokeniser that is incremented whenever the output changes, even if
> >>>>>> the change is minor and even if it's just a bugfix. Otherwise when
> >>>>>> you pull a new version of moses you don't know if the output of
> >>>>>> tokenizer.perl is still compatible with your existing models.
> >>>>>> (Moving functionality from tokenizer.perl to
> >>>>>> normalize-punctuation.perl would count as a change from my point of
> >>>>>> view. I don't always use normalize-punctuation.)
> >>>>>>
> >>>>>> /Christian
> >>>>>>
> >>>>>> On Jan 16, 2015, at 10:36 AM, Hieu Hoang wrote:
> >>>>>>
> >>>>>>> it's probably a good idea to make this change. If you've done it
> >>>>>>> already, please send me the updated scripts and I'll check it in.
> >>>>>>> If not, I'll do it myself
> >>>>>>>
> >>>>>>> there's hopefully a fast, C++ tokenizer replacement coming soon.
> >>>>>>> Highlighting these issues now is useful for understanding exactly
> >>>>>>> how the tokenizer works/should work
> >>>>>>>
> >>>>>>> On 15/01/15 01:52, Tom Hoar wrote:
> >>>>>>>> This is a separate issue from the parallel "Tokenization problem"
> >>>>>>>> thread...
> >>>>>>>>
> >>>>>>>> The tokenizer.perl has had one line that transforms the grave
> >>>>>>>> accent (`) to apostrophe and another that transforms double
> >>>>>>>> apostrophe ('') to single quote. I suspect these have been in the
> >>>>>>>> script since the beginning. However, they "bit" me on a recent
> >>>>>>>> project. Easy enough to work around.
> >>>>>>>>
> >>>>>>>> Still, I'm wondering. Do they still belong in the tokenizer.perl
> >>>>>>>> script? Or, should they be moved into one of the other scripts? The
> >>>>>>>> normalize-punctuation.perl script seems to be a good candidate.
> >>>>>>>> _______________________________________________
> >>>>>>>> Moses-support mailing list
> >>>>>>>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>>>>>>
> >>>>
> >>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
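
The checksum idea raised in the thread (have the tokenizer md5 its own source and the files it uses) could look roughly like the sketch below. It is written in Python rather than Perl, it is not code from Moses, and the names fingerprint and is_compatible, as well as the choice of which data files to include, are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def fingerprint(script, data_files=()):
    """Return an MD5 hex digest over a script and the data files it reads."""
    digest = hashlib.md5()
    # Sort the data files so the result does not depend on argument order.
    paths = [Path(script)] + sorted(Path(p) for p in data_files)
    for path in paths:
        # Mix in the file name too, so renaming or swapping files is detected.
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def is_compatible(stored, script, data_files=()):
    """True if the files still match the fingerprint saved alongside a model."""
    return stored == fingerprint(script, data_files)
```

A model directory could store the fingerprint at training time and compare it again before decoding. As noted in the thread, this would raise false alarms from time to time: any edit to the script, even one that only touches a comment, changes the digest although the tokenised output stays the same.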
------------------------------
End of Moses-support Digest, Vol 99, Issue 36
*********************************************