Moses-support Digest, Vol 99, Issue 35

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Legacy tokenizer.perl functionality. (Hieu Hoang)
2. Re: MGIZA is slower than GIZA (Marcin Junczys-Dowmunt)
3. MT position in Dublin, Ireland (John Tinsley)


----------------------------------------------------------------------

Message: 1
Date: Fri, 16 Jan 2015 16:16:15 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Legacy tokenizer.perl functionality.
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: Tom Hoar <tahoar@precisiontranslationtools.com>, Ondrej Bojar
<bojar@ufal.mff.cuni.cz>, moses-support support
<moses-support@mit.edu>
Message-ID:
<CAEKMkbihSwVVfCvWpUCG1C=2cZed2qEMB_=Smm5mzq98cOS3vQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On 16 January 2015 at 16:13, Barry Haddow <bhaddow@staffmail.ed.ac.uk>
wrote:

> Hi
>
> Yes, the EMS (experiment management system) included with Moses will
> also deal with this by checking timestamps on the tokeniser scripts.
>
> If you use your models outside the EMS (or Eman etc) however then
> there's no easy way to ensure compatibility between tokeniser and model.
> I agree that the tokeniser shouldn't be doing text normalisation, but it
> was, and fixing it could cause more pain than leaving things as they are,
>
Oh, I just moved the normalisation to normalize-punctuation.perl. It was a
pain.

https://github.com/moses-smt/mosesdecoder/commit/19d7c44aad1d1b06b884833cc3a7b0e14d2a6c36

https://github.com/moses-smt/mosesdecoder/commit/30e31d4a95713cd5340941b15d566f09b3b1a2d7
Let's see what happens!
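Tom's message further down the thread describes the two legacy rules that this change moves out of tokenizer.perl. The Python sketch below only illustrates them; it is not the Perl from normalize-punctuation.perl, the function name `legacy_quote_normalise` is hypothetical, and it assumes (our reading, not confirmed in this thread) that a doubled apostrophe becomes one double-quote character.

```python
def legacy_quote_normalise(text: str) -> str:
    """Hypothetical re-creation of the two legacy tokenizer.perl quote rules.

    Assumption: a doubled apostrophe becomes one double-quote character.
    The Perl original may also pad the result with spaces; that is omitted.
    """
    # Rule 1: grave accent (`) -> apostrophe (')
    text = text.replace("`", "'")
    # Rule 2: doubled apostrophe ('') -> double-quote character (").
    # Runs after rule 1, so a pair of backticks also ends up as '"'.
    text = text.replace("''", '"')
    return text
```

For example, ``quoted'' comes out as "quoted". Because the backtick rule runs first, a pair of backticks also turns into a double quote, so anyone who tokenises without running normalize-punctuation.perl will now see different output, which is the compatibility concern Christian raises below.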


>
> cheers - Barry
>
> On 16/01/15 15:07, Ondrej Bojar wrote:
> > Hi, Christian,
> >
> > when the scripts directory of Moses was first created back in 2006, we
> > had the same issues with versioning. At that point, I created the (ugly)
> > need to 'install' the scripts, mainly to provide all of them with a
> > version number. Fortunately, we have since got rid of this, and the
> > scripts are meant to be used right away after checkout.
> >
> > I'm saying this just to point out that there is probably no ideal way of
> > keeping up to date and yet ensuring compatibility for existing models
> > with a toolkit as complex as Moses.
> >
> > For this, I use my eman, an experiment manager in which even the Moses
> > toolkit itself is timestamped. So I have a couple of Moses checkouts,
> > timestamped, and my models depend on one of them. Moving to a fresher
> > Moses checkout is easy (a new timestamped directory gets created), but
> > requires redoing all the models (well, eman does this for me, so it's
> > just a waste of computer space and time, not mine).
> >
> > Cheers, O.
> >
> > ----- Original Message -----
> >> From: "Christian Hardmeier" <ch@rax.ch>
> >> To: "Hieu Hoang" <hieuhoang@gmail.com>
> >> Cc: "Tom Hoar" <tahoar@precisiontranslationtools.com>, "moses-support
> >> support" <moses-support@mit.edu>
> >> Sent: Friday, 16 January, 2015 15:26:15
> >> Subject: Re: [Moses-support] Legacy tokenizer.perl functionality.
> >> On Jan 16, 2015, at 12:46 PM, Hieu Hoang wrote:
> >>
> >>> I think it's too difficult to police.
> >> You'd probably need a regression test that checks whether the tokenised
> >> output is still the same, so that changes don't go unnoticed. But of
> >> course it's still some extra work.
> >>
> >>> Another idea is to get the script to md5 its own source code, and the
> >>> non-prefix files it uses.
> >> That would definitely be better than nothing, even though it would raise
> >> false alarms from time to time.
> >>
> >>> On 16/01/15 11:12, Christian Hardmeier wrote:
> >>>> On Jan 16, 2015, at 11:51 AM, Tom Hoar wrote:
> >>>>
> >>>>> I agree with versioning. Could be added to the command line.
> >>>>>
> >>>>> Also agree that this proposed change qualifies as a version change.
> >>>>>
> >>>>> How do you propose managing the issue of output changes due to
> >>>>> command-line switches, like -no-escape?
> >>>> Very good question. To be consistent, you'd probably have to increment
> >>>> the version number even if the change only applies when you use a
> >>>> certain command-line switch. But not if it doesn't affect the input,
> >>>> and maybe not if you just add a new command-line switch that is off by
> >>>> default. What do you think?
> >>>>
> >>>>
> >>>>
> >>>>> On 01/16/2015 05:36 PM, Christian Hardmeier wrote:
> >>>>>> I'd like to suggest that there should be a version number in the
> >>>>>> tokeniser that is incremented whenever the output changes, even if
> >>>>>> the change is minor and even if it's just a bugfix. Otherwise, when
> >>>>>> you pull a new version of Moses you don't know whether the output of
> >>>>>> tokenizer.perl is still compatible with your existing models. (Moving
> >>>>>> functionality from tokenizer.perl to normalize-punctuation.perl would
> >>>>>> count as a change from my point of view. I don't always use
> >>>>>> normalize-punctuation.)
> >>>>>>
> >>>>>> /Christian
> >>>>>>
> >>>>>> On Jan 16, 2015, at 10:36 AM, Hieu Hoang wrote:
> >>>>>>
> >>>>>>> It's probably a good idea to make this change. If you've done it
> >>>>>>> already, please send me the updated scripts and I'll check them in.
> >>>>>>> If not, I'll do it myself.
> >>>>>>>
> >>>>>>> There's hopefully a fast C++ tokenizer replacement coming soon.
> >>>>>>> Highlighting these issues now is useful for understanding exactly
> >>>>>>> how the tokenizer works/should work.
> >>>>>>>
> >>>>>>> On 15/01/15 01:52, Tom Hoar wrote:
> >>>>>>>> This is a separate issue from the parallel "Tokenization problem"
> >>>>>>>> thread...
> >>>>>>>>
> >>>>>>>> The tokenizer.perl script has had one line that transforms the grave
> >>>>>>>> accent (`) to an apostrophe and another that transforms a double
> >>>>>>>> apostrophe ('') to a single quote. I suspect these have been in the
> >>>>>>>> script since the beginning. However, they recently "bit" me on a
> >>>>>>>> project. Easy enough to work around.
> >>>>>>>>
> >>>>>>>> Still, I'm wondering: do they still belong in the tokenizer.perl
> >>>>>>>> script? Or should they be moved into one of the other scripts? The
> >>>>>>>> normalize-punctuation.perl script seems to be a good candidate.
> >>>>>>>> _______________________________________________
> >>>>>>>> Moses-support mailing list
> >>>>>>>> Moses-support@mit.edu
> >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>>>>>>
> >>>>
> >>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
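Christian's regression-test suggestion above could look roughly like this: tokenise a fixed sample, compare it line by line with a stored "golden" output, and report every line that changed. This is a minimal Python sketch; `run_command` and `golden_mismatches` are hypothetical helpers, and the tokeniser command you plug in is up to you.

```python
import subprocess
from typing import Callable, List, Tuple


def run_command(cmd: List[str], text: str) -> str:
    """Pipe `text` through an external command (e.g. a tokeniser) and return its stdout."""
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout


def golden_mismatches(
    tokenise: Callable[[str], str], sample: str, golden: str
) -> List[Tuple[int, str, str]]:
    """Return (line number, expected, actual) for every line whose tokenisation changed."""
    actual_lines = tokenise(sample).splitlines()
    expected_lines = golden.splitlines()
    mismatches = []
    # Walk up to the longer of the two outputs so added or dropped lines are caught too.
    for i in range(max(len(actual_lines), len(expected_lines))):
        expected = expected_lines[i] if i < len(expected_lines) else "<missing>"
        actual = actual_lines[i] if i < len(actual_lines) else "<missing>"
        if expected != actual:
            mismatches.append((i + 1, expected, actual))
    return mismatches
```

With a Moses checkout, the tokeniser callable might be `lambda s: run_command(["perl", "scripts/tokenizer/tokenizer.perl", "-l", "en"], s)` (path assumed relative to the checkout). An empty result means the output is unchanged; any entry is a signal to bump the version number or retrain.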



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
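Hieu's alternative from the thread, fingerprinting the script's own source together with the data files it reads, might be sketched as follows in Python. The function name is hypothetical, and which files to include is an assumption you would need to settle; for tokenizer.perl they would presumably be the script itself and the nonbreaking-prefix file for the language in use.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def tokeniser_fingerprint(paths: Iterable[str]) -> str:
    """MD5 fingerprint of a tokeniser script plus the data files it reads.

    Each file is hashed on its own, and the (file name, file hash) pairs are
    fed into one combined MD5 in sorted path order, so the result does not
    depend on argument order but does change if any file's bytes change.
    """
    combined = hashlib.md5()
    for path in sorted(Path(p) for p in paths):
        file_hash = hashlib.md5(path.read_bytes()).hexdigest()
        combined.update(f"{path.name}\t{file_hash}\n".encode("utf-8"))
    return combined.hexdigest()
```

As Christian notes, this errs on the side of false alarms: a comment-only edit to the script changes the fingerprint even though the tokenised output stays the same, so a changed fingerprint signals a possible, not a certain, output change.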

------------------------------

Message: 2
Date: Fri, 16 Jan 2015 17:21:48 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] MGIZA is slower than GIZA
To: Li Xiang <lixiang.ict@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <48266d063eb0702c252d9c9913d1a6b0@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi,

I have been complaining about MGIZA performance lately, too. With more
cores it gets slower instead of faster, at least above 8 cores. If it
is slower than GIZA, then this is really bad.

On 2015-01-16 16:53, Li Xiang wrote:

> Hi all,
>
> I trained the alignment model on the same data with the same parameters using GIZA and MGIZA respectively. The training corpus contains 200K sentences. My server has an Intel quad-core i7-4790K CPU with 4 cores and 2 threads per core. GIZA took 2905 seconds, but MGIZA with 3 threads took 5259 seconds. I expected MGIZA to be much faster than GIZA, but got the opposite result. I do not know whether the reason is the way it was compiled or something else.
>
> Does anyone have relevant experience? Thanks.
>
> The following is the training command for MGIZA. The training data is the FBIS zh-en data, but I cannot publish it because of copyright.
>
> ${mosesScript}/training/train-model.perl \
> --external-bin-dir "${binDir}" \
> --root-dir "${trainDir}" \
> --corpus train \
> --f src \
> --e ref \
> --alignment grow-diag-final-and \
> --parallel \
> --first-step 1 \
> --last-step 3 \
> --mgiza --mgiza-cpus 3
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]



Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support
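The timings in Li Xiang's message can be put in speedup and efficiency terms. This is a minimal Python sketch (the helper name `parallel_stats` is ours); note that the baseline is the GIZA run, so it measures 3-thread MGIZA against GIZA rather than against single-threaded MGIZA.

```python
def parallel_stats(serial_seconds: float, parallel_seconds: float, workers: int) -> dict:
    """Return speedup and per-worker efficiency of a parallel run.

    speedup = baseline time / parallel time (values below 1.0 mean the
    parallel run was slower); efficiency = speedup / number of workers.
    """
    if serial_seconds <= 0 or parallel_seconds <= 0 or workers <= 0:
        raise ValueError("times and worker count must be positive")
    speedup = serial_seconds / parallel_seconds
    return {"speedup": speedup, "efficiency": speedup / workers}
```

`parallel_stats(2905, 5259, 3)` gives a speedup of about 0.55 (MGIZA took roughly 1.8 times as long) and an efficiency of about 0.18 per thread, so in this run the extra threads did not pay off.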

------------------------------

Message: 3
Date: Fri, 16 Jan 2015 16:26:18 +0000
From: John Tinsley <jtinsley@computing.dcu.ie>
Subject: [Moses-support] MT position in Dublin, Ireland
To: moses-support@mit.edu
Message-ID:
<CAHfkK=5ZQaG+oH2dn_h7kF2E4Gcqc2Xfr8Uvdp62cmgqEQhZUw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all, apologies for cross-posting.

We have a vacancy for a *Machine Translation Engineer* at Iconic
Translation Machines in Dublin. Full details on the role can be found at
http://iconictranslation.com/about/careers/ and I've also included a
summary below.

We're looking for someone with a PhD/MSc qualification in MT/NLP to join our
Language Technology Team and work on developing real-world practical MT
applications. The best aspect of this role is that it's never dull! You
will never be stuck working on the same task constantly because we are
always working with new clients on new data, interesting languages, and
wide-ranging domains. It is a fantastic opportunity to both refine and
broaden your MT skill set, to try out new techniques, and to implement
established processes that will be used by our clients to translate
hundreds of millions of words each year.

If you're interested, follow the link above for more information or drop me
a mail directly at john@iptranslator.com

Cheers
John

--
Dr. John Tinsley
CEO & Co-Founder
Iconic Translation Machines Ltd.


Machine Translation with Subject Matter Expertise

*Join our upcoming webinar on February 4th!*
*www.iconictranslation.com/webinar/signup
<http://iconictranslation.com/2014/12/taus-translation-technology-webinar-jan-7th/>*

*Web: *http://www.IconicTranslation.com <http://www.iconictranslation.com/>
*Follow us on Twitter:* @iconictrans

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 99, Issue 35
*********************************************
