Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Tokenizer - basic-protected-patterns fix (Tomas Fulajtar)
2. Re: nist-bleu evaluation with EMS (Tom Hoar)
3. Re: Tokenizer - basic-protected-patterns fix (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Fri, 21 Nov 2014 13:39:47 +0000
From: Tomas Fulajtar <TomasFu@moravia.com>
Subject: [Moses-support] Tokenizer - basic-protected-patterns fix
To: "moses-support (moses-support@mit.edu)" <moses-support@mit.edu>
Message-ID:
<8d403d60d90a40f2a92b69ed9e59eadc@BY1PR0201MB0965.namprd02.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-2"
Hello all,
I think the protected pattern for e-mail addresses should be changed from:
(\w\-\_\.)+\@((\w\-\_)+\.)+[a-zA-Z]{2,}
to
[\w\-\_\.]+\@([\w\-\_]+\.)+[a-zA-Z]{2,}
Inside https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/basic-protected-patterns.
Best regards,
Tomáš Fulajtár | Researcher
T: +420-545-552-340
tomasfu@moravia.com | moravia.com <http://www.moravia.com/> | Skype: tomasfulajtar
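
The difference between the two patterns is grouping versus a character class: `(\w\-\_\.)+` requires the literal four-character sequence "word character, hyphen, underscore, dot" to repeat, whereas `[\w\-\_\.]+` accepts any run of those characters. A minimal sketch of the effect follows, using Python's `re` module purely for illustration (the Moses tokenizer itself is Perl; the helper `find_email` and the sample strings are hypothetical):

```python
import re

# Patterns exactly as quoted in the message above.
OLD = r"(\w\-\_\.)+\@((\w\-\_)+\.)+[a-zA-Z]{2,}"
NEW = r"[\w\-\_\.]+\@([\w\-\_]+\.)+[a-zA-Z]{2,}"


def find_email(pattern, text):
    """Return the first substring matched by pattern, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


sample = "Write to tomasfu@moravia.com for details."
odd = "a-_.@b-_.com"  # the only shape of "address" the old pattern accepts

print(find_email(OLD, sample))  # old pattern misses an ordinary address
print(find_email(NEW, sample))  # new pattern finds it
print(find_email(OLD, odd))     # old pattern needs the literal sequence \w - _ .
```

With the old pattern an ordinary address is never protected, so the tokenizer would be free to split it.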
------------------------------
Message: 2
Date: Fri, 21 Nov 2014 20:43:47 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] nist-bleu evaluation with EMS
To: moses-support@mit.edu
Message-ID: <546F4193.5050601@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"
I've contributed this Python script before, but it never made its way to
the Moses repository.
This script has command-line help to guide you through making the three
files necessary for the mteval-v12.pl script. I can't explain the
purpose of the various tags, but they are correctly interpreted by the
script. The mteval-v12.pl script accepts SGML or XML. This creates XML.
I don't know if it works on v11 or v13.
You'll need to run the script three times, once for each "set type":
srcset, tstset and refset. If the ini file exists in the same folder, it
uses the configuration in it. The command line arguments override the
configuration in the ini file.
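
For readers without access to the attachment, here is a rough sketch of what wrapping plain-text segments for the three set types might look like. This is not the attached makemteval.py; the element and attribute names follow my understanding of the NIST mteval XML layout and should be checked against the DTD shipped with your mteval version. The function name `wrap_segments`, the document id `doc1` and the genre `nw` are placeholders.

```python
from xml.sax.saxutils import escape, quoteattr

SET_TYPES = ("srcset", "tstset", "refset")


def wrap_segments(lines, set_type, set_id, src_lang, trg_lang="", label=""):
    """Wrap one-segment-per-line text in an mteval-style XML document (layout assumed)."""
    if set_type not in SET_TYPES:
        raise ValueError("unknown set type: " + set_type)
    attrs = [("setid", set_id), ("srclang", src_lang)]
    if set_type != "srcset":
        attrs.append(("trglang", trg_lang))
        # Assumed convention: a tstset carries a sysid, a refset a refid.
        attrs.append(("sysid" if set_type == "tstset" else "refid", label))
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<mteval>",
        f"<{set_type} {attr_text}>",
        '<doc docid="doc1" genre="nw">',  # placeholder document id and genre
        "<p>",
    ]
    for number, line in enumerate(lines, start=1):
        # Escape &, < and > so the segment text stays well-formed XML.
        out.append(f'<seg id="{number}">{escape(line.strip())}</seg>')
    out += ["</p>", "</doc>", f"</{set_type}>", "</mteval>"]
    return "\n".join(out)


xml_text = wrap_segments(["a", "b & c"], "tstset", "demo", "en", "cs", "moses")
print(xml_text)
```

Calling the function once per set type mirrors the three separate runs described above.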
It hasn't been updated for a while. I can see now that if we create
[common], [srcset], [tstset] and [refset] sections, the script could parse
all sections and create all three files at once. It goes on my "to-do"
list :)
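
One way the sectioned layout could work is sketched below with Python's standard `configparser`: values in [common] apply to every set type, a set type's own section overrides them, and command-line values override both. The key names (`setid`, `srclang`, `sysid`, and so on) and the function `load_settings` are hypothetical; the contents of the attached makemteval.ini are not shown in this digest.

```python
import configparser

SET_TYPES = ("srcset", "tstset", "refset")

# Hypothetical ini contents; the key names are illustrative only.
EXAMPLE_INI = """\
[common]
setid = demo
srclang = en
trglang = cs

[tstset]
input = output.cs
sysid = moses
"""


def load_settings(ini_text, overrides=None):
    """Return {set_type: settings}, layering [common] < [set type] < overrides."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    common = dict(parser["common"]) if parser.has_section("common") else {}
    settings = {}
    for set_type in SET_TYPES:
        merged = dict(common)
        if parser.has_section(set_type):
            merged.update(parser[set_type])
        # Command-line values win over the ini file; None means "not given".
        given = {k: v for k, v in (overrides or {}).items() if v is not None}
        merged.update(given)
        settings[set_type] = merged
    return settings


settings = load_settings(EXAMPLE_INI, overrides={"setid": "run2", "sysid": None})
for set_type in SET_TYPES:
    print(set_type, settings[set_type])
```

A single pass over `settings` could then write all three files in one run.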
Hieu, please feel free to add it to Moses' contrib or scripts folder.
Tom
On 11/21/2014 08:01 PM, Barry Haddow wrote:
> Hi
>
> You could use multi-bleu.perl (in scripts/generic). It works on plain
> tokenised text, not sgml,
>
> cheers - Barry
>
> On 21/11/14 12:43, Hieu Hoang wrote:
>> SGML/nist-bleu is a complete pain. When you find out how to do this,
>> please enlighten us!
>>
>> You might want to look at a working config file
>> http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/config.pb.recase
>>
>> On 18 November 2014 10:10, Gary Daine <gdaine@gmail.com> wrote:
>>
>> Hi,
>>
>> I have Moses installed and running properly, but I can't work out
>> how to
>> set up the config file for NIST-BLEU.
>>
>> The error message I get is:
>>
>> executing /home/gary/working/steps/4/EVALUATION_testcorpus_wrap.4
>> via sh (1 active)
>> number of steps doable or running: 1 at mar nov 18 10:51:09 CET 2014
>> doable: EVALUATION:testcorpus:nist-bleu
>> ERROR: you need to define GENERAL:input-sgm
>>
>> I understand that NIST-BLEU requires sgm-formatted files. My corpus is
>> in utf-8, and I've specified raw input for all the other steps, which
>> seems to work fine. I've read and re-read all the documentation I can
>> find, and I can't work out:
>>
>> (1) which file(s) need to be in sgm format, and
>> (2) how to specify this in the config file
>>
>> (obviously I need to specify 'input-sgm =', but what do I use as a
>> parameter? Do I need to convert the tuning(?) file manually
>> beforehand?)
>>
>> No
>>
>>
>> I would appreciate any pointers.
>>
>> Thanks,
>> Gary
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makemteval.ini
Type: application/x-wine-extension-ini
Size: 122 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20141121/1aba0370/attachment-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makemteval.py
Type: text/x-python
Size: 7757 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20141121/1aba0370/attachment-0001.py
------------------------------
Message: 3
Date: Fri, 21 Nov 2014 13:58:29 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Tokenizer - basic-protected-patterns fix
To: Tomas Fulajtar <TomasFu@moravia.com>, Tom Hoar
<tahoar@precisiontranslationtools.com>
Cc: "moses-support \(moses-support@mit.edu\)" <moses-support@mit.edu>
Message-ID:
<CAEKMkbi=7EQ-+j_Q1C+CWcEcPR7bg8udNBsTxArx1j48WdERZw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
thanks guys
https://github.com/moses-smt/mosesdecoder/commit/c0be182bfaa7528b8cdbdf97a607fcab859947e3
On 21 November 2014 13:39, Tomas Fulajtar <TomasFu@moravia.com> wrote:
> Hello all,
>
>
>
> I guess the pattern for e-mail protected pattern should be changed from :
>
>
>
> (\w\-\_\.)+\@((\w\-\_)+\.)+[a-zA-Z]{2,}
>
> to
>
>
>
> [\w\-\_\.]+\@([\w\-\_]+\.)+[a-zA-Z]{2,}
>
>
>
>
>
> Inside
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/basic-protected-patterns
> .
>
>
>
>
>
> Best regards,
>
>
>
> *Tomáš Fulajtár* | Researcher
> *T:* +420-545-552-340
> tomasfu@moravia.com | moravia.com <http://www.moravia.com/> | *Skype:*
> tomasfulajtar
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 97, Issue 65
*********************************************