Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. cleaning the corpus prunes entire dataset (Jaya Kumaran)
2. Mosesserver Paramters to be defined by client (Leo Kriese)
3. Re: Mosesserver Paramters to be defined by client (Barry Haddow)
4. Re: Mosesserver Paramters to be defined by client (Barry Haddow)
5. mert-moses.pl and distortion-limit (Tom Hoar)
----------------------------------------------------------------------
Message: 1
Date: Fri, 5 Dec 2014 11:24:32 +0530
From: Jaya Kumaran <jayakarayil@gmail.com>
Subject: [Moses-support] cleaning the corpus prunes entire dataset
To: moses-support@mit.edu
Message-ID:
<CAPwTSQRrv0AJSRvaVP2-5M0S9NZbqngxCfOXanm58OgS9YLMVQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
When I run clean-corpus-n.perl with a maximum length of 1000 on a
tourism corpus of 14k lines, only 2.5k lines remain in the cleaned corpus.
I see that, in addition to removing blank lines and lines longer than
1000 (max) words, the script removes lines that violate the 9-1 sentence
ratio of GIZA. I don't understand the 9-1 sentence ratio.
How do I increase the size of my cleaned corpus?
Thanks,
Jaya
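For context on the "9-1 sentence ratio": as far as I understand it, clean-corpus-n.perl drops a sentence pair when one side has more than nine times as many tokens as the other, in addition to dropping pairs that are empty or longer than the maximum. The Python sketch below is not the script itself. It is a simplified re-creation of those three checks, with hypothetical names (`drop_reason`, `summarize`) and whitespace tokenization assumed, that tallies why pairs would be dropped so the dominant cause can be seen.

```python
from collections import Counter

def drop_reason(src, tgt, min_len=1, max_len=1000, max_ratio=9.0):
    """Return None if the pair would be kept, else a short reason label.

    Assumption: lengths are counted in whitespace-separated tokens, and
    the checks approximate (not reproduce) clean-corpus-n.perl.
    """
    s, t = len(src.split()), len(tgt.split())
    if s < min_len or t < min_len:
        return "too_short"   # empty or blank line on either side
    if s > max_len or t > max_len:
        return "too_long"    # exceeds the max length argument
    if s > max_ratio * t or t > max_ratio * s:
        return "ratio"       # one side is more than 9x longer than the other
    return None

def summarize(pairs, **kwargs):
    """Count how many pairs are kept and how many fall under each reason."""
    return Counter(drop_reason(s, t, **kwargs) or "kept" for s, t in pairs)
```

If most pairs land under "ratio" or "too_short", one thing worth checking is whether the two sides of the corpus are still aligned line by line, since a shifted file would produce many badly matched pairs.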
------------------------------
Message: 2
Date: Fri, 5 Dec 2014 10:10:20 +0100
From: Leo Kriese <Leonard.Kriese@hotmail.de>
Subject: [Moses-support] Mosesserver Paramters to be defined by client
To: moses-support@mit.edu
Message-ID: <BLU436-SMTP1229435B39BF2FEB5A3890DFD790@phx.gbl>
Content-Type: text/plain; charset="utf-8"
Hello Moses-Team,
I want to use MosesServer with additional parameters defined on the
client side, and I am wondering how to do that.
Take, for example, this line of code from the Perl client script that
is easy to find on the internet:
my %param = ("text" => $encoded, "align" => "true", "report-all-factors"
=> "true");
The keywords "text", "align" and "report-all-factors" seem somewhat
arbitrary to me.
So my question is: which keywords exist?
Are all the parameters listed in Parameter.cpp available as well?
I will explain my problem and the ways I think it could be solved:
I want to translate documents and also get a corresponding n-best list
for each document.
1st way) Is there a keyword for sending documents and getting back
n-best lists and a target document?
If this possibility exists, it would be nice.
2nd way) If there is no way to translate documents via mosesserver, I
could send every line of my documents to the server and write the
STDOUT to a file, but then I would have to use at least the -n-best-list
parameter in the following way:
"-n-best-list: file and size of n-best-list to be generated; specify -
as the file in order to
write to STDOUT"
but I don't understand the explanation about writing the list to STDOUT.
Thanks in advance for your help. I hope you understood everything;
English is not my native language.
Best,
Leo
------------------------------
Message: 3
Date: Fri, 05 Dec 2014 09:43:42 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Mosesserver Paramters to be defined by
client
To: Leo Kriese <Leonard.Kriese@hotmail.de>, moses-support@mit.edu
Message-ID: <54817E4E.5030506@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Leo
The parameters that mosesserver takes in via its xml-rpc api are quite
different from the Parameter.cpp parameters. Unfortunately, a few new
ones have been added recently and they have not yet been documented.
If you look at lines 232-268 of the current version of mosesserver.cpp
you can see the parameter processing. In particular, the parameter
"nbest" requests an nbest list, and the argument of the parameter is the
size of the nbest list. There is no mechanism to force the server to
write an nbest list to file -- it sends it back to the client. In
principle, you could implement writing to file in the server, although
you would have to decide what to do if there are multiple simultaneous
clients. For me, it seems simpler to have the client write the nbest
list to file.
The server processes one sentence at a time, so if you want to process
documents then you should create your own wrapper.
cheers - Barry
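A wrapper of the kind described here might look roughly like the sketch below. It sends a document to the server one sentence at a time and has the client write both the 1-best output and the n-best list. The names `translate_document` and `fake_translate` are hypothetical. The "nbest" request parameter comes from the message above, while the response fields ("text", "nbest", "hyp", "totalScore") follow the Python snippet in the next message of this digest. The server URL in the comment is an assumption.

```python
import io

def translate_document(lines, translate, nbest_size, out_fh, nbest_fh):
    """Translate a document sentence by sentence.

    `translate` is any callable that takes the parameter dict and returns
    the server's response dict, e.g. an XML-RPC proxy's translate method.
    """
    for line_no, source in enumerate(lines):
        params = {"text": source.strip(), "nbest": nbest_size}
        result = translate(params)
        out_fh.write(result["text"] + "\n")          # 1-best translation
        for item in result.get("nbest", []):         # n-best entries, if any
            nbest_fh.write("%d ||| %s ||| %s\n"
                           % (line_no, item["hyp"], item["totalScore"]))

def fake_translate(params):
    """Offline stand-in for the server so the sketch runs without one."""
    text = params["text"].upper()
    return {"text": text, "nbest": [{"hyp": text, "totalScore": -1.0}]}

# Demo against the stand-in, collecting output in memory.
out, nbest = io.StringIO(), io.StringIO()
translate_document(["hello world\n"], fake_translate, 1, out, nbest)

# With a running mosesserver one would pass the proxy method instead
# (URL and port are assumptions):
#   import xmlrpc.client
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
#   translate_document(open("in.txt"), proxy.translate, 10, out_fh, nbest_fh)
```

Because each client writes its own files, this sidesteps the question of what the server should do when several clients are connected at once.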
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 4
Date: Fri, 05 Dec 2014 11:09:22 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Mosesserver Paramters to be defined by
client
To: Leo Kriese <Leonard.Kriese@hotmail.de>, Moses support
<Moses-support@mit.edu>
Message-ID: <54819262.1080408@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Leo
I do not have a perl snippet, but in python it looks like:
    params = {"text" : source}
    params["nbest"] = nbest_size
    params["add-score-breakdown"] = 1
    params["nbest-distinct"] = 1
    result = proxy.translate(params)
    print result['text'].encode("utf8")
    for nbest_item in result['nbest']:
        print>>nbest_fh, line_no, "|||", nbest_item['hyp'].encode('utf8'), \
            "|||", nbest_item['fvals'], "|||", nbest_item['totalScore']
This prints the 1-best to stdout, and the nbest to file, in the same
format as the standard moses decoder,
cheers - Barry
On 05/12/14 10:40, Leo Kriese wrote:
>
> Hey Barry,
>
> thank you for this fast response.
>
> Yes, I also found mosesserver.cpp right before you wrote to me, but I
> struggle a little with reading through the source code.
>
> The Perl client script includes an example of how to print the
> phrase alignments:
>
>     if ($result->{'align'}) {
>         print "Phrase alignments: \n";
>         $aligns = $result->{'align'};
>         foreach my $align (@$aligns) {
>             print $align->{'tgt-start'} . "," . $align->{'src-start'} . ","
>                 . $align->{'src-end'} . "\n";
>         }
>     }
>
> Could you give me a code example showing how to make this work for the
> "nbest" parameter?
> Does it work similarly to the "align" parameter?
>
> best,
>
> Leo
------------------------------
Message: 5
Date: Fri, 05 Dec 2014 23:59:28 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] mert-moses.pl and distortion-limit
To: moses-support@mit.edu
Message-ID: <5481E470.5040905@precisiontranslationtools.com>
Content-Type: text/plain; charset=utf-8; format=flowed
I just want to verify: if I want to change the Moses distortion-limit
value, should I tune the configuration with the new distortion-limit
value passed in the mert-moses.pl "decoder-flags" option?
Thanks.
------------------------------
End of Moses-support Digest, Vol 98, Issue 20
*********************************************