Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: TRAINING_extract-phrases ERROR: malformed XML (Ergun Bicici)
2. Re: Moses Server Performance/Optimization and Moses2 (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Sat, 13 May 2017 10:59:13 +0300
From: Ergun Bicici <bicici@gmail.com>
Subject: Re: [Moses-support] TRAINING_extract-phrases ERROR: malformed
XML
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAB59qTOyL=jo7WgeoYourYYJOWmkMBGQ6Cn6rGEqhE6b=feqsA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Ok, thank you. Turns out that a dataset I used was not tokenized.
You already mentioned that these characters are escaped in a previous
thread:
https://www.mail-archive.com/moses-support@mit.edu/msg10412.html
> Also, it does not do tokenization, so if you want your data tokenized,
> you should use the tokenizer instead, which also escapes special
> characters.
Regards,
Ergun
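
For data that cannot go through the tokenizer, the replacement Philipp
describes can be approximated with a small shell filter. This is a minimal
sketch, not the Moses script itself: escape_markup is a hypothetical name,
it handles only &, < and >, and scripts/tokenizer/escape-special-chars.perl
may escape further characters that this sketch does not attempt to mirror.

```shell
#!/bin/sh
# Hypothetical helper: escape characters that would otherwise be read as
# XML markup. Reads stdin, writes stdout.
# The ampersand is replaced first so that the entities produced by the
# later substitutions are not escaped a second time.
escape_markup() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g'
}

# Example line modelled on the error message in this thread.
printf '%s\n' 'Wirtschaftsjahr < 50.000 kg' | escape_markup
```

The order of the substitutions matters: escaping & last would turn an
already written &lt; into &amp;lt;.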
On Fri, May 12, 2017 at 9:06 PM, Philipp Koehn <phi@jhu.edu> wrote:
> Hi,
>
> you should replace the "<" and ">" with &lt; and &gt;
>
> scripts/tokenizer/escape-special-chars.perl does that for you.
>
> -phi
>
> On Thu, May 11, 2017 at 3:12 PM, Ergun Bicici <bicici@gmail.com> wrote:
>
>>
>> clean-corpus-n.perl can clean XML tags before tokenization:
>>
>> sub word_count {
>> my ($line) = @_;
>> if ($ignore_xml) {
>> $line =~ s/<\S[^>]*\S>/ /g;
>> $line =~ s/\s+/ /g;
>> $line =~ s/^ //g;
>> $line =~ s/ $//g;
>> }
>> my @w = split(/ /,$line);
>> return scalar @w;
>> }
>>
>> Ergun
>>
>> On Thu, May 11, 2017 at 10:33 AM, Ergun Bicici <bicici@gmail.com> wrote:
>>
>>>
>>> Similarly:
>>> ERROR: some opened tags were never closed: it shares some features in
>>> common with the SGML < ! [ CDATA [ ] ] > construct , in that it declares a
>>> block of text which is not for parsing .
>>>
>>>
>>> On Thu, May 11, 2017 at 10:32 AM, Ergun Bicici <bicici@gmail.com> wrote:
>>>
>>>>
>>>> TRAINING_extract-phrases is giving
>>>> ERROR: malformed XML: Wirtschaftsjahr Betriebsgrösse < 50.000 kg
>>>> 120.000 kg
>>>> ERROR: malformed XML: < ! -- / * Font Definitions *
>>>>
>>>> etc.
>>>>
>>>> this appears to be due to the tokenization of html tags.
>>>>
>>>> Is there an option of Moses to handle these?
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ergun
>>>>
>>>> Ergun Biçici
>>>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Ergun
>>>
>>> Ergun Biçici
>>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>>
>>
>>
>>
>> --
>>
>> Regards,
>> Ergun
>>
>> Ergun Biçici
>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
--
Regards,
Ergun
Ergun Biçici
http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
------------------------------
Message: 2
Date: Sat, 13 May 2017 16:03:50 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses Server Performance/Optimization and
Moses2
To: Steve Braich <stevebpdx@gmail.com>, moses-support
<moses-support@mit.edu>
Message-ID: <08582cca-ca19-b2bb-45a7-3ba026cf5357@gmail.com>
Content-Type: text/plain; charset="windows-1252"
Moses2 doesn't support the Compact phrase-table, only ProbingPT.
You have to rebinarize the phrase-table with CreateProbingPT. It's
better if you also integrate the lexicalised reordering table into the
phrase-table. The script
scripts/generic/binarize4moses2.perl
does all of this in one go.
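
Before switching decoders, it can help to check whether a moses.ini still
names the compact phrase table, since that is the feature moses2 reports as
not registered in the quoted output. The following is a minimal sketch under
stated assumptions: uses_compact_pt is a hypothetical helper, the default
file name moses.ini is a placeholder, and the check only looks for a feature
line that begins with PhraseDictionaryCompact.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the given moses.ini has a feature
# line starting with PhraseDictionaryCompact, non-zero otherwise.
uses_compact_pt() {
    grep -q '^[[:space:]]*PhraseDictionaryCompact[[:space:]]' "$1"
}

# Assumed usage: pass the ini path as the first argument (placeholder
# default: moses.ini). Nothing is printed if the file is missing or clean.
ini="${1:-moses.ini}"
if [ -f "$ini" ] && uses_compact_pt "$ini"; then
    echo "$ini: PhraseDictionaryCompact found; rebinarize to ProbingPT before running moses2"
fi
```

The check is read-only. It does not rewrite the configuration, because the
exact feature line to use after rebinarizing depends on the output of the
binarization step.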
On 13/05/2017 04:14, Steve Braich wrote:
> Hello,
> I am trying to improve the performance of Moses in a server
> environment (not training).
>
> I have tried various things, including the following which is
> described on http://www.statmt.org/moses/?n=Moses.Optimize.
> It has had no impact:
>
> * *Multi-threading: *Up to 16 Cores on Google's Cloud
> I used the option -threads all and -threads 16
>
> * *Memory:*
> I have launched Moses with as much as 60 GB of memory
>
> * *Caching the Models*
> Note: I only have the following files that I understand to be what
> you need to translate, including Phrase Table, Reordering Table,
> and target language model.
> Am I missing anything? Your documentation suggests that I might
> at http://www.statmt.org/moses/?n=Moses.Optimize#ntoc2
> <http://www.statmt.org/moses/?n=Moses.Optimize>. I see these other
> files in my EMS directory but I get no errors without them.
>
> cat /home/autom8tr8n/mt_publish/model_enzh/phrase-table.minphr >
> /dev/null
>
> cat
> /home/autom8tr8n/mt_publish/model_enzh/reordering-table.minlexr >
> /dev/null
>
> cat /home/autom8tr8n/mt_publish/model_enzh/target.blm.zh > /dev/null
>
> * *Other:*
> I made sure "transparent huge pages" are enabled, and I use the
> compact phrase and reordering tables.
>
> So, I am now trying Moses2 and I am following the instructions here
> (http://www.statmt.org/moses/?n=Site.Moses2). I am having some issues
> with this. Here are my questions:
>
> * *Does the Moses executable have to match for Training and Server?*
> In other words, do I have to train the models with Moses2 if I
> want to run them with Moses2 Server?
>
> * *Moses2 Error Messages:*
> I tried just running it with Moses2 (without server and just basic
> options):
> ~/mosesdecoder/bin/moses2 -f ~/mt_publish/model_enzh/moses.ini
>
> I get this error message:
> Starting...
> Defined parameters (per moses.ini or switch):
> config: /home/autom8tr8n/mt_publish/model_enzh/moses.ini
> distortion-limit: 6
> feature: UnknownWordPenalty WordPenalty PhrasePenalty
> PhraseDictionaryCompact name=TranslationModel0 num-features=4
> path=/home/autom8tr8n/mt_publish/model_enzh/phrase-table.minphr
> input-factor=0 output-factor=0 LexicalReordering
> name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=/home/autom8tr8n/mt_publish/model_enzh/reordering-table
> Distortion KENLM name=LM0 factor=0
> path=/home/autom8tr8n/mt_publish/model_enzh/target.blm.zh order=3
> input-factors: 0
> mapping: 0 T 0
> mark-unknown:
> server:
> server-port: 3001
> unknown-word-prefix: __UNK__
> unknown-word-suffix: __UNK__
> weight: LexicalReordering0= 0.0892686 0.0768025 0.0930783
> 0.0716109 0.00303172 0.0492278 Distortion0= 0.0399769 LM0=
> 0.107799 WordPenalty0= -0.242343 PhrasePenalty0= 0.0403541
> TranslationModel0= 0.0396487 0.0448175 0.0681687 0.0338723
> UnknownWordPenalty0= 1
> Feature name PhraseDictionaryCompact is not registered.
> Aborted (core dumped)
>
> ^This error suggests that I didn't compile with CMPH. I did. See
> below.
>
> * *Compilation*
> From the file dates, Moses2 appears to have been compiled when I
> initially compiled Moses (1). Here is how I compiled it:
> ./bjam -a --with-boost=/home/autom8tr8n/mosesdecoder/boost_1_63_0
> --with-cmph=/home/autom8tr8n/mosesdecoder/cmph/cmph2.0
> --with-xmlrpc-c=/home/autom8tr8n/mosesdecoder/opt
>
> Where is the compilation log file where I can see what Moses 2 was
> compiled with?
>
> Please answer my questions about Moses2, look over the other
> optimization I did, and please, I am all ears if you have other
> suggestions to boost performance. My very unscientific manual testing
> has shown about 1.2 seconds to translate a phrase. No improvement at
> all has been made with any optimization that I have tried so far. A
> VM with a single core and 3.75 GB of memory performs just as well as a
> 16 core, 60 GB memory VM.
>
> Thanks in advance,
>
> Steve
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
http://moses-smt.org/
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 127, Issue 18
**********************************************