Moses-support Digest, Vol 127, Issue 17

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Parallel Subsampling (Philipp Koehn)
2. Re: TRAINING_extract-phrases ERROR: malformed XML (Philipp Koehn)
3. Moses Server Performance/Optimization and Moses2 (Steve Braich)


----------------------------------------------------------------------

Message: 1
Date: Fri, 12 May 2017 14:04:44 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] Parallel Subsampling
To: Sanjanashree Palanivel <sanjanashree@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDBykQ9tAbtjKaQ7BnxPQajNkyBmTKkKOZEKFs+_DGMeuA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

could you be a bit more specific?

Are you referring to the subsampling method that selects relevant data from
a parallel corpus, based on similarity to in-domain data (a.k.a. "modified
Moore-Lewis")?
If so, what is your question?
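
(For reference, the selection criterion usually meant by "modified
Moore-Lewis" (Axelrod et al., 2011) scores each general-domain sentence
pair (s, t) by the bilingual cross-entropy difference

  [H_in(s) - H_gen(s)] + [H_in(t) - H_gen(t)]

where H_in and H_gen are per-word cross-entropies under language models
trained on in-domain and general-domain text, computed on the source
side for s and the target side for t; the pairs with the lowest scores
are kept as the subsample.)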

-phi

On Thu, May 11, 2017 at 7:43 AM, Sanjanashree Palanivel <
sanjanashree@gmail.com> wrote:

> Dear All,
>
> What is parallel subsampling in a bilingual corpus? It would be great
> if I could get an earnest reply.
>
> --
> Thanks and regards,
>
> Sanjanasri J.P
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

Message: 2
Date: Fri, 12 May 2017 14:06:52 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] TRAINING_extract-phrases ERROR: malformed
XML
To: Ergun Bicici <bicici@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDCRYGB-f89Gm8yfo8QDZi6jt2uw+tHr3NstS-y7NKosSA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

you should replace the "<" and ">" with &lt; and &gt;

scripts/tokenizer/escape-special-chars.perl does that for you.
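
For example, assuming the usual filter-style interface of the tokenizer
scripts (stdin to stdout; the file names here are only placeholders),
you would run it over each side of the corpus after tokenization:

  ~/mosesdecoder/scripts/tokenizer/escape-special-chars.perl \
      < corpus.tok.de > corpus.esc.de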

-phi

On Thu, May 11, 2017 at 3:12 PM, Ergun Bicici <bicici@gmail.com> wrote:

>
> clean-corpus-n.perl can clean XML tags before tokenization:
>
> sub word_count {
>     my ($line) = @_;
>     if ($ignore_xml) {
>         $line =~ s/<\S[^>]*\S>/ /g;
>         $line =~ s/\s+/ /g;
>         $line =~ s/^ //g;
>         $line =~ s/ $//g;
>     }
>     my @w = split(/ /,$line);
>     return scalar @w;
> }
>
> Ergun
>
> On Thu, May 11, 2017 at 10:33 AM, Ergun Bicici <bicici@gmail.com> wrote:
>
>>
>> Similarly:
>> ERROR: some opened tags were never closed: it shares some features in
>> common with the SGML < ! [ CDATA [ ] ] > construct , in that it declares a
>> block of text which is not for parsing .
>>
>>
>> On Thu, May 11, 2017 at 10:32 AM, Ergun Bicici <bicici@gmail.com> wrote:
>>
>>>
>>> TRAINING_extract-phrases is giving
>>> ERROR: malformed XML: Wirtschaftsjahr Betriebsgrösse < 50.000 kg
>>> 120.000 kg
>>> ERROR: malformed XML: < ! -- / * Font Definitions *
>>>
>>> etc.
>>>
>>> This appears to be due to the tokenization of HTML tags.
>>>
>>> Is there an option in Moses to handle these?
>>>
>>> --
>>>
>>> Regards,
>>> Ergun
>>>
>>> Ergun Biçici
>>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>>
>>
>>
>>
>> --
>>
>> Regards,
>> Ergun
>>
>> Ergun Biçici
>> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>>
>
>
>
> --
>
> Regards,
> Ergun
>
> Ergun Biçici
> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

Message: 3
Date: Fri, 12 May 2017 20:14:56 -0700
From: Steve Braich <stevebpdx@gmail.com>
Subject: [Moses-support] Moses Server Performance/Optimization and
Moses2
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAELTAkmXSH7ZTZf-=xhw7Q8G8fA3DDgRM6JqUQrN6Wo+OgvPyw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,
I am trying to improve the performance of Moses in a server environment
(not training).

I have tried various things, including the following, which are described
at http://www.statmt.org/moses/?n=Moses.Optimize. None of them has had
any impact:

- Multi-threading: up to 16 cores on Google Cloud.
I used the options -threads all and -threads 16

- Memory:
I have launched Moses with as much as 60 GB of memory.

- Caching the models:
Note: I only have the following files, which I understand to be what is
needed to translate: the phrase table, the reordering table, and the
target language model.
Am I missing anything? Your documentation at
http://www.statmt.org/moses/?n=Moses.Optimize#ntoc2 suggests that I
might be. I see other files in my EMS directory, but I get no errors
without them.

cat /home/autom8tr8n/mt_publish/model_enzh/phrase-table.minphr > /dev/null
cat /home/autom8tr8n/mt_publish/model_enzh/reordering-table.minlexr > /dev/null
cat /home/autom8tr8n/mt_publish/model_enzh/target.blm.zh > /dev/null


- Other:
I made sure "transparent huge pages" are enabled, and I use the compact
phrase and reordering tables.
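
For concreteness, the minimal (non-server) decoder invocation that
corresponds to the settings above would be something like the following
(paths as above, thread count as described; shown only to make the setup
explicit):

  ~/mosesdecoder/bin/moses -f ~/mt_publish/model_enzh/moses.ini -threads 16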


So I am now trying Moses2, following the instructions at
http://www.statmt.org/moses/?n=Site.Moses2. I am having some issues with
this. Here are my questions:

- Does the Moses executable have to match between training and serving?
In other words, do I have to train the models with Moses2 if I want to
run them with the Moses2 server?

- Moses2 error messages:
I tried just running Moses2 directly (no server, just basic options):
~/mosesdecoder/bin/moses2 -f ~/mt_publish/model_enzh/moses.ini

I get this error message:
Starting...
Defined parameters (per moses.ini or switch):
config: /home/autom8tr8n/mt_publish/model_enzh/moses.ini
distortion-limit: 6
feature: UnknownWordPenalty WordPenalty PhrasePenalty
PhraseDictionaryCompact name=TranslationModel0 num-features=4
path=/home/autom8tr8n/mt_publish/model_enzh/phrase-table.minphr
input-factor=0 output-factor=0 LexicalReordering name=LexicalReordering0
num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0
output-factor=0
path=/home/autom8tr8n/mt_publish/model_enzh/reordering-table Distortion
KENLM name=LM0 factor=0
path=/home/autom8tr8n/mt_publish/model_enzh/target.blm.zh order=3
input-factors: 0
mapping: 0 T 0
mark-unknown:
server:
server-port: 3001
unknown-word-prefix: __UNK__
unknown-word-suffix: __UNK__
weight: LexicalReordering0= 0.0892686 0.0768025 0.0930783 0.0716109
0.00303172 0.0492278 Distortion0= 0.0399769 LM0= 0.107799 WordPenalty0=
-0.242343 PhrasePenalty0= 0.0403541 TranslationModel0= 0.0396487 0.0448175
0.0681687 0.0338723 UnknownWordPenalty0= 1


Feature name PhraseDictionaryCompact is not registered.
Aborted (core dumped)

This error suggests that I didn't compile with CMPH. I did. See below.


- Compilation:
From the file dates, Moses2 appears to have been compiled when I
initially compiled Moses (version 1). Here is how I compiled it:
./bjam -a --with-boost=/home/autom8tr8n/mosesdecoder/boost_1_63_0
--with-cmph=/home/autom8tr8n/mosesdecoder/cmph/cmph2.0
--with-xmlrpc-c=/home/autom8tr8n/mosesdecoder/opt

Where is the compilation log file that shows what Moses2 was compiled
with?


Please answer my questions about Moses2, look over the other optimizations
I tried, and please, I am all ears if you have other suggestions to boost
performance. My very unscientific manual testing shows about 1.2 seconds
to translate a phrase, and none of the optimizations I have tried so far
has made any improvement. A VM with a single core and 3.75 GB of memory
performs just as well as a 16-core, 60 GB VM.

Thanks in advance,

Steve

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 127, Issue 17
**********************************************
