Moses-support Digest, Vol 100, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: KenLM's query utility (Kenneth Heafield)
2. Re: KenLM's query utility (Tom Hoar)
3. Re: KenLM's query utility (Kenneth Heafield)
4. Clarification of "Compact Phrase Table" instructions (Tom Hoar)
5. Re: Clarification of "Compact Phrase Table" instructions
(Raj Dabre)
6. Suggested change to train-model.perl (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Sat, 31 Jan 2015 13:17:27 -0500
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] KenLM's query utility
To: moses-support@mit.edu
Message-ID: <54CD1C37.7090806@kheafield.com>
Content-Type: text/plain; charset=windows-1252

Hi Tom,

I can. Have you looked at the python API though? It's much cleaner
and faster than what you're doing now.

Kenneth

On 01/31/2015 06:21 AM, Tom Hoar wrote:
> The KenLM `query` utility has changed in how it pipes to stdout. I'm
> using Python's subprocess.Popen() and stdin.write() with
> stdout.readline(). In Release 1, the output was unbuffered and piping
> line-by-line worked. In the newest version (RC-3), piping hangs at
> stdout.readline() as though `query` is buffering the output.
>
> Is it possible to add a command line switch to disable output buffers,
> similar to what we added to the tokenizer.perl and detokenizer.perl
> scripts (-b)?
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
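
For reference, the line-by-line piping pattern described in the quoted message can be sketched as follows. This is a minimal sketch, not the poster's actual code: the child process is a small Python stand-in that flushes after every line (the CHILD_SCRIPT and query_lines names are made up for illustration). With KenLM one would launch `query` instead, and the exchange only works if that build flushes its output per line, which is the open question in this thread.

```python
import subprocess
import sys

# Stand-in child: echoes each input line and flushes immediately.
# This is an assumption for illustration; it is not KenLM's `query`.
CHILD_SCRIPT = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('echo: ' + line)\n"
    "    sys.stdout.flush()\n"
)


def query_lines(lines):
    """Send lines one at a time and read exactly one reply per line."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent side
    ) as proc:
        for line in lines:
            proc.stdin.write(line + "\n")
            proc.stdin.flush()  # push the request to the child now
            # Blocks until the child writes AND flushes a full line.
            replies.append(proc.stdout.readline().rstrip("\n"))
    return replies


if __name__ == "__main__":
    print(query_lines(["this is a test", "another sentence"]))
```

If the child buffers its output instead of flushing, the readline() call above is where the parent would hang, which matches the symptom reported.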


------------------------------

Message: 2
Date: Sun, 01 Feb 2015 07:58:27 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] KenLM's query utility
To: moses-support@mit.edu
Message-ID: <54CD7A33.2090608@precisiontranslationtools.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

Thanks, Ken.

On second thought, hold off on the request. We're updating utilities for
native Windows compatibility. We will submit the changes (including
buffering) to you for review. Was there any specific reason you
changed the functionality from unbuffered to buffered output?




On 02/01/2015 01:17 AM, Kenneth Heafield wrote:
> Hi Tom,
>
> I can. Have you looked at the python API though? It's much cleaner
> and faster than what you're doing now.
>
> Kenneth
>
> On 01/31/2015 06:21 AM, Tom Hoar wrote:
>> The KenLM `query` utility has changed in how it pipes to stdout. I'm
>> using Python's subprocess.Popen() and stdin.write() with
>> stdout.readline(). In Release 1, the output was unbuffered and piping
>> line-by-line worked. In the newest version (RC-3), piping hangs at
>> stdout.readline() as though `query` is buffering the output.
>>
>> Is it possible to add a command line switch to disable output buffers,
>> similar to what we added to the tokenizer.perl and detokenizer.perl
>> scripts (-b)?



------------------------------

Message: 3
Date: Sat, 31 Jan 2015 20:52:11 -0500
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] KenLM's query utility
To: moses-support@mit.edu
Message-ID: <54CD86CB.7030400@kheafield.com>
Content-Type: text/plain; charset=windows-1252

Performance. Actually, if I had a fast integer to string converter
(with appropriate license and such), I'd add it to FakeOFStream and
remove the use of std::cout. There seem to be plenty floating around on
the Internet. Regardless, FakeOFStream supports Flush() so this doesn't
change much other than the call.

Kenneth
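
The idea in this reply (a hand-rolled integer-to-string conversion feeding a buffered stream that is flushed on demand) can be illustrated with a small sketch. Everything here is an assumption for illustration: BufferedOut and int_to_ascii are made-up names, this is not KenLM's FakeOFStream, and a pure-Python digit loop is of course not fast. It only shows the shape of the design.

```python
import io


def int_to_ascii(n):
    """Convert an int to decimal ASCII bytes by repeated division by 10."""
    if n == 0:
        return b"0"
    negative = n < 0
    n = abs(n)
    digits = bytearray()
    while n:
        n, rem = divmod(n, 10)
        digits.append(ord("0") + rem)  # least significant digit first
    if negative:
        digits.append(ord("-"))
    digits.reverse()
    return bytes(digits)


class BufferedOut:
    """Hypothetical buffered writer with an explicit flush()."""

    def __init__(self, sink, capacity=4096):
        self.sink = sink          # any object with write(bytes)
        self.capacity = capacity  # flush automatically past this size
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        if len(self.buf) >= self.capacity:
            self.flush()

    def flush(self):
        self.sink.write(bytes(self.buf))
        self.buf.clear()


if __name__ == "__main__":
    sink = io.BytesIO()
    out = BufferedOut(sink)
    out.write(int_to_ascii(2015))
    out.write(b"\n")
    out.flush()  # nothing reaches the sink until this call
    print(sink.getvalue())
```

A caller that needs line-by-line output would call flush() after each line; a caller that cares about throughput would leave the buffer to fill.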

On 01/31/2015 07:58 PM, Tom Hoar wrote:
> Thanks, Ken.
>
> On second thought, hold off on the request. We're updating utilities for
> native Windows compatibility. We will submit the changes (including
> buffering) to you for review. Was there any specific reason you
> changed the functionality from unbuffered to buffered output?
>
>
>
>
> On 02/01/2015 01:17 AM, Kenneth Heafield wrote:
>> Hi Tom,
>>
>> I can. Have you looked at the python API though? It's much cleaner
>> and faster than what you're doing now.
>>
>> Kenneth
>>
>> On 01/31/2015 06:21 AM, Tom Hoar wrote:
>>> The KenLM `query` utility has changed in how it pipes to stdout. I'm
>>> using Python's subprocess.Popen() and stdin.write() with
>>> stdout.readline(). In Release 1, the output was unbuffered and piping
>>> line-by-line worked. In the newest version (RC-3), piping hangs at
>>> stdout.readline() as though `query` is buffering the output.
>>>
>>> Is it possible to add a command line switch to disable output buffers,
>>> similar to what we added to the tokenizer.perl and detokenizer.perl
>>> scripts (-b)?


------------------------------

Message: 4
Date: Sun, 01 Feb 2015 09:16:18 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] Clarification of "Compact Phrase Table"
instructions
To: moses-support@mit.edu
Message-ID: <54CD8C72.5000207@precisiontranslationtools.com>
Content-Type: text/plain; charset="utf-8"

The AdvancedFeatures page,
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures, includes this text:

    Here is an example (standard phrase table phrase-table, with 5
    scores) which produces a single file phrase-table.minphr:

        mosesdecoder/bin/processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4 -threads 4


The text refers to "5 scores" yet the example shows "-nscores 4". Is
that intended?
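
One way to see which value fits a particular table is to count the scores in the file itself. The sketch below assumes the usual text phrase-table layout, with fields separated by "|||" and the scores in the third field. The function names and the sample lines are made up for illustration and are not taken from a real model.

```python
import gzip


def count_scores(line):
    """Return the number of scores in one phrase-table line."""
    fields = line.rstrip("\n").split("|||")
    if len(fields) < 3:
        raise ValueError("expected at least 3 '|||'-separated fields")
    # The third field holds the whitespace-separated scores.
    return len(fields[2].split())


def first_line_scores(path):
    """Count the scores on the first line of a gzipped phrase table."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_scores(handle.readline())


# Made-up sample lines for illustration only.
FOUR = "das Haus ||| the house ||| 0.8 0.5 0.7 0.4 ||| 0-0 1-1 ||| 10 12 8"
FIVE = "das Haus ||| the house ||| 0.8 0.5 0.7 0.4 2.718 ||| 0-0 1-1"

if __name__ == "__main__":
    print(count_scores(FOUR), count_scores(FIVE))
```

The number reported for the table at hand is the one to compare against the -nscores value in the documented command.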

------------------------------

Message: 5
Date: Sun, 1 Feb 2015 14:20:15 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] Clarification of "Compact Phrase Table"
instructions
To: Tom Hoar <tahoar@precisiontranslationtools.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAB3gfjCq2WVgsuAT8Bnj70hZq7DSnxbCk7fGxb6K767Tip-NWA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

AFAIK the phrase table used to have 5 scores per phrase pair, the fifth
one being 2.718 (the constant phrase penalty).
Now it has 4.
Maybe someone forgot to update the description?

Regards.

On Sun, Feb 1, 2015 at 11:16 AM, Tom Hoar <
tahoar@precisiontranslationtools.com> wrote:

> The AdvancedFeatures page,
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures, includes this text:
>
> Here is an example (standard phrase table phrase-table, with 5 scores)
> which produces a single file phrase-table.minphr:
>
> mosesdecoder/bin/processPhraseTableMin -in phrase-table.gz -out
> phrase-table -nscores 4 -threads 4
>
>
> The text refers to "5 scores" yet the example shows "-nscores 4". Is that
> intended?
>


--
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014

------------------------------

Message: 6
Date: Sun, 01 Feb 2015 13:11:53 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] Suggested change to train-model.perl
To: moses-support@mit.edu
Message-ID: <54CDC3A9.6080303@precisiontranslationtools.com>
Content-Type: text/plain; charset=utf-8; format=flowed

I acknowledge changes to train-model.perl are done with caution. Here's
one suggestion I think will benefit anyone using MGIZA++.

In the current RC-3 branch, the code looks like this starting at line 253:

# supporting binaries from other packages
my $MKCLS = "$_EXTERNAL_BINDIR/mkcls";
my $MGIZA_MERGE_ALIGN = "$_EXTERNAL_BINDIR/merge_alignment.py";
my $GIZA;
my $SNT2COOC;


A "standard" build of MGIZA includes these steps and creates a directory
tree with $MGIZAPP_PREFIX/bin, $MGIZAPP_PREFIX/lib and
$MGIZAPP_PREFIX/scripts.

cmake . -DCMAKE_INSTALL_PREFIX="$MGIZAPP_PREFIX"
make
make install


Today, the user must copy (or symlink) merge_alignment.py to complete
the setup:

$MGIZAPP_PREFIX/scripts/merge_alignment.py -->
$MGIZAPP_PREFIX/bin/merge_alignment.py

Then, the user sets the --external-bin-dir on the train-model.perl
command line:

--external-bin-dir $MGIZAPP_PREFIX/bin


I propose this modification to train-model.perl:

# supporting binaries from other packages
my $MKCLS = "$_EXTERNAL_BINDIR/mkcls";
my $MGIZA_MERGE_ALIGN = "$_EXTERNAL_BINDIR/merge_alignment.py";
# added to fall back to MGIZA++ default install directory
if (! -x $MGIZA_MERGE_ALIGN) {
    $MGIZA_MERGE_ALIGN = "$_EXTERNAL_BINDIR/../scripts/merge_alignment.py";
}
my $GIZA;
my $SNT2COOC;


With this fallback, users who set up MGIZA++ with a "standard"
`make install` no longer have to copy the merge_alignment.py file, and
today's behavior is preserved.
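
The lookup order proposed above can also be sketched in Python, purely to make the intended behavior easy to test outside train-model.perl. The function name find_merge_alignment is made up for illustration. Note two small differences from the Perl: os.access checks against the real user ID where Perl's -x uses the effective one, and the fallback path is normalized here rather than left with "/../" in it.

```python
import os


def find_merge_alignment(external_bin_dir):
    """Prefer bin/merge_alignment.py, else fall back to ../scripts/."""
    primary = os.path.join(external_bin_dir, "merge_alignment.py")
    if os.access(primary, os.X_OK):
        return primary
    # Fall back to the MGIZA++ default install layout.
    fallback = os.path.join(
        external_bin_dir, os.pardir, "scripts", "merge_alignment.py"
    )
    return os.path.normpath(fallback)


if __name__ == "__main__":
    # Hypothetical install prefix, for illustration only.
    print(find_merge_alignment("/opt/mgizapp/bin"))
```

As in the Perl version, the fallback path is returned without checking that it exists, so a missing script still surfaces later when it is invoked.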


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 1
*********************************************
