Moses-support Digest, Vol 110, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. New Box release (amittai)
2. Re: System requirements for Moses (Hegde, Sujay)
3. decoder question (Vincent Nguyen)
4. Re: Moses + MPI (Benson Muite)
5. Re: decoder question (John D Burger)
6. Re: decoder question (Vincent Nguyen)

----------------------------------------------------------------------

Message: 1
Date: Fri, 4 Dec 2015 11:38:51 +0700
From: amittai <amittai.box@gmail.com>
Subject: [Moses-support] New Box release
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <566118DB.3040401@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Hullo --

I've released a new version of Box, a full-featured dev computer that
runs on the Amazon cloud. You'll need to sign up again, because I
switched to Amazon's fanciest back-end selling platform. Signing up is
still free. The new link is:

https://aws.amazon.com/marketplace/pp/B017IT0ZPU/?ref=_ptnr_151204

Changes:

* GPUs now available (choose g2.2xlarge or g2.8xlarge instance types).
* Many more instance types and sizes!
* Launch into all global regions. (us-east-1 is still cheapest)
* Instances can be paused and restarted (no cost while paused).
* More transparent billing.

More details on the website: www.boxresear.ch

Any questions, requests, or comments, please send me an email.
And if you're at the IWSLT workshop right now and want a demo, come find
me :)

Cheers,
~amittai

Here's what Box v2015-10-10 (current release) includes:
cdec/ Popular SMT framework
cmph/ Hashing library (for compact phrase tables)
ducttape/ Experiment management system
eigen3/ Linear algebra library
fast_align/ Word alignment tool
giza-pp/ Word alignment package (for Moses)
kenlm/ Language modeling toolkit
mgiza/ Multi-threaded Giza++
mosesdecoder/ Popular SMT framework
multeval/ MT evaluation tool
rnnlm/ Neural network language modeling toolkit
salm/ Suffix-array toolkit for NLP
scala/ Programming language
vowpal_wabbit/ Machine learning toolkit compatible with Moses
word2vec/ Continuous word representations

------------------------------

Message: 2
Date: Fri, 4 Dec 2015 05:22:10 +0000
From: "Hegde, Sujay" <Sujay.Hegde@xerox.com>
Subject: Re: [Moses-support] System requirements for Moses
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>,
"MudaliarMudaliar, Preeti J" <preeti.mudaliarmudaliar@xerox.com>
Message-ID:
<586EA7C483504E48870F5BF54319B6EC4ED414@USA7109MB006.na.xerox.net>
Content-Type: text/plain; charset="utf-8"

Hi Philipp,

How do we limit the phrase length during training? Is there a config parameter in the Moses training config file?

> Is the phrase table the biggest model or the language model?

We have 6-7 phrase tables that are combined in a log-linear fashion during decoding.
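
For reference, a log-linear combination scores each translation hypothesis as a weighted sum of log feature values, score = sum_i lambda_i * log h_i, with one weight per feature, so every phrase table contributes its own weighted terms. The Python sketch below is an illustration only, with made-up feature values and weights; it is not Moses's internal code.

```python
import math


def log_linear_score(features, weights):
    """Combine feature scores log-linearly: sum_i weight_i * log(feature_i).

    `features` are positive, probability-like scores (e.g. phrase-table
    entries); `weights` are the matching feature weights set by tuning.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * math.log(h) for h, w in zip(features, weights))


# Illustrative values only: two phrase-table scores and one LM probability.
score = log_linear_score([0.5, 0.25, 0.1], [0.2, 0.2, 0.5])
```

A higher (less negative) score is better. Each additional phrase table simply adds more weighted terms to the sum, and tuning (e.g. MERT or MIRA) chooses the weights.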

Thanks and Regards,
Sujay,
Xerox Business Services, Bangalore, India

-----Original Message-----
From: phkoehn@gmail.com [mailto:phkoehn@gmail.com] On Behalf Of Philipp Koehn
Sent: 03 December 2015 21:52
To: Hegde, Sujay
Cc: moses-support@mit.edu; MudaliarMudaliar, Preeti J
Subject: Re: [Moses-support] System requirements for Moses

Hi,

having such long sentences should cause all kinds of problems with word alignment, so I am a bit puzzled that they still show up when pruning the phrase table.
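
SALM's error quoted below ("more than 256 words") can be avoided by dropping over-long sentence pairs before indexing. Moses provides scripts/training/clean-corpus-n.perl for length-based corpus cleaning. The Python sketch below only illustrates the idea and assumes line-aligned, whitespace-tokenized source and target lists. The 256-token cap mirrors the SALM message and is not a Moses default.

```python
def filter_long_pairs(src_lines, tgt_lines, max_tokens=256, min_tokens=1):
    """Keep only sentence pairs where both sides have min..max tokens.

    Inputs are parallel, line-aligned lists of whitespace-tokenized
    sentences. Returns two new lists that remain aligned.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    kept_src, kept_tgt = [], []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop the whole pair if either side is empty or too long,
        # so the two files stay line-aligned.
        if min_tokens <= n_src <= max_tokens and min_tokens <= n_tgt <= max_tokens:
            kept_src.append(src)
            kept_tgt.append(tgt)
    return kept_src, kept_tgt
```

Filtering the pair rather than one side is the important design choice: removing a line from only one file would shift every later alignment.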

A good way to prune the phrase table is to limit the length of phrases (max 5 does no harm, even max 4 is not a big deal), and to filter out low-probability phrase pairs ($MOSES/scripts/training/threshold-filter.perl).
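
The two pruning ideas above (a phrase-length cap and a probability floor) can be sketched as a filter over a plain-text phrase table. This is not threshold-filter.perl. It assumes the usual Moses text format "source ||| target ||| scores ..." and assumes that the third score is the direct phrase probability p(e|f), as in the standard four-score layout; check this against your own table. The thresholds are illustrative.

```python
def prune_phrase_table(lines, max_phrase_len=5, min_prob=1e-4, prob_index=2):
    """Yield phrase-table lines that pass a length cap and a probability floor.

    Lines are expected as 'src ||| tgt ||| s1 s2 s3 s4 ...'.
    prob_index picks the score to threshold; 2 is assumed to be the direct
    phrase probability p(e|f) in the standard four-score layout.
    """
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line: no score field
        src, tgt, scores = fields[0].split(), fields[1].split(), fields[2].split()
        if len(scores) <= prob_index:
            continue  # not enough scores to apply the threshold
        if len(src) > max_phrase_len or len(tgt) > max_phrase_len:
            continue  # phrase too long on either side
        if float(scores[prob_index]) < min_prob:
            continue  # translation probability below the floor
        yield line
```

Because it is a generator, it can stream a large table line by line. It works on the plain-text table, so it would have to run before the table is converted to a compact format such as .minphr.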

Is the phrase table the biggest model or the language model? For the latter, there are several compression options.

-phi

On Thu, Dec 3, 2015 at 12:32 AM, Hegde, Sujay <Sujay.Hegde@xerox.com> wrote:
> Hi Philipp,
>
> Thanks a lot.
>
> Actually, it's a VIRTUAL machine.
>
> Also, we have compressed the models into .minphr and
> .minlexr, but we couldn't prune them: while pruning, we got an error
> saying that some of the sentences in the corpus are too long to be pruned.
>
> We used SALM for pruning and got the following error:
>
> /mnt/hd1/git/salm/Bin/Linux/Index/IndexSA.O64
> opensub.train.it
>
> Initialize vocabulary file: opensub.train.it.id_voc
>
> Loading existing vocabulary file: opensub.train.it.id_voc
>
> Total 100 word types loaded
>
> Max VocID=100
>
> Sentence 4152148 has more than 256 words. Can not handle such long sentence.
> Please cut it short first!
>
> Is there anything we could do about the above?
>
> Thanks and Regards,
> Sujay,
> Xerox Business Services, Bangalore, India
>
> From: phkoehn@gmail.com [mailto:phkoehn@gmail.com] On Behalf Of
> Philipp Koehn
> Sent: 03 December 2015 03:13
> To: Hegde, Sujay
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] System requirements for Moses
>
> Hi,
>
> the machine you have is certainly sufficient even for large models.
>
> If you are running two language pairs in parallel and run into RAM
> problems, you may want to look into ways to compress the model files
> (phrase table, reordering table, language model) using either more
> efficient data structures (e.g., various KenLM options), or pruning the models.
>
> -phi
>
> On Tue, Dec 1, 2015 at 5:08 AM, Hegde, Sujay <Sujay.Hegde@xerox.com> wrote:
>
> Dear Moses Admin,
>
> We are using the Moses decoder in a commercial environment.
>
> We have a quad-core virtual machine running CentOS, with 132GB RAM
> and a 1TB disk.
>
> We have 2 language pairs installed, and when we run both models
> together, translation hangs (takes a LONG time).
> It is fine when we run only one language model.
>
> Are there any specific system requirements for Moses?
> Please let me know.
>
> Thanks and Regards,
> Sujay,
> Xerox Business Services, Bangalore, India
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

------------------------------

Message: 3
Date: Fri, 4 Dec 2015 10:43:38 +0100
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: [Moses-support] decoder question
To: moses-support <moses-support@mit.edu>
Message-ID: <5661604A.5050607@neuf.fr>
Content-Type: text/plain; charset=utf-8; format=flowed

Actually, I don't know whether this is a decoder question or something else.

Here is my issue.

Let's say I have a text string with two sentences: a period ends
the first sentence, but there is no CR+LF, just a space before the second sentence.

When I pass the full string through the pipeline
tokenizer + truecaser + moses + detruecase + detokenizer,
the output is only one sentence: the period at the end of the first
sentence has been eliminated, and the result is nonsense (well, not good at
all).

If I insert a CRLF just after the period of the first sentence and send
the whole thing through the pipeline, the output is correct.

Am I missing something?

Should we only send strings to Moses segment by segment?

thanks,
Vincent

------------------------------

Message: 4
Date: Fri, 4 Dec 2015 13:07:45 +0200
From: Benson Muite <benson.muite@ut.ee>
Subject: Re: [Moses-support] Moses + MPI
To: Philipp Koehn <phi@jhu.edu>, Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <56617401.8000001@ut.ee>
Content-Type: text/plain; charset=utf-8

Hi,

Thanks. This seems to be the script:
https://github.com/moses-smt/mosesdecoder/blob/82527fc8b247ed34947c33ba808a92a1c07b054b/scripts/generic/moses-parallel.pl

It looks like it could be modified to use other scheduling systems.
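
Because documents split naturally into independent sentences, the pattern behind such a script is: split the input into chunks, translate the chunks independently, and concatenate the outputs in order. Below is a minimal Python sketch of that pattern. It is not the code of moses-parallel.pl, and translate_chunk is a hypothetical stand-in for a decoder run or scheduler job. Here it only upper-cases lines so the example runs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Split lines into at most n_chunks contiguous, order-preserving chunks."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def translate_chunk(chunk):
    """Hypothetical stand-in for running the decoder on one chunk."""
    return [line.upper() for line in chunk]


def translate_parallel(lines, n_workers=4):
    """Translate chunks concurrently and reassemble output in input order."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(translate_chunk, chunks))  # map keeps chunk order
    return [out for chunk_out in results for out in chunk_out]
```

Contiguous chunks keep reassembly trivial: outputs are concatenated in chunk order, so line i of the output still corresponds to line i of the input.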

Benson

On 12/3/15 5:18 PM, Philipp Koehn wrote:
> Hi,
>
> the MPI support is only for MIRA training and I do not know how well
> maintained it is. It is not in active use.
>
> Since machine translation of large documents is easily parallelizable
> (one sentence per thread), there was never a strong push for even more
> parallelization.
>
> -phi
>
> On Thu, Dec 3, 2015 at 6:35 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>> There is support for multithreaded and distributed processing using grid
>> engines. This works for training, tuning and decoding. But I don't think
>> there's wide support for MPI in Moses.
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 25 November 2015 at 12:28, Benson Muite <benson.muite@ut.ee> wrote:
>>> Hi,
>>>
>>> Is there any information on the performance improvement that can be
>>> obtained using Moses + MPI? The source code seems to have some MPI support
>>> for training with MIRA:
>>>
>>> https://github.com/moses-smt/mosesdecoder/blob/82527fc8b247ed34947c33ba808a92a1c07b054b/contrib/mira/Main.cpp
>>>
>>> It would be helpful to know how useful this is, if anyone has tried it.
>>>
>>> Thanks,
>>> Benson

--
Research Fellow of Distributed Systems
Institute of Computer Science
University of Tartu
J.Liivi 2, 50409
Tartu, Estonia
http://kodu.ut.ee/~benson

------------------------------

Message: 5
Date: Fri, 4 Dec 2015 07:52:15 -0500
From: John D Burger <john@mitre.org>
Subject: Re: [Moses-support] decoder question
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <886FCFD1-E255-45BB-AEE9-9F17247F39C8@mitre.org>
Content-Type: text/plain; charset="us-ascii"

I think you're asking if Moses translates one sentence at a time. The answer is yes.
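
In practice this means each sentence should be on its own input line, because the decoder treats one line as one segment. Moses ships a sentence splitter, split-sentences.perl (under scripts/ems/support/ in the repository), which uses per-language nonbreaking-prefix lists to avoid splitting after abbreviations. The Python sketch below is only a naive illustration. It assumes that a sentence ends with ".", "!" or "?" followed by whitespace and an ASCII uppercase letter, so it will wrongly split after abbreviations such as "Dr." and will miss sentences that start with an accented capital.

```python
import re

# Naive rule: end-of-sentence punctuation, then whitespace, then an
# ASCII uppercase letter starts the next sentence. Abbreviations are not handled.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Return the sentences of `text`, one per list element."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def to_moses_input(text):
    """Join sentences with newlines so the decoder sees one sentence per line."""
    return "\n".join(split_sentences(text)) + "\n"
```

Each resulting line can then go through the tokenizer, truecaser and decoder as usual, so the period ending the first sentence is no longer in the middle of a segment.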

- John Burger
MITRE


------------------------------

Message: 6
Date: Fri, 4 Dec 2015 14:18:53 +0100
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] decoder question
To: John D Burger <john@mitre.org>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <566192BD.3080807@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed

Well, not exactly my question. I know Moses translates one "line" at a
time, meaning a string ending with a line feed.

My question is rather: if the string contains a PERIOD (tokenized as
such) separating the line into two "sentences", how does it behave?

Given my observation, I have the feeling that we really need to
"sentence-tokenize" (split into sentences) before word-tokenizing.

On 04/12/2015 13:52, John D Burger wrote:
> I think you're asking if Moses translates one sentence at a time. The answer is yes.
>
> - John Burger
> MITRE

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 110, Issue 11
**********************************************
