Moses-support Digest, Vol 107, Issue 2

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: clarification CBPT vs MMSAPT (Vincent Nguyen)

----------------------------------------------------------------------

Message: 1
Date: Tue, 1 Sep 2015 14:11:41 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
To: ugermann@inf.ed.ac.uk, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <55E595FD.4010906@neuf.fr>
Content-Type: text/plain; charset="utf-8"

Hi Uli,

Regarding your point 3, here is what I would like to do and understand:

I have an LM and a TM built with EMS, but with the alignment done by
fast_align, so there are no vcb files for the baseline.

In this context, I don't see whether I can integrate a new incremental
corpus into the previous baseline corpus.

Hope this is clearer.

Vincent

On 23/08/2015 00:36, Ulrich Germann wrote:
> Hi Vincent,
>
> 1. I don't use EMS, so I'm the wrong person to ask.
> 2. Please always post questions to the moses-support mailing list, so
> that others can benefit from questions and answers as well.
> 3. Can you briefly explain what you are trying to accomplish? I don't
> think I understand what you are actually trying to do.
>
> Best regards - Uli
>
> On Sat, Aug 22, 2015 at 10:45 PM, Vincent Nguyen <vnguyen@neuf.fr
> <mailto:vnguyen@neuf.fr>> wrote:
>
>
> I have read this page again and again:
> http://www.statmt.org/moses/?n=Advanced.Incremental
> but it is not clear enough for a newbie like me who uses EMS.
> I also see a section in the EMS config file:
> use of baseline alignment model (incremental training)
> and I don't really see how it fits with the rest of the parameters.
>
>
>
> On 22/08/2015 16:31, vnguyen@neuf.fr wrote:
>> Oops.
>> Using EMS, I built the phrase table with the mmsapt=
>> option and it went through,
>> but I had not added the training option
>> -final-alignment-model hmm
>>
>> Do I need to start again?
>>
>> The thing is, I use Dyer's aligner (fast_align) because of the Giga
>> corpus, and I am not sure that training option is compatible, since
>> the tutorial mentions a modified GIZA++...
>>
>>
>>
>> ____________________
>>
>> From: "Ulrich Germann"
>> Date: 21 August 2015 15:54:08
>> To: Vincent Nguyen
>> Cc: prashant@fbk.eu, moses-support@mit.edu
>> Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
>>
>>
>>
>> On Thu, Aug 20, 2015 at 5:40 PM, Vincent Nguyen <vnguyen@neuf.fr
>> <mailto:vnguyen@neuf.fr>> wrote:
>>
>> Thanks to both of you. I will give both solutions a try.
>>
>> For MMSAPT:
>> Will I be able to make it work with the Giga corpus fr-en?
>> If everything is loaded in memory, I may run short of RAM
>> rather quickly.
>>
>>
>> For the WMT-15 fr-en data, mmsapt's files are about 20GB in
>> total, but not all of it will normally be kept in memory. Mmsapt
>> degrades gracefully, it just gets slow if the VM manager has to
>> drop memory pages and re-load them. The LM is about 40GB, so for
>> optimal performance you should calculate 60+GB of RAM. Provided
>> you have enough RAM, cat all model files to /dev/null prior to
>> starting moses. Sequential disk access is much faster than random
>> disk access, and the cat to /dev/null will push them into the
>> OS's file cache.
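
The cache-warming advice above amounts to reading every model file once, front to back. A minimal sketch of the same idea in Python, assuming the OS keeps recently read pages in its file cache; the function name `warm_file_cache` and the chunk size are illustrative and not part of Moses:

```python
def warm_file_cache(paths, chunk_size=1 << 20):
    """Read each file sequentially and discard the data.

    This mimics `cat FILE > /dev/null`: the bytes are thrown away,
    but the OS may keep the pages in its file cache afterwards.
    Returns the total number of bytes read.
    """
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Whether the pages actually stay cached depends on how much free RAM the machine has, which is why the advice is conditional on having enough memory.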
>>
>> Also, I was using Dyer's fast_align ... so do I need to realign
>> the whole corpus with the modified version of GIZA++?
>>
>> You need word alignments in the output format produced by symal
>> (i.e., row-column pairs 1-1 2-2 3-4 etc.). How these alignments are
>> produced doesn't matter for Mmsapt's ability to handle them. It
>> may, of course, affect the alignment quality, but that's
>> independent of which phrase table implementation you use.
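
To make the expected alignment format concrete, here is a minimal sketch that parses one line of "i-j" pairs and checks that every index fits the sentence pair. The function names are hypothetical, and the bounds check assumes 0-based token positions; adjust it if your files are numbered differently.

```python
def parse_alignment(line):
    """Turn a line such as '0-0 1-2' into [(0, 0), (1, 2)]."""
    pairs = []
    for token in line.split():
        src, sep, trg = token.partition("-")
        if not sep:
            raise ValueError(f"not an i-j pair: {token!r}")
        pairs.append((int(src), int(trg)))
    return pairs


def alignment_fits(pairs, src_len, trg_len):
    """Check that all pairs point inside the sentences (0-based assumption)."""
    return all(0 <= s < src_len and 0 <= t < trg_len for s, t in pairs)
```

Running such a check over the whole corpus before building the phrase table is a cheap way to catch alignment files that do not match the text they belong to.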
>>
>> - Uli
>>
>> For CBPT:
>> I would like to give the adaptive MT server a try, but I
>> don't really understand how to adapt the provided "adaptive
>> model" and "updater model"
>> to a context where my language pair is different. These
>> preliminary steps are not part of the tutorial (especially
>> the updater_models/alignment folders ...).
>>
>> The only drawback I see in CBPT is that adaptive changes
>> cannot be made permanent.
>>
>>
>>
>>
>> On 20/08/2015 16:17, Ulrich Germann wrote:
>>> Memory-mapped phrase tables are an alternative to
>>> conventional phrase tables. They are much, much faster to
>>> build, only slightly slower than CompactPT at runtime, and
>>> at the very least competitive in terms of BLEU performance.
>>> I usually observe slightly higher BLEU scores, but for each
>>> individual evaluation, the difference is usually not
>>> significant. They support only phrase-based MT, but not
>>> syntax-based MT.
>>>
>>> Both Mmsapt and CBPT also cater to post-editing scenarios
>>> (CBPT was specifically developed for this purpose). They
>>> allow adding new material to the phrase tables at run time.
>>> I can't say much about CBPT (apparently you add phrase table
>>> entries, and there is a decay function that rewards more
>>> recent choices approved by the translator). In the case
>>> of Mmsapt (since it samples at lookup time anyway), you can
>>> add new word-aligned parallel text to the training data at
>>> run time, or additional material at start-up. Additions
>>> are currently not stored on disk by the server (do NOT use
>>> mosesserver; use moses --server --port ...) and are lost
>>> when the server exits, but they can be loaded at start-up
>>> from text files, if they are available. In other words, it's
>>> currently up to the user/client who submits the additions to
>>> also store them on disk if they are meant to be permanent.
>>> Mmsapt offers numerous configuration options (separate
>>> scores or joint scores for background and foreground corpus,
>>> a provenance feature, etc.) that affect the number of
>>> features, and there is no established best practice for use
>>> in interactive MT (unless Michael Denkowski has advice to
>>> offer in this respect).
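
Since persisting additions is left to the client, one simple approach is to append every submitted sentence pair to line-aligned text files before sending it to the server. The sketch below is only an illustration: the function name and the `.src`, `.trg` and `.aln` suffixes are assumptions, not the file layout Mmsapt expects at start-up, so check the Moses documentation for the actual naming.

```python
def append_addition(base, source, target, alignment):
    """Append one word-aligned sentence pair to three parallel files.

    Writes to base + '.src', base + '.trg' and base + '.aln'
    (hypothetical suffixes), one sentence per line, so that line N
    of each file belongs to the same sentence pair.
    """
    rows = {".src": source, ".trg": target, ".aln": alignment}
    # Validate everything first so a bad input cannot leave the
    # three files with different line counts.
    for text in rows.values():
        if "\n" in text.strip():
            raise ValueError("each field must be a single line")
    for suffix, text in rows.items():
        with open(str(base) + suffix, "a", encoding="utf-8") as handle:
            handle.write(text.strip() + "\n")
```

Validating all three fields before writing any of them keeps the files line-aligned, which matters because a single missing line would shift every later alignment.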
>>>
>>> For phrase-based MT I recommend Mmsapt (see also my paper in
>>> the coming issue of PBML), as it saves you a lot of phrase
>>> table building agony. For interactive use, the
>>> infrastructure is there but additional research is required
>>> to figure out the optimal configuration of feature functions
>>> and associated parameters.
>>>
>>> Best regards - Uli Germann
>>>
>>> On Thu, Aug 20, 2015 at 12:56 AM, Prashant Mathur
>>> <prashant@fbk.eu <mailto:prashant@fbk.eu>> wrote:
>>>
>>> Hi Vincent,
>>>
>>> The goal of both is incremental adaptation, but they are
>>> different techniques in principle.
>>> CBPT adds an additional dynamic phrase table (with 1
>>> additional feature), which allows deletion and insertion of
>>> phrase pairs at any given time. For incremental
>>> adaptation, CBPT can be used in conjunction with
>>> constraint-based decoding as in [1], or with a cascade of
>>> onlineMgiza++ and the normal phrase extractor as in [2].
>>> I don't know much about the memory-mapped suffix array
>>> implementation, but afaik with MMSAPT (which uses 7
>>> features) you can do incremental updates to your model
>>> by adding a stream of parallel data along with the
>>> alignments.
>>>
>>> --Prashant
>>>
>>> [1]
>>> http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
>>> [2] http://mt4cat.org/software/adaptive-mt-server
>>>
>>>
>>> On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen
>>> <vnguyen@neuf.fr <mailto:vnguyen@neuf.fr>> wrote:
>>>
>>> Hello support,
>>>
>>> Going into advanced features of Moses, I am a bit
>>> confused by the
>>> differences and therefore which path to follow,
>>> regarding the 2 features
>>> CBPT and MMSAPT.
>>>
>>> I have the feeling the ultimate goal of both is the
>>> same but maybe I am
>>> wrong.
>>>
>>> Can someone explain the actual difference?
>>>
>>> By the way, which of the two is the "update" feature of this
>>> page based on?
>>> http://demo.statmt.org/
>>>
>>> Thanks
>>>
>>> Vincent.
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>>
>>> --
>>> Ulrich Germann
>>> Senior Researcher
>>> School of Informatics
>>> University of Edinburgh

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 107, Issue 2
*********************************************