Moses-support Digest, Vol 106, Issue 43

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: clarification CBPT vs MMSAPT (Ulrich Germann)


----------------------------------------------------------------------

Message: 1
Date: Fri, 21 Aug 2015 14:54:08 +0100
From: Ulrich Germann <ulrich.germann@gmail.com>
Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAHQSRUqLdNYq1rzOi67vzqfS9PC+yswpOC87qJKetXWLV4FyWA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Thu, Aug 20, 2015 at 5:40 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:

> Thanks to both of you. I will give both solutions a try.
>
> For MMSAPT :
> Will I be able to make it work with the Giga corpus fr-en ? If everything
> is loaded in memory I may be short of ram rather quickly.
>

For the WMT-15 fr-en data, Mmsapt's files are about 20GB in total, but not
all of it will normally be kept in memory. Mmsapt degrades gracefully: it
just gets slow if the VM manager has to drop memory pages and re-load them.
The LM is about 40GB, so for optimal performance you should budget 60+GB
of RAM. Provided you have enough RAM, cat all model files to /dev/null
prior to starting moses. Sequential disk access is much faster than random
disk access, and the cat to /dev/null will push the files into the OS's
file cache.
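
For anyone who prefers to script the warm-up step, here is a minimal Python
sketch of the same idea as the cat to /dev/null: read every model file
sequentially and throw the data away, so that the OS keeps the pages in its
file cache. The function name warm_page_cache and the chunk size are
assumptions made for illustration; nothing here is part of Moses, and
whether the pages stay cached depends on how much free RAM the machine has.

```python
def warm_page_cache(paths, chunk_size=1 << 20):
    """Read each file sequentially, discard the data, return bytes read.

    Hypothetical helper (not part of Moses): equivalent in spirit to
    `cat FILE... > /dev/null`.
    """
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:  # empty bytes object means end of file
                    break
                total += len(chunk)
    return total
```

Call it with the list of phrase table and LM files before starting moses;
the return value is only a sanity check that the files were actually read.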



> Plus I was using Dyer's fast_align ... so do I need to realign the whole
> corpus with the modified version of giza++ ?
>

You need word alignments in the output format produced by symal (i.e.,
row-column pairs 1-1 2-2 3-4 etc.). How these alignments are produced
doesn't matter for Mmsapt's ability to handle them. It may, of course,
affect the alignment quality, but that's independent of which phrase table
implementation you use.
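
To check that the output of an aligner is in that shape, a small Python
sketch along the following lines may help. It parses one line of "i-j"
pairs into integer tuples and fails loudly on anything else. The function
name parse_alignment is an assumption made for illustration, and reading
the first number as the source position and the second as the target
position is likewise an assumption that should be checked against your
tool chain.

```python
def parse_alignment(line):
    """Parse a line such as '1-1 2-2 3-4' into a list of (i, j) int pairs.

    Hypothetical helper for sanity-checking alignment files; raises
    ValueError on any token that is not of the form INT-INT.
    """
    pairs = []
    for token in line.split():
        left, sep, right = token.partition("-")
        if not sep:
            raise ValueError(f"malformed alignment point: {token!r}")
        # int() raises ValueError for empty or non-numeric fields
        pairs.append((int(left), int(right)))
    return pairs


print(parse_alignment("1-1 2-2 3-4"))  # [(1, 1), (2, 2), (3, 4)]
```

If every line of the alignment file parses and has one line per sentence
pair, the format should not be the obstacle, whichever aligner produced it.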

- Uli



> For CBPT :
> I would like to give the adaptive MT server a try but I don't really
> understand how to adapt the given "adaptive model" and "updater model"
> in a context where my language pair is different. These preliminary steps
> are not part of the tutorial (especially the updater_models/alignment
> folders ...).
>
> The only glitch I see in the CBPT is that adaptive changes cannot be made
> permanent.
>
>
>
>
> On 20/08/2015 16:17, Ulrich Germann wrote:
>
> Memory-mapped phrase tables are an alternative to conventional phrase
> tables. They are much, much faster to build, only slightly slower than
> CompactPT at runtime, and at the very least competitive in terms of BLEU
> performance. I usually observe slightly higher BLEU scores, but for each
> individual evaluation, the difference is usually not significant. They
> support only phrase-based MT, but not syntax-based MT.
>
> Both Mmsapt and CBPT also cater to post-editing scenarios (CBPT was
> specifically developed for this purpose). They allow adding new material to
> the phrase tables at run time. I can't say much about CBPT (apparently you
> add phrase table entries, and there is a decay function that rewards more
> recent choices approved by the translator), but in the case of Mmsapt
> (since it samples at lookup time anyway), you can add new word-aligned
> parallel text to the training data at run time, or additional material at
> start-up. Additions are currently not stored on disk by the server (do NOT
> use mosesserver, use moses --server --port ...) and are lost when the
> server exits, but they can be loaded at startup time from text files, if
> those are available. In other words: it's currently up to the user/client
> who submits the additions to also store them on disk if they are meant to
> be permanent. Mmsapt offers numerous configuration options (separate scores
> or joint scores for background and foreground corpus, a provenance feature,
> etc.) that affect the number of features, and there is no established best
> practice for use in interactive MT (unless Michael Denkowski has advice to
> offer in this respect).
>
> For phrase-based MT I recommend Mmsapt (see also my paper in the coming
> issue of PBML), as it saves you a lot of phrase table building agony. For
> interactive use, the infrastructure is there but additional research is
> required to figure out the optimal configuration of feature functions and
> associated parameters.
>
> Best regards - Uli Germann
>
> On Thu, Aug 20, 2015 at 12:56 AM, Prashant Mathur <prashant@fbk.eu> wrote:
>
>> Hi Vincent,
>>
>> The goal is incremental adaptation but these two are different techniques
>> in principle.
>> CBPT adds an additional dynamic phrase table (with 1 additional feature)
>> which allows deletion and insertion of phrase pairs at any given time. For
>> incremental adaptation CBPT can be used in conjunction with
>> constraint-based decoding as in [1] or by cascading onlineMgiza++ and the
>> normal phrase extractor as in [2].
>> I don't know much about the memory-mapped suffix array implementation,
>> but afaik with MMSAPT (which uses 7 features) you can do incremental
>> updates to your model by adding a stream of parallel data along with the
>> alignments.
>>
>> --Prashant
>>
>> [1]
>> http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
>> [2] http://mt4cat.org/software/adaptive-mt-server
>>
>>
>> On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>>
>>> Hello support,
>>>
>>> Going into advanced features of Moses, I am a bit confused by the
>>> differences and therefore which path to follow, regarding the 2 features
>>> CBPT and MMSAPT.
>>>
>>> I have the feeling the ultimate goal of both is the same but maybe I am
>>> wrong.
>>>
>>> Can someone explain the actual difference ?
>>>
>>> By the way, the "update" feature of this page, http://demo.statmt.org/,
>>> is based on which one ?
>>>
>>> Thanks
>>>
>>> Vincent.
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>>
>>
>
>
> --
> Ulrich Germann
> Senior Researcher
> School of Informatics
> University of Edinburgh
>
>
>


--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 106, Issue 43
**********************************************
