Moses-support Digest, Vol 106, Issue 42

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: clarification CBPT vs MMSAPT (Vincent Nguyen)
2. Re: clarification CBPT vs MMSAPT (Prashant Mathur)

----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Aug 2015 18:40:05 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
To: ugermann@inf.ed.ac.uk, Prashant Mathur <prashant@fbk.eu>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <55D602E5.9060304@neuf.fr>
Content-Type: text/plain; charset="utf-8"

Thanks to both of you. I will give both solutions a try.

For MMSAPT:
Will I be able to make it work with the fr-en Giga corpus? If
everything is loaded in memory, I may run short of RAM rather quickly.
Also, I was using Dyer's fast_align, so do I need to realign the whole
corpus with the modified version of GIZA++?

For CBPT:
I would like to give the MT adaptive server a try, but I don't really
understand how to adapt the given "adaptive model" and "updater model"
to a context where my language pair is different. These preliminary
steps are not part of the tutorial (especially the
updater_models/alignment folders ...).

The only glitch I see in the CBPT is that adaptive changes cannot be
made permanent.

On 20/08/2015 16:17, Ulrich Germann wrote:
> Memory-mapped phrase tables are an alternative to conventional phrase
> tables. They are much, much faster to build, only slightly slower than
> CompactPT at runtime, and at the very least competitive in terms of
> BLEU performance. I usually observe slightly higher BLEU scores, but
> for each individual evaluation, the difference is usually not
> significant. They support only phrase-based MT, but not syntax-based MT.
>
> Both Mmsapt and CBPT also cater to post-editing scenarios (CBPT was
> specifically developed for this purpose). They allow adding new
> material to the phrase tables at run time. I can't say much about CBPT
> (apparently you add phrase table entries, and there is a decay
> function that rewards more recent choices approved by the translator),
> but in the case of Mmsapt (since it samples at lookup time anyway),
> you can add new word-aligned parallel text to the training data at run
> time (or additional material at start-up). Additions are currently not
> stored on disk by the server (do NOT use mosesserver, use moses
> --server --port ...) and are lost when the server exits, but they can
> be loaded at startup time from text files, if they are available. In
> other words: it's currently up to the user/client who submits the
> additions to also store them on disk if they are meant to be
> permanent. Mmsapt offers numerous configuration options (separate
> scores or joint scores for background and foreground corpus, a
> provenance feature, etc.) that affect the number of features, and
> there is no established best practice for use in interactive MT
> (unless Michael Denkowski has advice to offer in this respect).
>
> For phrase-based MT I recommend Mmsapt (see also my paper in the
> coming issue of PBML), as it saves you a lot of phrase table building
> agony. For interactive use, the infrastructure is there but
> additional research is required to figure out the optimal
> configuration of feature functions and associated parameters.
>
> Best regards - Uli Germann
>
> On Thu, Aug 20, 2015 at 12:56 AM, Prashant Mathur <prashant@fbk.eu> wrote:
>
> Hi Vincent,
>
> The goal is incremental adaptation but these two are different
> techniques in principle.
> CBPT adds an additional dynamic phrase table (with 1 additional
> feature), which allows deletion and insertion of phrase pairs at any
> given time. For incremental adaptation, CBPT can be used in
> conjunction with constraint-based decoding as in [1], or by cascading
> onlineMgiza++ and a normal phrase extractor as in [2].
> I don't know much about the memory-mapped suffix array
> implementation, but AFAIK with MMSAPT (which uses 7 features) you
> can do incremental updates to your model by adding a stream of
> parallel data along with the alignments.
>
> --Prashant
>
> [1] http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
> [2] http://mt4cat.org/software/adaptive-mt-server
>
>
> On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>
> Hello support,
>
> Going into the advanced features of Moses, I am a bit confused by the
> differences between the two features CBPT and MMSAPT, and therefore
> by which path to follow.
>
> I have the feeling the ultimate goal of both is the same, but maybe I
> am wrong.
>
> Can someone explain the actual difference?
>
> By the way, which one is the "update" feature of this page
> http://demo.statmt.org/ based on?
>
> Thanks
>
> Vincent.
>
> --
> Ulrich Germann
> Senior Researcher
> School of Informatics
> University of Edinburgh
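
The point made in the quoted message above, that run-time additions to Mmsapt are lost when the server exits and that the client has to store them on disk itself, can be sketched on the client side as follows. This is only a minimal illustration, not part of Moses: the function names, the three file suffixes, and the "i-j" word-alignment notation (source index, target index) are assumptions, and the actual text-file layout expected at server start-up should be checked against the Mmsapt documentation.

```python
from pathlib import Path
from typing import List, Tuple


def parse_alignment(alignment: str) -> List[Tuple[int, int]]:
    """Parse alignment points written as 'i-j' (assumed notation)."""
    points = []
    for token in alignment.split():
        src, tgt = token.split("-")
        points.append((int(src), int(tgt)))
    return points


def store_addition(prefix: Path, source: str, target: str, alignment: str) -> None:
    """Append one word-aligned sentence pair to three parallel text files.

    The suffixes .src/.trg/.aln are hypothetical; the files stay
    line-aligned because exactly one line is appended to each.
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    # Reject alignment points that fall outside either sentence.
    for i, j in parse_alignment(alignment):
        if not (0 <= i < src_len and 0 <= j < tgt_len):
            raise ValueError(f"alignment point {i}-{j} is out of range")
    for suffix, line in ((".src", source), (".trg", target), (".aln", alignment)):
        with open(f"{prefix}{suffix}", "a", encoding="utf-8") as handle:
            handle.write(line.strip() + "\n")
```

A client would call store_addition() for every sentence pair it submits to the server, so that the same material is available as text files the next time the server starts.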


------------------------------

Message: 2
Date: Thu, 20 Aug 2015 20:35:46 +0200
From: Prashant Mathur <prashant@fbk.eu>
Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <B41B4B81-2DEF-4BDC-A399-0C9C985A2014@fbk.eu>
Content-Type: text/plain; charset="utf-8"

Hi Vincent,

Changes to CBPT can be made permanent: set cbtm-constant=true in the Moses config to disable aging of cache entries.
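
To illustrate the aging behaviour discussed in this thread (a decay that rewards more recent entries, and a constant setting that switches the aging off), here is a minimal Python sketch. It is not the Moses implementation: the class PhraseCache, the exponential decay, and the decay_rate parameter are assumptions made purely for illustration.

```python
import math


class PhraseCache:
    """Toy dynamic phrase cache with optional aging (illustrative only)."""

    def __init__(self, decay_rate: float = 0.1, constant: bool = False):
        self.decay_rate = decay_rate  # hypothetical decay parameter
        self.constant = constant      # True: entries never age
        self.clock = 0                # advances by one per insertion
        self.entries = {}             # (source, target) -> insertion time

    def insert(self, source: str, target: str) -> None:
        """Add or refresh a phrase pair; this ages all other entries."""
        self.clock += 1
        self.entries[(source, target)] = self.clock

    def delete(self, source: str, target: str) -> None:
        """Remove a phrase pair if it is present."""
        self.entries.pop((source, target), None)

    def score(self, source: str, target: str) -> float:
        """Return 0.0 for unknown pairs, otherwise a score in (0, 1]."""
        inserted = self.entries.get((source, target))
        if inserted is None:
            return 0.0
        if self.constant:
            return 1.0
        age = self.clock - inserted
        return math.exp(-self.decay_rate * age)
```

With constant=False, a pair inserted earlier scores lower than one inserted later; with constant=True, every cached pair keeps the same score however old it is.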

For the other questions, would anyone from MateCat like to chip in here?

Best,
--Prashant



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 106, Issue 42
**********************************************
