Moses-support Digest, Vol 107, Issue 6

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: clarification CBPT vs MMSAPT (Vincent Nguyen)


----------------------------------------------------------------------

Message: 1
Date: Tue, 1 Sep 2015 18:17:50 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
To: ugermann@inf.ed.ac.uk
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <55E5CFAE.90001@neuf.fr>
Content-Type: text/plain; charset="utf-8"

I didn't make myself clear.
I don't want to add material "dynamically"; I want to add a new
corpus to an existing one.

What I don't know is how to align the new corpus incrementally against an
existing one that has already been aligned with fast_align.

Is that clearer?

By the way, in what you describe below, is the alignment file for the new
material generated from the new material only, or does it also include the
existing alignment?

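For reference, the three-file layout described in the quoted message below can be sketched in Python. This is only an illustration: the function name write_foreground_seed and its arguments are hypothetical, and it assumes each file holds one line per sentence pair. Only the file-name pattern, the gzip requirement and the trailing '.' or '/' rule come from the thread.

```python
import gzip
from pathlib import Path


def write_foreground_seed(prefix, l1, l2, src_lines, trg_lines, aln_lines):
    """Write the three gzipped seed files using the naming scheme from the thread.

    `prefix` must end in '.' (files carry a basename) or '/' (they do not),
    mirroring the rule stated for the `extra=` parameter.
    This helper is hypothetical; it is not part of Moses.
    """
    if not prefix.endswith((".", "/")):
        raise ValueError("prefix must end in '.' or '/'")
    # Assumption: source, target and alignment are line-parallel.
    if not (len(src_lines) == len(trg_lines) == len(aln_lines)):
        raise ValueError("need one line per sentence pair in all three inputs")

    targets = {
        f"{prefix}{l1}.txt.gz": src_lines,
        f"{prefix}{l2}.txt.gz": trg_lines,
        f"{prefix}{l1}-{l2}.symal.gz": aln_lines,
    }
    for name, lines in targets.items():
        # Create the directory if needed, then write gzipped UTF-8 text.
        Path(name).parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(name, "wt", encoding="utf-8") as out:
            for line in lines:
                out.write(line.strip() + "\n")
    return sorted(targets)
```

With prefix "/some/path/new." and tags "en" and "de", this would produce /some/path/new.en.txt.gz, /some/path/new.de.txt.gz and /some/path/new.en-de.symal.gz, and the Mmsapt line would then carry extra=/some/path/new. as described below.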

On 01/09/2015 15:32, Ulrich Germann wrote:
> Hi Vincent,
>
> 1. To seed the foreground corpus at start-up, you need to provide
> three files. (I use ${L1} and ${L2} to indicate language tags:
> ${L1} is the source language, ${L2} the target language. These
> tags must match those given in the L1 and L2 parameters of the
> Mmsapt line in moses.ini.)
>
> /some/path/[basename.]${L1}.txt.gz
> /some/path/[basename.]${L2}.txt.gz
> /some/path/[basename.]${L1}-${L2}.symal.gz
>
> Then, in the Mmsapt line in moses.ini, add the parameter
> extra=/some/path/[basename.]
>
> Note that the extra specification (like the path parameter) must
> end either in '.' (if the files have a prefix) or '/' (if they
> don't). Files must be gzipped and end in .txt.gz or .symal.gz,
> respectively.
>
> 2. To add material dynamically:
>
> o with the moses server, use the update interface of the xmlrpc
> server; see scripts/contrib/sim-pe.py for an example.
> o to simulate post-editing with moses in batch mode, specify
> --spe-src /path/to/source --spe-trg /path/to/target --spe-aln
> /path/to/word-alignment-file.
> E.g.
>
> moses -f moses.ini --spe-src input.en --spe-trg reference.de --spe-aln en-de.symal
>
> This will translate one sentence, then add the input sentence, the
> reference (as read from file), and the pre-computed word alignment
> to the parallel data.
> In this case (in contrast to the parameter 'extra' in the
> Mmsapt line, which mandates that the text files are gzipped),
> the files should be plain, uncompressed text files.
>
> - Uli
>
> On Tue, Sep 1, 2015 at 1:11 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>
> Hi Uli,
>
> Regarding your point 3, here is what I would like to do and understand:
>
> I have an LM and a TM built with EMS, but with the alignment done by
> fast_align, so there are no vcb files for the baseline.
>
> In this context I don't see whether I can integrate a new
> incremental corpus into the previous baseline corpus.
>
> I hope this is clearer.
>
> Vincent
>
>
>
> On 23/08/2015 00:36, Ulrich Germann wrote:
>> Hi Vincent,
>>
>> 1. I don't use EMS, so I'm the wrong person to ask.
>> 2. Please always post questions to the moses-support mailing
>> list, so that others can benefit from questions and answers as well.
>> 3. Can you briefly explain what you are trying to accomplish? I
>> don't think I understand what you are actually trying to do.
>>
>> Best regards - Uli
>>
>> On Sat, Aug 22, 2015 at 10:45 PM, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>>
>>
>> I have read this page again and again:
>> http://www.statmt.org/moses/?n=Advanced.Incremental
>> but it is not clear enough for a newbie like me to use
>> with EMS.
>> I also see a section in the EMS config file:
>> use of baseline alignment model (incremental training)
>> and I don't really see how it fits with the rest of the parameters.
>>
>>
>>
>> On 22/08/2015 16:31, vnguyen@neuf.fr wrote:
>>> Oops.
>>> Using EMS, I built the phrase table with the mmsapt=
>>> option and it went through,
>>> but I had not added the training option
>>> -final-alignment-model hmm
>>>
>>> Do I need to start again?
>>>
>>> The thing is, I use Dyer's aligner (fast_align) because of the Giga corpus,
>>> and I am not sure that training option is compatible, since
>>> the tutorial mentions a modified GIZA++...
>>>
>>>
>>>
>>> ____________________
>>>
>>> From: "Ulrich Germann"
>>> Date: 21 August 2015 15:54:08
>>> To: Vincent Nguyen
>>> Cc: prashant@fbk.eu, moses-support@mit.edu
>>> Subject: Re: [Moses-support] clarification CBPT vs MMSAPT
>>>
>>>
>>>
>>> On Thu, Aug 20, 2015 at 5:40 PM, Vincent Nguyen
>>> <vnguyen@neuf.fr> wrote:
>>>
>>> Thanks to both of you. I will give both solutions a try.
>>>
>>> For MMSAPT :
>>> Will I be able to make it work with the Giga corpus
>>> fr-en ? If everything is loaded in memory I may be short
>>> of ram rather quickly.
>>>
>>>
>>> For the WMT-15 fr-en data, Mmsapt's files are about 20GB in
>>> total, but not all of it will normally be kept in memory.
>>> Mmsapt degrades gracefully: it just gets slow if the VM
>>> manager has to drop memory pages and re-load them. The LM is
>>> about 40GB, so for optimal performance you should plan for
>>> 60+GB of RAM. Provided you have enough RAM, cat all model
>>> files to /dev/null prior to starting moses. Sequential disk
>>> access is much faster than random disk access, and the cat
>>> to /dev/null will push them into the OS's file cache.
>>>
>>> Plus I was using Dyer's fast_align ... so do I need to
>>> realign the whole corpus with the modified version of
>>> GIZA++?
>>>
>>> You need word alignments in the output format produced by
>>> symal (i.e. row-column pairs 1-1 2-2 3-4 etc.). How these
>>> alignments are produced doesn't matter for Mmsapt's ability
>>> to handle them. It may, of course, affect the alignment
>>> quality, but that's independent of which phrase table
>>> implementation you use.
>>>
>>> - Uli
>>>
>>> For CBPT :
>>> I would like to give the adaptive MT server a try, but
>>> I don't really understand how to adapt the given
>>> "adaptive model" and "updater model"
>>> to a context where my language pair is different. These
>>> preliminary steps are not part of the tutorial
>>> (especially the updater_models/alignment folders ...).
>>>
>>> The only glitch I see in the CBPT is that adaptive
>>> changes cannot be made permanent.
>>>
>>>
>>>
>>>
>>> On 20/08/2015 16:17, Ulrich Germann wrote:
>>>> Memory-mapped phrase tables are an alternative to
>>>> conventional phrase tables. They are much, much faster
>>>> to build, only slightly slower than CompactPT at
>>>> runtime, and at the very least competitive in terms of
>>>> BLEU performance. I usually observe slightly higher
>>>> BLEU scores, but for each individual evaluation, the
>>>> difference is usually not significant. They support
>>>> only phrase-based MT, but not syntax-based MT.
>>>>
>>>> Both Mmsapt and CBPT also cater to post-editing
>>>> scenarios (CBPT was specifically developed for this
>>>> purpose). They allow adding new material to the phrase
>>>> tables at run time. I can't say much about CBPT
>>>> (apparently you add phrase table entries, and there is
>>>> a decay function that rewards more recent choices
>>>> approved by the translator), but in the case of Mmsapt
>>>> (since it samples at lookup time anyway), you can add
>>>> new word-aligned parallel text to the training data at
>>>> run time, or additional material at start-up.
>>>>
>>>> Additions are currently not stored on disk by the
>>>> server (do NOT use mosesserver, use moses --server
>>>> --port ...) and are lost when the server exits, but they
>>>> can be loaded at start-up from text files, if those are
>>>> available. In other words: it's currently up to the
>>>> user/client who submits the additions to also store
>>>> them on disk if they are meant to be permanent.
>>>>
>>>> Mmsapt offers numerous configuration options (separate
>>>> scores or joint scores for background and foreground
>>>> corpus, a provenance feature, etc.) that affect the
>>>> number of features, and there is no established best
>>>> practice for use in interactive MT (unless Michael
>>>> Denkowski has advice to offer in this respect).
>>>>
>>>> For phrase-based MT I recommend Mmsapt (see also my
>>>> paper in the coming issue of PBML), as it saves you a
>>>> lot of phrase table building agony. For interactive
>>>> use, the infrastructure is there but additional
>>>> research is required to figure out the optimal
>>>> configuration of feature functions and associated
>>>> parameters.
>>>>
>>>> Best regards - Uli Germann
>>>>
>>>> On Thu, Aug 20, 2015 at 12:56 AM, Prashant Mathur
>>>> <prashant@fbk.eu> wrote:
>>>>
>>>> Hi Vincent,
>>>>
>>>> The goal is incremental adaptation, but these two
>>>> are different techniques in principle.
>>>> CBPT adds an additional dynamic phrase table (with 1
>>>> additional feature) which allows deletion and
>>>> insertion of phrase pairs at any given time. For
>>>> incremental adaptation, CBPT can be used in
>>>> conjunction with constraint-based decoding as in
>>>> [1], or by cascading onlineMgiza++ and a normal phrase
>>>> extractor as in [2].
>>>> I don't know much about the memory-mapped suffix
>>>> array implementation, but afaik with MMSAPT (which
>>>> uses 7 features) you can do incremental updates to
>>>> your model by adding a stream of parallel data along
>>>> with the alignments.
>>>>
>>>> --Prashant
>>>>
>>>> [1]
>>>> http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
>>>> [2] http://mt4cat.org/software/adaptive-mt-server
>>>>
>>>>
>>>> On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen
>>>> <vnguyen@neuf.fr> wrote:
>>>>
>>>> Hello support,
>>>>
>>>> Going into the advanced features of Moses, I am a
>>>> bit confused by the differences between the two features
>>>> CBPT and MMSAPT, and therefore by which path to follow.
>>>>
>>>> I have the feeling the ultimate goal of both is
>>>> the same, but maybe I am wrong.
>>>>
>>>> Can someone explain the actual difference?
>>>>
>>>> By the way, which one is the "update" feature of this page
>>>> http://demo.statmt.org/ based on?
>>>>
>>>> Thanks
>>>>
>>>> Vincent.
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ulrich Germann
>>>> Senior Researcher
>>>> School of Informatics
>>>> University of Edinburgh

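As a footnote to the alignment format mentioned in the thread above ("row-column pairs 1-1 2-2 3-4 etc."), here is a minimal Python sketch for sanity-checking an alignment line against its sentence pair before handing it to Mmsapt. The function names parse_alignment and check_alignment are hypothetical, and whether positions start at 0 or 1 is an assumption that depends on the aligner, hence the zero_based switch.

```python
def parse_alignment(line):
    """Parse one line of 'i-j' pairs, e.g. '1-1 2-2 3-4', into integer tuples."""
    pairs = []
    for token in line.split():
        src, sep, trg = token.partition("-")
        if not sep or not src.isdigit() or not trg.isdigit():
            raise ValueError(f"malformed alignment point: {token!r}")
        pairs.append((int(src), int(trg)))
    return pairs


def check_alignment(src_sentence, trg_sentence, aln_line, zero_based=True):
    """Return True if every alignment point falls inside both sentences.

    zero_based is an assumption about the index convention of the aligner
    output; set it to False for 1-based positions.
    """
    n_src = len(src_sentence.split())
    n_trg = len(trg_sentence.split())
    offset = 0 if zero_based else 1
    return all(
        offset <= s < n_src + offset and offset <= t < n_trg + offset
        for s, t in parse_alignment(aln_line)
    )
```

Running check_alignment over every line triple of a new corpus catches misaligned files (for example a source file with one line missing) early, regardless of which aligner produced the alignment.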

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 107, Issue 6
*********************************************
