Moses-support Digest, Vol 82, Issue 44

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: New parameter "const std::string &line" in
FeatureFunction (Hieu Hoang)
2. Re: Error when attempting to translate: fails with "
StrayFactorException " (Hieu Hoang)
3. EAMT Best Thesis Award: now open to non-members (Mikel Forcada)

----------------------------------------------------------------------

Message: 1
Date: Wed, 28 Aug 2013 09:27:52 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] New parameter "const std::string &line"
in FeatureFunction
To: Lane Schwartz <dowobeha@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEKMkbgtXcAf4dKo9W4z=9eBR_5YE2vBkv1pwFR=2APRVZoyjg@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

yrs --> hours

On 28 August 2013 09:27, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> line contains the string specified for the feature in the [feature]
> section of moses.ini, e.g.
> KENLM lazyken=0 name=LM0 factor=0 path=../interpolated-binlm.1 order=3
> The line variable should be passed to the FeatureFunction base class
> during construction.
>
> You can then parse it for your feature-specific variables. This is best
> done by
> 1. calling ReadParameters() in your constructor
> 2. implementing
> SetParameter(const std::string& key, const std::string& value)
>
> See class PhraseBoundaryState for a good example, or the IRSTLM or SRILM
> classes in about 2 yrs after I've cleaned them up...
>
>
>
> On 27 August 2013 19:22, Lane Schwartz <dowobeha@gmail.com> wrote:
>
>> I'm doing some LM tinkering, and I noticed that there's a new parameter
>> in FeatureFunction and its descendants.
>>
>> What does the following parameter represent?
>>
>> "const std::string &line"
>>
>> Thanks,
>> Lane
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
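The constructor pattern Hieu describes (hand `line` to the base class, call ReadParameters(), implement SetParameter()) can be sketched in a few lines of Python. This is a hypothetical, simplified stand-in for the C++ classes, not Moses code: the names `FeatureFunctionSketch`, `ToyLM`, `read_parameters` and `set_parameter`, and the choice to let the base class handle only `name`, are assumptions made for illustration. In C++ a virtual call made from a base-class constructor does not reach the derived override, which is presumably why the derived constructor calls ReadParameters() itself; the sketch mirrors that by calling `read_parameters()` explicitly at the end of the subclass constructor.

```python
class FeatureFunctionSketch:
    """Hypothetical stand-in for the C++ FeatureFunction base class."""

    def __init__(self, line):
        # First token is the feature type (e.g. KENLM); the rest are key=value pairs.
        tokens = line.split()
        self.feature_type = tokens[0]
        self._args = []
        for tok in tokens[1:]:
            key, sep, value = tok.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {tok!r}")
            self._args.append((key, value))
        self.name = None

    def read_parameters(self):
        # Dispatch every key=value pair to the (possibly overridden) set_parameter().
        for key, value in self._args:
            self.set_parameter(key, value)

    def set_parameter(self, key, value):
        # Assumed: the base class only knows "name"; anything else is an error.
        if key == "name":
            self.name = value
        else:
            raise KeyError(f"unknown parameter {key!r} for {self.feature_type}")


class ToyLM(FeatureFunctionSketch):
    """Hypothetical LM feature that parses the keys from Hieu's KENLM example."""

    def __init__(self, line):
        super().__init__(line)
        # Defaults must exist before read_parameters() overwrites them.
        self.path = None
        self.order = 0
        self.factor = 0
        self.lazy = False
        self.read_parameters()  # mirrors calling ReadParameters() in the constructor

    def set_parameter(self, key, value):
        if key == "path":
            self.path = value
        elif key == "order":
            self.order = int(value)
        elif key == "factor":
            self.factor = int(value)
        elif key == "lazyken":
            self.lazy = value != "0"
        else:
            super().set_parameter(key, value)  # fall back to the base class


lm = ToyLM("KENLM lazyken=0 name=LM0 factor=0 path=../interpolated-binlm.1 order=3")
print(lm.name, lm.order, lm.path)  # LM0 3 ../interpolated-binlm.1
```

In this sketch, keys the subclass does not recognise fall through to the base class, which raises, so a mistyped key in the [feature] line surfaces as an error instead of being silently ignored.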

------------------------------

Message: 2
Date: Wed, 28 Aug 2013 09:32:42 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Error when attempting to translate: fails
with " StrayFactorException "
To: Stefan Dumitrescu <dumitrescu.stefan@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <521DB5AA.40305@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Do you want to have multiple factors in your phrase-table?

The training command doesn't specify any factors. The ini file says your
phrase-table has only 1 factor for both input and output. However, your
translation rules contain 10 factors!
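To see the kind of mismatch Hieu points out, it helps to count the factors in each token of a rule. The Python sketch below is a simplified illustration of the sort of check that raises StrayFactorException, not the decoder's actual code: it splits tokens on the | delimiter and rejects any token whose factor count differs from the configured number (Moses itself may apply the rule differently, for example only complaining about extra delimiters, as its error message suggests). The helper names `split_factors` and `factor_counts` are hypothetical; the rule is the m1 phrase-table line quoted later in this thread.

```python
FACTOR_DELIMITER = "|"
FIELD_SEPARATOR = " ||| "


def split_factors(word, num_factors):
    """Split one factored token, rejecting it if the factor count is wrong."""
    factors = word.split(FACTOR_DELIMITER)
    if len(factors) != num_factors:
        raise ValueError(
            f"configured {num_factors} factor(s) but {word!r} "
            f"has {len(factors)}"
        )
    return factors


def _counts(phrase):
    # Number of |-separated factors in every whitespace-separated token.
    return [len(tok.split(FACTOR_DELIMITER)) for tok in phrase.split()]


def factor_counts(phrase_table_line):
    """Return per-token factor counts for the source and target side of a rule."""
    fields = phrase_table_line.split(FIELD_SEPARATOR)
    return _counts(fields[0]), _counts(fields[1])


rule = ("!|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL "
        "pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL "
        "!|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2 1 1")
src_counts, tgt_counts = factor_counts(rule)
print(src_counts)  # [5, 5, 5, 5]
print(tgt_counts)  # [5, 5, 5]

try:
    split_factors("!|!|!^EXCL|EXCL|EXCL", 1)  # moses.ini declares a single factor
except ValueError as err:
    print(err)  # configured 1 factor(s) but '!|!|!^EXCL|EXCL|EXCL' has 5
```

Every token in the rule carries several factors, while the moses.ini in this thread declares only factor 0 for input and output, which is the disagreement the exception reports.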

On 28/08/2013 08:59, Stefan Dumitrescu wrote:
> Hi Hieu,
>
> The training and test data are correctly processed: first tokenized
> (with the Moses script), then truecased, then annotated.
>
> I have trained a surface model on the unannotated (unfactored) data
> and everything runs smoothly. However, when I use an annotated
> corpus (correctly annotated, each token becomes 5 factors) as well as
> an annotated input, I get this exception.
>
> I tried recompiling Moses with -max-factors 10; no change.
>
> I played with the decoder's -input-factors switch, and now I am
> getting this:
>
> .... (i cut the first part) ...
> line=KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
> FeatureFunction: LM0 start: 14 end: 14
> Loading table into memory...done.
> Start loading text SCFG phrase table. Moses format : [63.000] seconds
> Reading /usr/local/trans/work/ted/m4/model/phrase-table.0-0.gz
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************************
> Exception: bitset::set
>
> It is a bit frustrating because I have used factored models several
> times in the past year without any issues.
>
> For my model m1, I did not specify any -translation-factors in the
> training phase, and I got a phrase-table.gz which contained the five
> factors together, as in:
>
> sdumitrescu /usr/local/trans/work/ted/scripts > zcat ../m1/model/phrase-table.gz | head -2
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.018702 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2 1 1
> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL "|"|"^DBLQ|DBLQ|DBLQ ||| 1 0.291644 0.5 0.000225623 ||| 0-0 1-1 2-1 1-2 2-3 ||| 1 2 1
>
> For model m3 for example, trained with:
>
> ...(cut)...
> --root-dir /usr/local/trans/work/ted/m3 \
> --max-phrase-length 4 \
> --first-step $FIRSTSTEP \
> --alignment-factors 2-2 \
> --alignment grow-diag-final-and \
> --reordering-factors 2-2 \
> --reordering wbe-msd-bidirectional-fe
>
> I'm getting a phrase-table.0-0.gz:
>
> sdumitrescu /usr/local/trans/work/ted/scripts > zcat ../m3/model/phrase-table.0-0.gz | head -2
> ! ! ! " ||| . " ||| 0.000169635 2.32345e-08 1 0.262566 ||| 0-0 1-0 2-0 3-1 ||| 5895 1 1
> ! ! ! pe ||| ! ! ! ||| 0.5 0.0202114 1 0.190109 ||| 0-0 2-0 0-1 1-1 1-2 ||| 2 1 1
>
> Neither one works with an annotated input file like:
>
> *I*|i|i^NN|NN|Nc *actually*|actually|actually^ADVE|ADVE|Rmp
> *am*|be|be^VERB1|VERB1|Vmip1s *.*|.|.^PERIOD|PERIOD|PERIOD
>
> I'm getting the StrayFactorException when not specifying any
> -input-factors (default 0), or "Exception: bitset::set" when setting
> anything else.
>
> Thanks for your help,
> Stefan
>
> On 8/27/2013 6:07 PM, Hieu Hoang wrote:
>> Did you escape your training and input data? There must not be |
>> characters in your data unless you are using factored models.
>>
>> The Moses tokenizer script does it, as does the dedicated escape script:
>> scripts/tokenizer/tokenizer.perl
>> scripts/tokenizer/escape-special-chars.perl
>>
>> On 26/08/2013 15:23, Stefan Dumitrescu wrote:
>>> Hi!
>>>
>>> I have the following error when attempting to translate:
>>>
>>> Exception: moses/Word.cpp:109 in void
>>> Moses::Word::CreateFromString(Moses::FactorDirection, const
>>> std::vector<long unsigned int>&, const StringPiece&, bool) threw
>>> StrayFactorException because `fit'.
>>> You have configured 1 factors but the word !|!|!^EXCL|EXCL|EXCL contains
>>> factor delimiter | too many times.
>>>
>>> I have the following training script:
>>>
>>> /usr/local/trans/tools/moses/scripts/training/train-model.perl \
>>> --corpus /usr/local/trans/corpus/ted/train \
>>> --external-bin-dir=/usr/local/trans/tools/mgiza/bin \
>>> --parallel \
>>> --mgiza \
>>> --mgiza-cpus 8 \
>>> --f ro --e en \
>>> --lm 0:5:/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm:8 \
>>> --root-dir /usr/local/trans/work/ted/m1 \
>>> --max-phrase-length 4 \
>>> --first-step $FIRSTSTEP \
>>> --translation-factors 0-0 \
>>> --alignment grow-diag-final-and \
>>> --reordering wbe-msd-bidirectional-fe
>>>
>>> The train files are factored (5 factors: word, lemma, lemma^postag1,
>>> postag1, postag2). The training process runs without any errors and
>>> generates a valid phrase-table that looks like:
>>>
>>> zcat ../m1/model/phrase-table.gz | head -1
>>> !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL pe|pe|pe^S|S|Spca ||| !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL !|!|!^EXCL|EXCL|EXCL ||| 0.5 0.0187005 1 0.18952 ||| 0-0 1-0 0-1 2-2 ||| 2 1 1
>>>
>>> I did not get this error a couple of months ago when working on another
>>> experiment. I'm guessing something changed in Moses and I am missing
>>> some required flag in my scripts? I am using scripts that have worked
>>> fine so far.
>>> I looked through the manual and tried using the -input-factors
>>> option, but I still receive the same error. What am I doing wrong? It is
>>> most likely something trivial, but I do appreciate your help with it.
>>>
>>> Thank you,
>>> Stefan
>>>
>>> (moses.ini below:)
>>> #########################
>>> ### MOSES CONFIG FILE ###
>>> #########################
>>>
>>> # input factors
>>> [input-factors]
>>> 0
>>>
>>> # mapping steps
>>> [mapping]
>>> 0 T 0
>>>
>>> [distortion-limit]
>>> 6
>>>
>>> # feature functions
>>> [feature]
>>> UnknownWordPenalty
>>> WordPenalty
>>> PhrasePenalty
>>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr/local/trans/work/ted/m1/model/phrase-table.gz input-factor=0 output-factor=0
>>> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/usr/local/trans/work/ted/m1/model/reordering-table.wbe-msd-bidirectional-fe.gz
>>> Distortion
>>> KENLM lazyken=0 name=LM0 factor=0 path=/usr/local/trans/corpus/tedlm/en.surface.5gram.kni.blm order=5
>>>
>>> # dense weights for feature functions
>>> [weight]
>>> UnknownWordPenalty0= 1
>>> WordPenalty0= -1
>>> PhrasePenalty0= 0.2
>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
>>> Distortion0= 0.3
>>> LM0= 0.5
>>>
>

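Hieu's first reply in this thread suggests escaping | before training and decoding, so that a literal | in the text cannot be mistaken for a factor delimiter. The Python sketch below shows the idea. The entity mapping is assumed to match what scripts/tokenizer/escape-special-chars.perl produces (check the script itself for the authoritative list), and `escape` and `make_factored_token` are hypothetical helpers, not part of Moses.

```python
# Assumed entity mapping; compare with scripts/tokenizer/escape-special-chars.perl.
_ESCAPES = str.maketrans({
    "&": "&amp;",
    "|": "&#124;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&apos;",
    '"': "&quot;",
    "[": "&#91;",
    "]": "&#93;",
})


def escape(token):
    # str.translate works in a single pass, so the "&" introduced by one
    # replacement is never escaped a second time.
    return token.translate(_ESCAPES)


def make_factored_token(*factors):
    # Escape each factor first, then join with the factor delimiter, so only
    # the joining | characters remain as delimiters.
    return "|".join(escape(f) for f in factors)


tok = make_factored_token("a|b", "a|b", "SYM")
print(tok)                  # a&#124;b|a&#124;b|SYM
print(len(tok.split("|")))  # 3
```

Escaping each factor before joining is what keeps the factor count stable: without it, the surface form "a|b" alone would add an extra delimiter to the token.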

------------------------------

Message: 3
Date: Wed, 28 Aug 2013 10:47:25 +0200
From: Mikel Forcada <mlf@dlsi.ua.es>
Subject: [Moses-support] EAMT Best Thesis Award: now open to
non-members
To: moses-support@mit.edu
Message-ID: <521DB91D.7090000@dlsi.ua.es>
Content-Type: text/plain; charset="iso-8859-1"

*EAMT Best Thesis Award: now open to non-members*

Following a decision by the EAMT Executive Committee at its June 7
meeting, the eligibility requirements have been modified so that the
prize is now open to European, Middle Eastern and North African
researchers who are not members of the EAMT.

The modified text of the call follows.

Mikel L. Forcada
EAMT Secretary
------------------------------------------------------------------------

*EAMT Best Thesis Award 2013: modified call*

The European Association for Machine Translation
(EAMT, http://www.eamt.org) is an organization
that serves the growing community of people interested in MT and
translation tools, including users, developers, and researchers of this
increasingly viable technology.
The EAMT invites entries for its fifth EAMT Best Thesis Award for a PhD
or equivalent thesis on a topic related to machine translation.

*Eligibility*

Researchers who

* have submitted a PhD (or equivalent) thesis on a relevant topic in
a European, Northern African[1] or Middle Eastern[2] institution
within calendar year 2013,

* have successfully obtained the PhD or equivalent degree, and

* have not previously won a similar award,

are invited to submit their theses to the EAMT for consideration.

*Panel*

The submissions will be judged by a panel of experts who are EAMT
members and are appointed by the Executive Committee of the EAMT.

*Selection Criteria*

Each thesis will be judged according to how challenging the problem was,
how relevant the results are to machine translation as a field, and
the strength of their impact in terms of scientific publications.

*Scope*

The scope of the thesis need not be confined to a technical area, and
applications are also invited from students who carried out their
research into commercial and management aspects of machine translation.
Possible areas of research might include:

* development of machine translation or advanced computer-assisted
translation systems: software and resources

* machine translation for less-resourced languages

* the use of these systems in professional environments (freelance
translators, translation agencies, localisation, etc.)

* the increasing impact of machine translation on non-professional
Internet users and its impact on communications, social networking, etc.

* machine translation and post-editing

* spoken language translation

* the integration of machine translation and translation memory systems

* the integration of machine translation software in larger IT
applications

* the evaluation of machine translation systems in real tasks such as
those above

* the cross-fertilisation between machine translation and other
language technologies

*Prize*

The winner will be announced by the end of March 2014 and will receive
a prize of EUR500, together with a suitably inscribed certificate and
free EAMT membership for two years. In addition, the
recipient of the award will be required to briefly present their
research at the Annual Conference of the European Association for
Machine Translation in June 2014 (venue to be determined). In order to
facilitate this, the EAMT will waive the winner's registration costs,
and will make available a travel bursary of EUR200 to enable the
recipient of the award to attend the said conference.

*Submission*

Should you wish to enter your thesis, please use the following
link: http://www.easychair.org/conferences/?conf=eamtbta2013 to submit a
single PDF file containing, in this order:

* a 2-page summary of your thesis in English, containing:

  * your full contact details,

  * the name and contact details of your supervisor(s),

* a copy of your CV in English (at most one page, plus a complete list
of publications directly related to the thesis)

* an electronic copy of your thesis

* optionally, an appendix with any other relevant information on the
thesis

By submitting their work, authors

* agree that, in case they are granted the award, any subsequently
published version of the thesis should carry the citation "Winner of
the Best Thesis Award 2013 of the European Association for Machine
Translation" and

* acknowledge the right of the EAMT to publicize the granting of the
award.

*Closing Date*

The closing date for submissions is December 31, 2013, 23:59 Central
European Time. No extensions will be granted.

[1] Algeria, Egypt, Libya, Morocco and Tunisia.
[2] Bahrain, Iran, Iraq, Israel, Jordan, Kuwait, Oman, Palestine, Qatar,
Saudi Arabia, Syria, United Arab Emirates, Yemen.

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 82, Issue 44
*********************************************
