Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Lattice MERT (Zheng Yuan)
2. Last CfP EUROPHRAS2015 and MUMTTT2015 - Deadline: 31 March 2015
(MONTI JOHANNA -Professore associato scienze umanistiche e sociali-d)
3. n-best list reranking (Matthias Huck)
4. Re: n-best list reranking (Holger Schwenk)
5. Re: n-best list reranking (Matthias Huck)
----------------------------------------------------------------------
Message: 1
Date: Fri, 27 Mar 2015 16:40:43 +0000
From: Zheng Yuan <yuanzheng_bupt@126.com>
Subject: [Moses-support] Lattice MERT
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <E948303C-52D2-4337-A273-B71930B4E0BA@126.com>
Content-Type: text/plain; charset="utf-8"
Hi,
Does anyone know where I can find some documentation about how to use Kārlis Goba and Christian Buck's implementation of Lattice MERT?
Or does anyone know how to use it with Moses?
https://github.com/christianbuck/Moses-Lattice-MERT
Thanks,
Zheng
------------------------------
Message: 2
Date: Fri, 27 Mar 2015 17:33:46 +0100
From: "MONTI JOHANNA -Professore associato scienze umanistiche e
sociali-d" <jmonti@uniss.it>
Subject: [Moses-support] Last CfP EUROPHRAS2015 and MUMTTT2015 -
Deadline: 31 March 2015
To: multiword-expressions@lists.sourceforge.net,
<LINGUIST@listserv.linguistlist.org>, <elsnet-list@elsnet.org>,
<dbworld@cs.wisc.edu>, <mt-list@eamt.org>, <moses-support@mit.edu>,
<corpora@uib.no>, <IRList@lists.shef.ac.uk>,
<flarenet_subscribers@ilc.cnr.it>, elsnet-list@cogsci.ed.ac.uk,
ln@frmop11.bitnet, corpora@clu.bccs.uib.no
Message-ID: <20150327162400.M17391@uniss.it>
Content-Type: text/plain; charset="utf-8"
COMPUTERISED AND CORPUS-BASED APPROACHES TO PHRASEOLOGY: MONOLINGUAL AND
MULTILINGUAL PERSPECTIVES
EUROPHRAS2015: 29 JUNE-1 JULY 2015, MALAGA
http://www.europhras2015.eu/
In line with the overarching theme "Computerised and Corpus-based Approaches
to Phraseology: Monolingual and Multilingual Perspectives", the event will
focus on various technology-related topics in phraseology. More
specifically, the conference will be a forum where the most recent and advanced
computational and corpus-based methods applied in phraseology will be
discussed. As such, it will promote further development and innovations in the
field, including not only monolingual, but also multilingual approaches to
phraseology and translation.
TOPICS
The conference will invite papers addressing corpus-based phraseology and the
computational processing of phraseological units, such as their
identification, classification, extraction, analysis, translation and
representation. Papers will be accepted in English, Spanish, French and German
covering topics including (but not limited to):
* Corpus-based phraseology;
* Computational approaches to monolingual and contrastive phraseology;
* NLP-driven and corpus-based approaches to the teaching of phraseological units;
* Phraseology in E-Lexicography and E-Terminography;
* NLP and/or corpus-based identification of phraseological units;
* NLP and/or corpus-based classification of phraseological units;
* Computer-aided and/or corpus-based analysis of phraseological units;
* Machine-aided and corpus-based translation of phraseological units.
EUROPHRAS2015 CHAIR
Gloria Corpas Pastor
EUROPHRAS2015 KEYNOTE SPEAKERS
- Jean-Pierre Colson (Institut Libre Marie Haps, Brussels/Université
Catholique de Louvain, Belgium)
- Patrick Hanks (University of Wolverhampton/University of the West of
England, United Kingdom)
- Ulrich Heid (Universität Hildesheim, Germany)
- Ruslan Mitkov (University of Wolverhampton, United Kingdom)
The PROGRAMME COMMITTEE members are distinguished experts from all over the world.
SUBMISSION INFORMATION
See EUROPHRAS2015 website: http://www.europhras2015.eu/online
IMPORTANT DATES
31 MARCH 2015 - New deadline for abstract submission (400-500 words, excluding
references)
30 APRIL 2015 - Notification of acceptance
29 JUNE-1 JULY 2015 - Conference
15 JULY 2015 - Submission of camera-ready final versions
In conjunction with EUROPHRAS2015, the second Workshop on "Multi-word Units in
Machine Translation and Translation Technology" (MUMTTT2015) will take place as
an associated event (1-2 July).
MUMTTT WORKSHOP CHAIRS
- Gloria Corpas Pastor (University of Malaga, Spain)
- Ruslan Mitkov (University of Wolverhampton, United Kingdom)
- Johanna Monti (Università degli Studi di Sassari, Italy)
- Violeta Seretan (Université de Genève, Switzerland)
WORKSHOP KEYNOTE SPEAKER
- Kathrin Steyer (Institut für deutsche Sprache, Mannheim, Germany)
For more information regarding the Workshop, please visit:
http://www.europhras2015.eu/presentation
VENUE (CONFERENCE AND WORKSHOP):
University of Malaga,
Faculty of Arts,
Campus of Teatinos s/n,
29071
Malaga, Spain
------------------------------
Message: 3
Date: Fri, 27 Mar 2015 22:42:10 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: [Moses-support] n-best list reranking
To: Moses-support <moses-support@mit.edu>
Cc: Holger Schwenk <holger.schwenk@lium.univ-lemans.fr>
Message-ID: <1427496130.10837.19.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"
Hi,
I'm looking for a tool to rerank n-best lists in Moses' current format,
including sparse features. The CSLM toolkit has quite a nice re-ranker
implementation, but apparently it doesn't know sparse features yet.
If anyone already has an extended version of the existing re-ranker from
the CSLM toolkit, or alternatively any other code that does the same and
can also deal with sparse features, please let me know. I'd prefer to
not spend any time at all on implementing this myself, as I'll probably
need to run it only a few times for testing purposes.
Cheers,
Matthias
> On 29 Apr 20:46 2013, Holger Schwenk wrote:
>
> Hello,
>
> you can do n-best list rescoring with the nbest tool which is part of
> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
> It is designed to rescore with back-off or continuous space LMs, but it
> shouldn't be difficult to add your own feature functions.
>
> don't hesitate to contact me if you need help.
>
> best,
>
> Holger
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 4
Date: Fri, 27 Mar 2015 23:48:39 +0100
From: Holger Schwenk <Holger.Schwenk@lium.univ-lemans.fr>
Subject: Re: [Moses-support] n-best list reranking
To: Matthias Huck <mhuck@inf.ed.ac.uk>, Moses-support
<moses-support@mit.edu>
Cc: holger.schwenk@lium.univ-lemans.fr
Message-ID: <5515DE47.1060109@lium.univ-lemans.fr>
Content-Type: text/plain; charset=utf-8; format=flowed
Hello Matthias,
could you give us an idea what is missing in the CSLM reranker to make
it work for sparse features?
Right now, we do not parse the names of the feature functions; we store
the numerical values only.
In principle, this could be changed ...
Then it depends on how you want to rescore the sparse features.
The CSLM toolkit can rescore with a back-off LM and Moses on-disk
phrase tables (and obviously neural networks).
Why not add more functionality ...
- Holger
On 03/27/2015 11:42 PM, Matthias Huck wrote:
> Hi,
>
> I'm looking for a tool to rerank n-best lists in Moses' current format,
> including sparse features. The CSLM toolkit has quite a nice re-ranker
> implementation, but apparently it doesn't know sparse features yet.
>
> If anyone already has an extended version of the existing re-ranker from
> the CSLM toolkit, or alternatively any other code that does the same and
> can also deal with sparse features, please let me know. I'd prefer to
> not spend any time at all on implementing this myself, as I'll probably
> need to run it only a few times for testing purposes.
>
> Cheers,
> Matthias
>
>
>> On 29 Apr 20:46 2013, Holger Schwenk wrote:
>>
>> Hello,
>>
>> you can do n-best list rescoring with the nbest tool which is part of
>> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
>> It is designed to rescore with back-off or continuous space LMs, but it
>> shouldn't be difficult to add your own feature functions.
>>
>> don't hesitate to contact me if you need help.
>>
>> best,
>>
>> Holger
>
>
------------------------------
Message: 5
Date: Fri, 27 Mar 2015 23:57:52 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] n-best list reranking
To: Holger Schwenk <Holger.Schwenk@lium.univ-lemans.fr>
Cc: Moses-support <moses-support@mit.edu>
Message-ID: <1427500672.11543.46.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"
Hi,
Right, if the `nbest` tool from CSLM is supposed to work with sparse
features, then it needs to read the names.
An n-best list entry with sparse feature scores may look like this:
0 ||| Orlando Bloom und Miranda Kerr noch lieben ||| LexicalReordering0= -2.29848 0 0 0 -1.93214 0 0 0 LexicalReordering0_phr-src-last-c200-cluster_162-0= 1 LexicalReordering0_phr-src-first-c200-cluster_41-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_134-0= 1 LexicalReordering0_phr-src-last-c200-cluster_189-0= 1 LexicalReordering0_phr-tgt-first-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-first-c200-cluster_34-0= 1 LexicalReordering0_stk-src-first-c200-cluster_59-0= 3 LexicalReordering0_phr-tgt-first-c200-cluster_134-0= 1 LexicalReordering0_phr-tgt-last-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-last-c200-cluster_34-0= 1 LexicalReordering0_stk-src-last-c200-cluster_59-0= 3 LexicalReordering0_phr-src-last-c200-cluster_126-0= 1 LexicalReordering0_phr-tgt-first-c200-cluster_119-0= 1 LexicalReordering0_phr-tgt-last-c200-cluster_134-0= 1 LexicalReordering0_phr-src-first-c200-cluster_59-0= 3 LexicalReordering0_phr-src-last-c200-cluster_59-0= 3 LexicalReordering0_stk-src-first-c200-cluster_162-0= 1 LexicalReordering0_stk-src-first-c200-cluster_189-0= 1 LexicalReordering0_stk-src-last-c200-cluster_162-0= 1 LexicalReordering0_stk-src-last-c200-cluster_189-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_34-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-last-c200-cluster_133-0= 1 LexicalReordering0_phr-src-first-c200-cluster_162-0= 1 LexicalReordering0_phr-src-first-c200-cluster_189-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_134-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_34-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_54-0= 3 OpSequenceModel0= -31.707 0 0 0 0 Distortion0= 0 LM0= -36.858 WordPenalty0= -7 PhrasePenalty0= 6 TranslationModel0= -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| -4.99724
There can be many thousands of different sparse features
"LexicalReordering0_*" which fire on one particular set, in
hypotheses that make it into the 100-best list.
The number of features in different n-best list entries can vary.
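For illustration only, here is a minimal Python sketch of how one could parse such an entry (hypothetical code, not part of CSLM or Moses; the helper name is made up): every token ending in "=" is taken as a feature name, and the numbers that follow are its values.

def parse_nbest_line(line):
    # Split 'seg ||| hypothesis ||| features ||| total' into its four fields
    # and group the feature column into a dict: feature name -> list of values.
    seg_id, hyp, feats, total = [f.strip() for f in line.split("|||")]
    scores = {}
    current = None
    for tok in feats.split():
        if tok.endswith("="):          # a label such as "LM0=" or a sparse feature name
            current = tok[:-1]
            scores.setdefault(current, [])
        else:                          # a value belonging to the current label
            scores[current].append(float(tok))
    return int(seg_id), hyp, scores, float(total)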
It seems to me that the `nbest` tool from CSLM v3 cannot deal with this.
I had a brief look at the code, and I ran:
$ nbest -i in.100best -o out.100best
(Without specifying any new weights.)
It processes the list but outputs this:
0 ||| Orlando Bloom und Miranda Kerr noch lieben ||| 0 -2.29848 0 0 0 -1.93214 0 0 0 0 1 0 1 0 1 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 1 0 1 0 3 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 -31.707 0 0 0 0 0 0 0 -36.858 0 -7 0 6 0 -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| -4.99724
I think it just takes every token in the scores column and treats it as
a dense score (even including the feature names). Probably nobody has
bothered to adapt it to the current format yet.
It would be a minor modification, I suppose. The tool just needs to read
and store feature names. Weights would have to be stored by name as
well. They would have to be read from a sparse weights file:
...
LexicalReordering0_btn-src-first-c200-cluster_119-3 0.00840371
LexicalReordering0_btn-src-first-c200-cluster_12-2 0.000442284
LexicalReordering0_btn-src-first-c200-cluster_12-3 0.00182486
LexicalReordering0_btn-src-first-c200-cluster_120-2 5.34991e-06
LexicalReordering0_btn-src-first-c200-cluster_120-3 0.0143345
...
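For what it's worth, loading such a file is trivial; a minimal sketch, assuming one "feature-name weight" pair per line as in the excerpt above (hypothetical helper, not an existing Moses script):

def load_sparse_weights(path):
    # Read one "feature-name weight" pair per line into a dict.
    weights = {}
    with open(path) as f:
        for line in f:
            name, value = line.split()
            weights[name] = float(value)
    return weights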
Is CSLM on GitHub? If you don't have a more recent version of the nbest
tool, and nobody else has anything equivalent, then I might take your
code base and just add the few bits that are missing in your tool. It
can be implemented quickly, I'm sure.
I don't want to add any new feature scores using the tool. I only want
to utilize it in order to calculate new overall scores given a weights
file with sparse features, and then to reorder the n-best list entries.
Not a big deal.
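Something along the lines of the following sketch would do (again purely illustrative, not the CSLM `nbest` tool or any existing Moses script; it assumes the two helpers sketched above and a single weights dict keyed by feature name, with the components of multi-valued dense features addressed as "Name_0", "Name_1", and so on):

from collections import defaultdict

def rerank(nbest_path, weights, out_path):
    # Recompute each hypothesis score as the dot product of its named feature
    # values with the weights, then reorder every n-best block by that score.
    blocks = defaultdict(list)                     # segment id -> (score, entry)
    with open(nbest_path) as f:
        for line in f:
            seg, _hyp, scores, _old_total = parse_nbest_line(line)
            total = 0.0
            for name, values in scores.items():
                if len(values) == 1:               # sparse or one-dimensional feature
                    total += weights.get(name, 0.0) * values[0]
                else:                              # multi-valued dense feature
                    for i, v in enumerate(values):
                        total += weights.get("%s_%d" % (name, i), 0.0) * v
            blocks[seg].append((total, line.rstrip("\n")))
    with open(out_path, "w") as out:
        for seg in sorted(blocks):
            # highest new model score first within each segment
            for total, entry in sorted(blocks[seg], key=lambda x: x[0], reverse=True):
                out.write(entry + "\n")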
Basically, I would think that there should be some functioning tool
readily available for such a seemingly common task. But I'm not aware of
any. Maybe people code a new Perl script for this task on-demand each
time they need it? Or maybe some individual piece of code in the Moses
tuning pipeline does this, and only this?
Cheers,
Matthias
On Fri, 2015-03-27 at 23:48 +0100, Holger Schwenk wrote:
> Hello Matthias,
>
> could you give us an idea what is missing in the CSLM reranker to make
> it work for sparse features?
>
> Right now, we do not parse the names of the feature functions; we store
> the numerical values only.
> In principle, this could be changed ...
>
> Then it depends on how you want to rescore the sparse features.
> The CSLM toolkit can rescore with a back-off LM and Moses on-disk
> phrase tables (and obviously neural networks).
>
> Why not add more functionality ...
>
> - Holger
>
> On 03/27/2015 11:42 PM, Matthias Huck wrote:
> > Hi,
> >
> > I'm looking for a tool to rerank n-best lists in Moses' current format,
> > including sparse features. The CSLM toolkit has quite a nice re-ranker
> > implementation, but apparently it doesn't know sparse features yet.
> >
> > If anyone already has an extended version of the existing re-ranker from
> > the CSLM toolkit, or alternatively any other code that does the same and
> > can also deal with sparse features, please let me know. I'd prefer to
> > not spend any time at all on implementing this myself, as I'll probably
> > need to run it only a few times for testing purposes.
> >
> > Cheers,
> > Matthias
> >
> >
> >> On 29 Apr 20:46 2013, Holger Schwenk wrote:
> >>
> >> Hello,
> >>
> >> you can do n-best list rescoring with the nbest tool which is part of
> >> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
> >> It is designed to rescore with back-off or continuous space LMs, but it
> >> shouldn't be difficult to add your own feature functions.
> >>
> >> don't hesitate to contact me if you need help.
> >>
> >> best,
> >>
> >> Holger
> >
> >
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 101, Issue 74
**********************************************