Moses-support Digest, Vol 84, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Syntax Annotation Error Using "parse-en-collins.perl"
(Rajen Chatterjee)
2. Re: lattices with EPSILON (Hieu Hoang)
3. DEADLINE EXTENDED: A salaried PhD position in MT at
Wolverhampton (UK) (Konstantinova, Natalia)


----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Oct 2013 11:52:11 +0530
From: Rajen Chatterjee <rajen.k.chatterjee@gmail.com>
Subject: Re: [Moses-support] Syntax Annotation Error Using
"parse-en-collins.perl"
To: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAC4-+Nw06aDq99t5xWhHfKiXQA8ddM1tku8qC3TPFdP5XKAkpg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,
Thanks, but I would like to know how to use "parse-en-collins.perl".
I have tried two ways:
1) ./parse-en-collins.perl ./test.txt (where test.txt
contains the line "I am going to school .")
2) ./parse-en-collins.perl "I am going to school ."

In both cases I get this output:

----------------------------------------------------------------------------------------------------------------------------------------------------
Read 11692 items from /home/rajen/Public/jmx/tagger.project/word.voc
Read 45 items from /home/rajen/Public/jmx/tagger.project/tag.voc
Read 42680 items from /home/rajen/Public/jmx/tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from
/home/rajen/Public/jmx/tagger.project/tagfeatures.fmap
Read model /home/rajen/Public/jmx/tagger.project/model :
numPredictions=45, numParams=117558
Read tagdict from /home/rajen/Public/jmx/tagger.project/tagdict
*This is MXPOST (Version 1.0)*
*Copyright (c) 1997 Adwait Ratnaparkhi*
----------------------------------------------------------------------------------------------------------------------------------------------------

The program halts after this. Why is it halting?
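
A possible explanation, not confirmed anywhere in this thread: the MXPOST
banner is the last thing printed because the tagger is waiting for input.
If the wrapper reads tokenized text from standard input rather than from a
file name or a quoted argument (an assumption about the script), both
invocations above would block in exactly this way. The Python sketch below
only illustrates feeding text to a command on stdin. STAND_IN is a
hypothetical placeholder command, not the real wrapper, and any options the
wrapper needs to locate MXPOST and the Collins parser are not shown.

```python
import subprocess
import sys

# Hypothetical stand-in for the real wrapper: it echoes stdin in upper case.
# Replace it with the actual command line once the wrapper's options are known.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_with_stdin(command, text):
    """Run `command`, send `text` on its standard input, return its stdout."""
    result = subprocess.run(
        command,
        input=text,           # delivered on stdin, then stdin is closed
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # work with str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    print(run_with_stdin(STAND_IN, "I am going to school .\n"), end="")
```

Because stdin is closed once the text has been sent, a program that reads
until end-of-file terminates instead of waiting indefinitely.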

On Sun, Oct 6, 2013 at 4:03 AM, Philipp Koehn <pkoehn@inf.ed.ac.uk> wrote:
> Hi,
>
> the wrapper script calls MXPOST and Collins parser internally, so it takes
> raw tokenized text, not already output from the parser.
>
> If you already have parsed data, then you will need to take the script
> apart and use only the conversion part.
>
> -phi
>
>
> On Thu, Oct 3, 2013 at 11:22 AM, Rajen Chatterjee
> <rajen.k.chatterjee@gmail.com> wrote:
>>
>> Hi,
>> I want to do syntax annotation, for which I have used the Collins
>> Parser. Here is the output of the Collins Parser:
>> PROB 756 -35.6989 0
>> TOP -35.6989 S -33.1933 NP -1.15926 NPB -0.869919 NNP 0 Pilgrimage
>> VP -28.0845 VBZ 0 is
>> PP -21.0829 IN 0 of
>> NP -12.3089 NPB -4.28776 JJ 0 utmost
>> NN 0 importance
>> PP -4.72756 IN 0 in
>> NP -1.15926 NPB -0.869919 NNP 0 Hinduism
>> (TOP~is~1~1 (S~is~2~2 (NPB~Pilgrimage~1~1 Pilgrimage/NNP ) (VP~is~2~1
>> is/VBZ (PP~of~2~1 of/IN (NP~importance~2~1 (NPB~importance~2~2
>> utmost/JJ importance/NN ) (PP~in~2~1 in/IN (NPB~Hinduism~1~1
>> Hinduism/NNP ./PUNC. ) ) ) ) ) ) )
>> TIME 0
>>
>>
>> Now I want to convert it to Moses format, for which I am using
>> parse-en-collins.perl, but I don't know how to use this script. It
>> is said that the script "parse-en-collins.perl" takes Collins Parser
>> output and converts it to Moses format, but if I pass the above output
>> it gives me a syntax error. Could I see the usage with a small example?
>>
>> Thanks
>>
>> --
>> -Regards,
>> Rajen Chatterjee.
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>



--
-Regards,
Rajen Chatterjee.
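
For the situation Philipp describes, where parsed data already exists and
only the conversion step is needed, the idea can be sketched in Python. This
is not the conversion code from parse-en-collins.perl. It assumes the
bracketed line format shown in the quoted parser output
("(LABEL~head~n~m ... word/TAG ... )") and emits nested <tree label="...">
markup. The function name collins_to_moses is hypothetical, and the escaping
is a simplification (only &, < and > are escaped).

```python
from xml.sax.saxutils import escape, quoteattr


def collins_to_moses(parse):
    """Convert one bracketed Collins parse line into nested <tree> markup."""
    pieces = []
    depth = 0
    for token in parse.split():
        if token == ")":
            # Closing bracket: end the most recently opened constituent.
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in parse")
            pieces.append("</tree>")
        elif token.startswith("("):
            # Opening bracket such as "(NPB~importance~2~2": keep the label only.
            label = token[1:].split("~", 1)[0]
            pieces.append(f"<tree label={quoteattr(label)}>")
            depth += 1
        else:
            # Leaf such as "importance/NN": split on the last slash.
            word, sep, tag = token.rpartition("/")
            if not sep:
                raise ValueError(f"leaf without a tag: {token!r}")
            pieces.append(f"<tree label={quoteattr(tag)}> {escape(word)} </tree>")
    if depth != 0:
        raise ValueError("unbalanced '(' in parse")
    return " ".join(pieces)


if __name__ == "__main__":
    line = ("(TOP~is~1~1 (S~is~2~2 (NPB~Pilgrimage~1~1 Pilgrimage/NNP ) "
            "(VP~is~2~1 is/VBZ ) ) )")
    print(collins_to_moses(line))
```

Only the bracketed line is handled; the PROB, TIME and score lines of the
parser output would have to be filtered out beforehand.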


------------------------------

Message: 2
Date: Mon, 7 Oct 2013 11:54:51 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] lattices with EPSILON
To: Ondrej Bojar <bojar@ufal.mff.cuni.cz>, Liane Kirsten GUILLOU
<L.K.Guillou@sms.ed.ac.uk>, luliang07@gmail.com, Alexandra Birch
<lexi.birch@gmail.com>
Cc: moses-support <moses-support@mit.edu>, cdec-users@googlegroups.com
Message-ID:
<CAEKMkbhCGArVOK208-shOipBpZqiCgNiBhTt1OUZfZz5jV-ceQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

@ondrej - Yes, Yulia's lattices look like confusion networks in disguise,
so there will be a large number of paths through the lattice.

The memory explosion is due to my code creating an object for every path.
It was mainly for the reason mentioned previously, i.e.:
I want to give each feature function the opportunity to score with full
knowledge of the path.

However, the old binary phrase-table doesn't require these objects to do
the lookups. Therefore, to enable Yulia and anyone else to decode large
lattices, my code will not run when
1. decoding lattices/confusion networks, AND
2. using the old binary phrase table.

@Liang - thanks for the suggestions. I'm not sure how our lattices were
created. Lexi knows.

Thanks to all who responded; it was very useful.
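
To make the path explosion concrete, here is a small Python sketch that
counts the paths through a lattice without enumerating them. It is
illustrative only and unrelated to the Moses code. The edge representation
(source node, target node, word), the "*EPS*" marker and both function names
are assumptions made for this example, and nodes are assumed to be integers
numbered in topological order.

```python
from collections import defaultdict


def confusion_network(columns):
    """Build lattice edges for a confusion network: column i spans node i -> i+1."""
    return [(i, i + 1, word)
            for i, words in enumerate(columns)
            for word in words]


def count_paths(edges, start, end):
    """Count distinct start-to-end paths in a lattice with src < dst on every edge."""
    successors = defaultdict(list)
    for src, dst, _word in edges:
        if src >= dst:
            raise ValueError("nodes must be numbered in topological order")
        successors[src].append(dst)
    paths = defaultdict(int)
    paths[start] = 1
    # Visit nodes in topological order, pushing path counts along each edge.
    for node in range(start, end):
        for dst in successors[node]:
            paths[dst] += paths[node]
    return paths[end]


if __name__ == "__main__":
    # Three columns with 2, 3 and 2 alternatives; "*EPS*" marks an empty word.
    columns = [["a", "b"], ["c", "d", "e"], ["f", "*EPS*"]]
    print(count_paths(confusion_network(columns), 0, len(columns)))
    # Twenty columns of five alternatives each give 5**20 paths.
    wide = [[str(k) for k in range(5)] for _ in range(20)]
    print(count_paths(confusion_network(wide), 0, len(wide)))
```

The count is the product of the column sizes, so creating one object per
path grows exponentially with sentence length, while the counting itself
stays linear in the number of edges.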



On 4 October 2013 22:20, Ondrej Bojar <bojar@ufal.mff.cuni.cz> wrote:

> Hi,
>
> while you can always run rmepsilon from openfst or another toolkit, epsilon
> edges will probably be particularly useful if one uses different
> semirings for different components of the score vector. With generic
> toolkits, all the components of the score vector are processed in a single
> manner. Depending on whether Moses features do the "plus" of their
> respective scores on their own, each feature can use its own semiring.
>
> Probably the (in some sense) maximal explosion in the number of paths is
> achieved when the lattice has the form of a confusion network (no
> epsilons). You get the full cartesian product of choices of the first
> token, the second token etc.
>
> Cheers, Ondrej.
>
> "Hieu Hoang" <Hieu.Hoang@ed.ac.uk> wrote:
>
> >@nicola - i didn't see a reason either but some lattices from a speech
> >recognizer contain them so i was just curious. I think chris has a point -
> >they may be easier to create.
> >
> >I think they may also be more efficient to decode. In a non-deterministic
> >lattice, you might have 2 edges with the same symbol coming out of 1
> >node. Each would have to be decoded separately.
> >
> >However, it's a pain to decode epsilons and there might be weird edge cases,
> >e.g. consecutive, beginning and end epsilons, or entirely epsilons.
> >
> >@chris - cheers for the explanation. i might use victor's code and see how
> >it goes.
> >
> >Do you have an example (large) lattice that blows up memory that you can
> >share?
> >
> >Yes - i've changed the code to extract all possible paths. In fact, i
> >extract all paths from beginning to end of sentence, without limit. 2
> >reasons for this:
> > 1. I also divorced the path creation from the phrase-table
> >lookup. In the general case there are multiple phrase-tables so it's
> >difficult to keep track of the tries. Also, the intertwining of the binary
> >pt lookup with lattices made it difficult to read.
> > 2. I want to give each feature function the opportunity to score with
> >full knowledge of the path.
> >
> >This may have to be altered if the memory explosion is too drastic
> >
> >
> >
> >
> >On 4 October 2013 17:49, Chris Dyer <cdyer@cs.cmu.edu> wrote:
> >
> >> It's useful to have epsilons since it simplifies the creation of
> >> lattices in some cases. Yes, you can convert them to a deterministic
> >> equivalent, but that involves implementing FSA determinization (or
> >> using a tool like https://pypi.python.org/pypi/pyfst), which may not
> >> be convenient.
> >>
> >> Btw, I've also noticed that memory usage with lattices/CNs explodes
> >> with non-binarized phrase tables (maybe also with binarized PTs?).
> >> This is independent of the size of the phrase table and only seems to
> >> be a function of the lattice structure. I'm not sure what's going on
> >> (the code has changed substantially since I last looked at it). But,
> >> you should always match paths in the lattice with paths in the phrase
> >> table trie- maybe moses is now trying to extract all possible paths in
> >> the lattice up to max-phrase-size or something?
> >>
> >> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <bertoldi@fbk.eu> wrote:
> >> > I don't see any reason why a lattice should contain an EPSILON edge.
> >> >
> >> > In a confusion network, EPSILONs are needed to allow the translation of
> >> > input of different lengths.
> >> > The sausage structure of the CN imposes the same number of source words,
> >> > and the EPSILONs overcome this constraint.
> >> >
> >> > This is not the case for a lattice, because you can have any number of
> >> > edges/words in a complete source path.
> >> >
> >> >
> >> > cheers,
> >> > Nicola
> >> >
> >> >
> >> >
> >> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
> >> >
> >> > I'm just looking at the lattice decoding, as implemented in moses.
> >> >
> >> > for confusion networks, it's fair to have EPSILON words (that represent
> >> > blank words). However, I don't see the point of them in lattices.
> >> >
> >> > Anyone have an opinion? How is it implemented in cdec & joshua?
> >> >
> >> > --
> >> > Hieu Hoang
> >> > Research Associate
> >> > University of Edinburgh
> >> > http://www.hoang.co.uk/hieu
> >> >
> >> > _______________________________________________
> >> > Moses-support mailing list
> >> > Moses-support@mit.edu
> >> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >> >
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "cdec users" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an
> >> email to cdec-users+unsubscribe@googlegroups.com.
> >> For more options, visit https://groups.google.com/groups/opt_out.
> >>
> >
> >
> >
> >--
> >Hieu Hoang
> >Research Associate
> >University of Edinburgh
> >http://www.hoang.co.uk/hieu
> >_______________________________________________
> >Moses-support mailing list
> >Moses-support@mit.edu
> >http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Ondrej Bojar
> http://www.cuni.cz/~obo
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Mon, 7 Oct 2013 11:41:46 +0000
From: "Konstantinova, Natalia" <N.Konstantinova@wlv.ac.uk>
Subject: [Moses-support] DEADLINE EXTENDED: A salaried PhD position in
MT at Wolverhampton (UK)
To: "elsnet-list@elsnet.org" <elsnet-list@elsnet.org>,
"corpora@uib.no" <corpora@uib.no>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>,
"mt-list@eamt.org" <mt-list@eamt.org>
Message-ID:
<C3FAEA4979A99C4B83BF71CDCF4E2E006E8673FC@EXCHMBX10I04.unv.wlv.ac.uk>
Content-Type: text/plain; charset="us-ascii"

----------------------------------------------------------------------------

Apologies for cross-posting.

Please circulate to any potentially interested parties
----------------------------------------------------------------------------

Applications are invited for an Early Stage Researcher pre-doctoral position in hybrid language translation technologies.

The deadline has been extended to 22 October.


DESCRIPTION
-----------------------

The position is fixed term until September 2016 and is a part of the new EU Framework 7 Marie-Curie Network EXPERT, concerned with the exploitation of empirical approaches to machine translation, including statistical machine translation and example-based machine translation.

The EXPERT project brings together researchers from six European universities (University of Wolverhampton (UoW), University of Sheffield (USFD), Universidad de Malaga (UMA), Universitaet des Saarlandes (USAAR), Dublin City University (DCU) and Universiteit van Amsterdam (UvA)) and five translation services and technology providers (Pangeanic, Hermes and Translated, Celer Soluciones and WordFast).

The project allocated to the University of Wolverhampton that we are currently recruiting to is:


- Investigation of methodologies to evaluate the improved SMT, EBMT and TM prototypes and new hybrid computer-aided translation technology proposed in EXPERT (ESR12).

The research will be conducted under supervision of University of Wolverhampton with opportunities for collaborative work with other universities and industrial partners in EXPERT, and with the expectation of a 6 month secondment to 2-3 partners in this project based on the topic of research.


REQUIREMENTS & ELIGIBILITY
-----------------------------------------------

Applicants should hold a good honours degree in a relevant field of study (e.g. computer science, engineering, mathematics) and have experience in natural language processing, machine translation or related area. They should also have a solid background in mathematics/statistics and excellent programming skills (C/C++, Java, Python/Perl, etc.). See the job specification for more details on the expected profile: https://jobs.wlv.ac.uk/wd/plsql/wd_portal.list?p_web_site_id=3045&p_function=map&p_title=Current%20vacancies

Eligibility:

Appointment will be subject to the eligibility requirements of the Marie Curie programme, which specify that early stage researchers must be, at the time of recruitment, in the first four years of their research careers, measured from the date when they obtained the qualification which would entitle them to embark on a doctorate. As per Marie Curie Terms and Conditions, researchers must not have resided in the country of application for more than 12 months in the 3 years immediately prior to the appointment.

(Please check Section III.3 of the following link for further information on the Marie Curie scheme regarding eligibility criteria: ftp://ftp.cordis.europa.eu/pub/fp7/docs/fp7-mga-annex3intramulti_en.pdf )

Female candidates meeting the requirements for these posts are particularly encouraged to apply.


CONDITIONS
--------------------

This post is fixed-term until September 2016.

Terms and conditions of employment: Will be those for Marie Curie Early Stage Researchers.

Salary: Marie Curie rates will apply. For employment in the UK, the reference figure for gross salary is 51,062 euros/year.


HOW TO APPLY
-------------------------

Use the application system of the University of Wolverhampton (https://jobs.wlv.ac.uk/wd/plsql/wd_portal.list?p_web_site_id=3045&p_function=map&p_title=Current%20vacancies). Please make sure to include:

* Detailed CV;

* 2-page summary of the research to be undertaken on the chosen topic. It should include, but is not limited to: a) aim of the research; b) general research question to be refined during the PhD; c) research plan; d) why you have the necessary skills to undertake this project.



(!) Also make sure to send a copy of your application to riilp@wlv.ac.uk with subject "EXPERT job application"
Closing date: 22 October 2013

More information can be found at http://expert-itn.eu

Best regards,

Dr. Natalia Konstantinova
EXPERT Network Training Coordinator
Editorial Assistant for the Journal of Natural Language Engineering
Research Group in Computational Linguistics
Research Institute of Information and Language Processing
University of Wolverhampton
Stafford Street
WOLVERHAMPTON WV1 1LY
Email: n.konstantinova@wlv.ac.uk
Tel: + 44 1902 322967
Fax: 01902 323 543



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 84, Issue 11
*********************************************
