Moses-support Digest, Vol 87, Issue 55

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: word alignment-words' indexes and sentences' length
(amir haghighi)
2. EMS MML IndexError: list index out of range (jian zhang)
3. Re: filter parallel corpus (Philipp Koehn)
4. Re: EMS MML IndexError: list index out of range (Barry Haddow)
5. HyTra deadline extension to Feb. 7 (EACL Workshop on Hybrid
Approaches to Translation) (Reinhard Rapp)


----------------------------------------------------------------------

Message: 1
Date: Fri, 24 Jan 2014 18:15:43 +0330
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: moses-support@mit.edu
Message-ID:
<CA+UVbEjb2m-xj3o3UG7M6PY8yte8Z5AUBbka5KPdLs-DOysaWQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Thank you Barry for your help.

Hi Amin,
I can't see the link. Could you please attach it to your email?

Regards
Amir

------------------------------

Message: 2
Date: Fri, 24 Jan 2014 14:53:21 +0000
From: jian zhang <jianzhang09@gmail.com>
Subject: [Moses-support] EMS MML IndexError: list index out of range
To: moses-support@mit.edu
Message-ID:
<CALA=z0CphYSno9bGTVVmoZdfVkHvaMYe1kwzioiAv4fpUDH=1Q@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I got "IndexError: list index out of range" at
the TRAINING_mml-filter-before-wa step.

I have read the post at
https://www.mail-archive.com/moses-support@mit.edu/msg08767.html, but I
still cannot figure out what is wrong.

The full output is:

general:strategy = Score
general:source_language = fr
general:target_language = en
general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
general:domain_file = /home/mml/mml-test/experiment/model/domains.1
general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
score:proportion = 0.9

2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
Traceback (most recent call last):
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
    main()
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
    strategy = strategy_class(config)
  File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
    [float(line[:-1]) for line in open(self.score_file)],
    reverse=True)[ignore_count + count]
IndexError: list index out of range
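
The failing expression in the traceback indexes a descending-sorted score list at position ignore_count + count. The following is a minimal sketch of how that lookup could run past the end of the list, assuming (this is not confirmed in the thread) that the score file holds no more lines than the 2075137 being ignored. The name pick_threshold and the clamping step are hypothetical additions for illustration; they are not part of mml-filter.py.

```python
def pick_threshold(scores, ignore_count, count):
    """Return the score found at position ignore_count + count, descending.

    Mirrors the indexing expression visible in the traceback. The bounds
    check is an illustrative addition, not the behaviour of mml-filter.py.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked:
        raise ValueError("no scores to rank")
    index = ignore_count + count
    if index >= len(ranked):
        # The unguarded lookup ranked[index] would raise IndexError here.
        index = len(ranked) - 1
    return ranked[index]
```

With three scores and ignore_count equal to 3, the unguarded lookup would fail in the same way as in the traceback, while this sketch falls back to the lowest score.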

And my ems configuration file has:

#################################################################
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

#in-domain parallel corpus
[CORPUS:in]
clean-stem = $training-in-domain-corpus

[CORPUS:out]
#out-domain parallel corpus
clean-stem = $training-out-domain-corpus


#################################################################
# LANGUAGE MODEL TRAINING
[LM]
[LM:lm]
type = 8
lm = $language-model
#################################################################
# MODIFIED MOORE LEWIS FILTERING

[MML]

lm-training = $srilm-dir/ngram-count
lm-settings = "-interpolate -kndiscount -unk"
lm-binarizer = $moses-src-dir/bin/build_binary
lm-query = $moses-src-dir/bin/query
order = 5

### in-/out-of-domain source/target corpora to train the 4 language model
#
# in-domain parallel corpus
indomain-stem = [CORPUS:in:clean-split-stem]

# out-of-domain parallel corpus
outdomain-stem = [CORPUS:out:clean-split-stem]

# settings: number of lines sampled from the corpora to train each language model on
settings = "--line-count 100000"

#################################################################
# TRANSLATION MODEL TRAINING
[TRAINING]
script = $moses-script-dir/training/train-model.perl
training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
parallel = yes
alignment-symmetrization-method = grow-diag-final-and
lexicalized-reordering = msd-bidirectional-fe
score-settings = "--GoodTuring"
include-word-alignment-in-rules = yes

#space separated all out-of domain corpora to be filtered
mml-filter-corpora = out
mml-before-wa = "-proportion 0.9"

#####################################################

Thanks.


Jian Zhang

------------------------------

Message: 3
Date: Fri, 24 Jan 2014 09:56:45 -0500
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] filter parallel corpus
To: Saeed Farzi <saeedfarzi@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDCAmBpOdn9YAeLgZ8-tdHutMOUyLPyG9BXHcuqZxs5FDg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

Moses has an implementation of modified Moore-Lewis filtering
which may be roughly what you want:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc66

-phi
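
For orientation, the idea behind Moore-Lewis filtering can be sketched as a cross-entropy difference: a sentence is scored by its cross-entropy under an in-domain language model minus its cross-entropy under an out-of-domain model, and the modified variant adds up that difference for the source and target sides. The sketch below uses toy add-one-smoothed unigram models in place of real n-gram LMs. All function names are hypothetical, and this is not the Moses implementation, whose score sign convention may differ.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one-smoothed unigram model; returns a token -> probability function."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)


def cross_entropy(prob, sentence):
    """Per-token cross-entropy in bits of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return -sum(math.log2(prob(tok)) for tok in tokens) / len(tokens)


def mml_score(src, tgt, in_src, out_src, in_tgt, out_tgt):
    """Sum of in-domain minus out-of-domain cross-entropy over both sides.

    In this sketch, lower scores mean the sentence pair looks more in-domain.
    """
    src_diff = cross_entropy(in_src, src) - cross_entropy(out_src, src)
    tgt_diff = cross_entropy(in_tgt, tgt) - cross_entropy(out_tgt, tgt)
    return src_diff + tgt_diff
```

A filter would then rank all out-of-domain sentence pairs by this score and keep only the best-scoring portion.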

On Thu, Jan 16, 2014 at 10:43 AM, Saeed Farzi <saeedfarzi@gmail.com> wrote:
> Dear all,
>
> I am working on a translation task with a very large parallel corpus.
> Because of the computational cost of training on such a corpus, I am
> going to filter it with respect to the test set (of course, after
> filtering, the evaluation must still be fair).
>
> I am looking for a solution or a tool for filtering parallel corpus sentences.
>
> Note that I do not need to filter the phrase table. I know that the
> filter_ moses tool reduces the phrase table size.
>
> cheers
> --
> S.Farzi, Ph.D. Student
> Natural Language Processing Lab,
> School of Electrical and Computer Eng.,
> Tehran University
> Tel: +9821-6111-9719
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 4
Date: Fri, 24 Jan 2014 15:51:57 +0000
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] EMS MML IndexError: list index out of
range
To: jian zhang <jianzhang09@gmail.com>, moses-support@mit.edu
Message-ID: <52E28C1D.6060609@staffmail.ed.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Jian

This is a bit suspect:

2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137

Are the scores in this file sensible, or are they all the same?

/home/mml/mml-test/experiment/training/corpus-mml-score.1

cheers - Barry
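
One way to check the score file along these lines is a short summary of its contents. This is a sketch only, assuming one floating-point score per line (which matches the parsing visible in the traceback); summarize_scores is a hypothetical helper, not part of Moses.

```python
import math


def summarize_scores(lines):
    """Summarise score lines: how many, how many distinct, and their range."""
    values = [float(line.strip()) for line in lines if line.strip()]
    finite = [v for v in values if math.isfinite(v)]
    return {
        "total": len(values),                      # parsed score lines
        "distinct": len(set(finite)),              # 1 means every score is identical
        "non_finite": len(values) - len(finite),   # nan or inf entries
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
    }
```

Calling summarize_scores(open(path)) on the score file would show at a glance whether the scores are all identical, whether any are nan or inf, and whether the line count is at least as large as the index being looked up.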



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 5
Date: Fri, 24 Jan 2014 17:04:03 +0100
From: "Reinhard Rapp" <reinhardrapp@gmx.de>
Subject: [Moses-support] HyTra deadline extension to Feb. 7 (EACL
Workshop on Hybrid Approaches to Translation)
To: <IRList@lists.shef.ac.uk>, <listmaster@loria.fr>, <ln@cines.fr>,
<lr_egroup@mail.iiit.ac.in>, <moses-support@mit.edu>,
<news@multilingual.com>
Message-ID: <7B2294067970402A8D4D5E21A0C83802@ASUSPC>
Content-Type: text/plain; charset="windows-1252"


*** DEADLINE EXTENSION TO FEBRUARY 7, 2014 ***

=================================================================

THIRD WORKSHOP ON HYBRID APPROACHES TO TRANSLATION (HyTra 2014)

Co-located with EACL 2014 http://eacl2014.org/

Gothenburg, Sweden

April 27, 2014

http://www.upf.edu/glicom/hytra2014.html


INVITED SPEAKERS

Hans Uszkoreit (Saarland University and DFKI, Germany)
Joakim Nivre (Uppsala University, Sweden)

=================================================================


WORKSHOP DESCRIPTION

The aim of the HyTra workshop series is to bring together researchers developing and applying statistical, example-based, or rule-based translation systems, and those enhancing MT systems by combining elements from different approaches, to promote discussion and sharing of ideas among them. One relevant focus is on effectively combining linguistic and data-driven approaches (rule-based and statistical MT). Another focus is on hybridization in the context of human translation.

The 3rd Workshop on Hybrid Approaches to Translation (HyTra-3) intends to continue developing and empowering the research agenda in the area of Hybrid Translation already started at its first and second editions. The previous two editions (see http://www-lium.univ-lemans.fr/esirmt-hytra/ and http://hytra.barcelonamedia.org/hytra2013/) were co-located with EACL 2012 in Avignon and with ACL 2013 in Sofia, and the proceedings were published on the ACL Anthology.


TOPICS

We solicit contributions including but not limited to the following topics:

- ways and techniques of hybridization
- architectures for the rapid development of hybrid MT systems
- applications of hybrid systems
- hybrid systems dealing with under-resourced languages
- hybrid systems dealing with morphologically rich languages
- using linguistic information (morphology, syntax, semantics) to enhance statistical MT
(e.g. with hierarchical or factored models)
- using contextual information to enhance statistical MT
- bootstrapping rule-based systems from corpora
- hybrid methods in spoken language translation
- extraction of dictionaries and other large-scale resources for MT from parallel and comparable corpora
- induction of morphological, grammatical, and translation rules from corpora
- machine learning techniques for hybrid MT
- describing structural mappings between languages (e.g. tree-structures using
synchronous/transduction grammars)
- heuristics for limiting the search space in hybrid MT
- alternative methods for the fair evaluation of the output of different types of MT systems
(e.g. relying on linguistic criteria)
- system combination approaches such as multi-engine MT (parallel) or automatic post-editing (sequential)
- open source tools and free language resources for hybrid MT


SUBMISSIONS

Contributions can be short or long papers. Short paper submissions must describe original and unpublished work without exceeding five pages of content plus one extra page for references. Characteristics of short papers include: a small, focused contribution; work in progress; a negative result; an opinion piece; an interesting application nugget. Long paper submissions must describe substantial, original, completed and unpublished work without exceeding eight pages of content plus two extra pages for references. Submissions will be judged according to the criteria of the main conference (EACL 2014).

Authors are invited to submit papers on original and previously unpublished work. Formatting should be according to EACL 2014 specifications using LaTeX or MS-Word style files, see section "submission format" at http://eacl2014.org/call-for-papers. Reviewing of papers will be double-blind, so the submissions should not reveal the authors' identity.

Submission is electronic in PDF format using the START submission system at the following URL: https://www.softconf.com/eacl2014/HyTra/

Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop contact person (see below) must be notified immediately. If accepted, withdrawals are only possible within two days after notification.

For an accepted paper to appear in the proceedings, at least one author must register for the workshop and actually present the paper. The papers will be published in the workshop proceedings which will be made available via the ACL Anthology.


BEST PAPERS

Authors of selected papers will be invited to contribute extended versions of their papers as book chapters for an edited volume on hybrid MT.


IMPORTANT DATES

February 7, 2014: Extended deadline for paper submission
February 20, 2014: Notification of acceptance
March 3, 2014: Camera ready papers due
April 27, 2014: Workshop in Gothenburg


ORGANIZERS

Rafael E. Banchs (Institute for Infocomm Research, Singapore)
Marta R. Costa-jussa (Institute for Infocomm Research, Singapore)
Reinhard Rapp (Universities of Aix-Marseille and Mainz)
Patrik Lambert (Pompeu Fabra University, Barcelona)
Kurt Eberle (Lingenio GmbH, Heidelberg)
Bogdan Babych (University of Leeds)


CONTACT PERSON

Rafael E. Banchs: rembanchs (at) i2r (dot) a-star (dot) edu (dot) sg


PROGRAMME COMMITTEE

* Ahmet Aker, University of Sheffield, UK
* Bogdan Babych, University of Leeds, UK
* Rafael E. Banchs, Institute for Infocomm Research, Singapore
* Alexey Baytin, Yandex, Moscow, Russia
* Núria Bel, Universitat Pompeu Fabra, Barcelona, Spain
* Pierrette Bouillon, ISSCO/TIM/ETI, University of Geneva, Switzerland
* Michael Carl, Copenhagen Business School, Denmark
* Marta R. Costa-jussa, Institute for Infocomm Research, Singapore
* Oliver Culo, University of Mainz, Germany
* Kurt Eberle, Lingenio GmbH, Heidelberg, Germany
* Andreas Eisele, DGT (European Commission), Luxembourg
* Marcello Federico, Fondazione Bruno Kessler, Trento, Italy
* Christian Federmann, Language Technology Lab, DFKI, Saarbrücken, Germany
* José A. R. Fonollosa, Universitat Politècnica de Catalunya, Barcelona, Spain
* Maxim Khalilov, TAUS, Amsterdam, The Netherlands
* Patrik Lambert, Pompeu Fabra University, Barcelona, Spain
* Udo Kruschwitz, University of Essex, UK
* Yanjun Ma, Baidu Inc., Beijing, China
* José B. Mariño, Universitat Politècnica de Catalunya, Barcelona, Spain
* Bart Mellebeek, University of Amsterdam, The Netherlands
* Hermann Ney, RWTH Aachen, Germany
* Reinhard Rapp, Universities of Aix-Marseille, France, and Mainz, Germany
* Anders Søgaard, University of Copenhagen, Denmark
* Nasredine Semmar, CEA LIST, Fontenay-aux-Roses, France
* Wade Shen, Massachusetts Institute of Technology, Cambridge, USA
* Serge Sharoff, University of Leeds, UK
* George Tambouratzis, Institute for Language and Speech Processing, Athens, Greece
* Jörg Tiedemann, University of Uppsala, Sweden
* Dekai Wu, The Hong Kong University of Science and Technology, Hong Kong, China


------------------------------


End of Moses-support Digest, Vol 87, Issue 55
*********************************************