Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Lexical reordering fails with zlib (Varden Wang)
2. Doubts on Multiple Decoding Paths (Anoop)
3. z-mert (Sarah Schulz)
4. Re: z-mert (Marcin Junczys-Dowmunt)
5. 1st CfP: 2nd Workshop on Natural Language Processing for
Translation Memories (NLP4TM 2016) at LREC 2016 (Carla Parra)
----------------------------------------------------------------------
Message: 1
Date: Thu, 17 Dec 2015 22:11:49 -0800
From: Varden Wang <vardenw@uw.edu>
Subject: Re: [Moses-support] Lexical reordering fails with zlib
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID:
<CANf2V=vMxdy6Wck73gH_8GuMH-Y9kDeYv2SAL+c01VZaB5B6og@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Thanks! That's a good workaround!
On Thu, Dec 17, 2015 at 2:56 PM, Matthias Huck <mhuck@inf.ed.ac.uk> wrote:
> Hi,
>
> As an addendum:
>
> You can try a manual workaround. Run gunzip on extract.o.sorted.gz and
> do lexical-reordering-score on the resulting plain text file.
>
> It might be inconvenient but would hopefully solve the issue.
>
> Cheers,
> Matthias
>
>
> On Thu, 2015-12-17 at 17:44 +0000, Matthias Huck wrote:
>> Hi,
>>
>> It's a problem that apparently occurs very rarely and, as Guy mentioned,
>> we had so far assumed it was caused by a zlib bug.
>>
>> However, the zlib bug was (to my knowledge) fixed in zlib v1.2.8.
>> This seems to be the bug fix:
>> https://github.com/madler/zlib/commit/51370f365607fe14a6a7a1a27b3bd29d788f5e5b
>>
>> I've only encountered the issue once (and I train systems frequently).
>> When I came across it, I executed the same command with a Moses build on
>> a different machine that was running an older version of openSUSE rather
>> than Ubuntu 12.04. The problem did not occur on the old system.
>>
>> My guess is that it really is a zlib bug, but it would be worrying if
>> switching to zlib v1.2.8 doesn't resolve it.
>>
>> Cheers,
>> Matthias
>>
>>
>> On Thu, 2015-12-17 at 09:18 -0800, Varden Wang wrote:
>> > I seem to be having a very similar issue. I have the exact same zlib
>> > package as Guy (but I upgraded from zlib 1.2.3.4). I'm using commit
>> > e211d752f6bc680094520482f190d0f805405c6c of mosesdecoder. The funny
>> > thing is that I trained with the very same setup on different data
>> > sets without encountering this problem.
>> >
>> > My error:
>> >
>> > Executing: /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
>> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
>> > 0.5 /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
>> > --model "wbe msd wbe-msd-bidirectional-fe"
>> >
>> > Lexical Reordering Scorer
>> >
>> > scores lexical reordering models of several types (hierarchical,
>> > phrase-based and word-based-extraction
>> >
>> > terminate called after throwing an instance of 'util::GZException'
>> >
>> > what(): zlib encountered invalid distances set code -3
>> >
>> > ERROR: Execution of:
>> > /usr/local/google/home/varden/MOSES/mosesdecoder/scripts/../bin/lexical-reordering-score
>> > /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/extract.o.sorted.gz
>> > 0.5 /usr/local/google/home/varden/MOSES/FRENCH_moses_HELP_CONTENT_v1/train/model/reordering-table.
>> > --model "wbe msd wbe-msd-bidirectional-fe"
>> >
>> > died with signal 6, with coredump
>> >
>> > Thanks,
>> >
>> > Varden
>> >
>> > On Mon, Dec 7, 2015 at 9:01 AM, <moses-support-request@mit.edu> wrote:
>> > > Message: 1
>> > > Date: Mon, 7 Dec 2015 06:14:47 +0000 (UTC)
>> > > From: Guy <guyarg@yahoo.com>
>> > > Subject: [Moses-support] Lexical reordering fails with zlib
>> > > To: moses-support@mit.edu
>> > > Message-ID: <loom.20151207T071423-100@post.gmane.org>
>> > > Content-Type: text/plain; charset=us-ascii
>> > >
>> > > Hello everyone,
>> > >
>> > > I've just recently started to work with Moses and managed to build a couple
>> > > of models without problems... until now.
>> > >
>> > > I was training a new system and I got this error when executing
>> > > lexical-reordering-score:
>> > >
>> > > ../mosesdecoder/scripts/../bin/lexical-reordering-score
>> > > /local/scratch/train/model/extract.o.sorted.gz 0.5
>> > > /local/scratch/train/model/reordering-table. --model "wbe msd
>> > > wbe-msd-bidirectional-fe"
>> > > Lexical Reordering Scorer
>> > > scores lexical reordering models of several types (hierarchical,
>> > > phrase-based and word-based-extraction
>> > > terminate called after throwing an instance of 'util::GZException'
>> > > what(): util/read_compressed.cc:163 in virtual std::size_t
>> > > util::{anonymous}::GZip::Read(void*, std::size_t, util::ReadCompressed&)
>> > > threw GZException'.
>> > > zlib encountered invalid distances set code -3
>> > > Aborted
>> > >
>> > > I found an old post
>> > > (http://permalink.gmane.org/gmane.comp.nlp.moses.user/10151) saying this was
>> > > due to an apparent bug in zlib 1.2.3.4 on Ubuntu 12.04 and that upgrading to
>> > > zlib 1.2.8 solves the problem. However, I already have zlib 1.2.8 (but on
>> > > Ubuntu 14.04) and I still get this error. In case it helps, the package
>> > > name is zlib1g:amd64 (version 1:1.2.8.dfsg-1ubuntu).
>> > >
>> > > It's a bit strange that I didn't stumble upon this problem when I trained
>> > > previous systems.
>> > >
>> > > Any ideas on what to do?
>> > >
>> > > Thank you very much,
>> > > Guy
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
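The manual workaround Matthias describes can be sketched as a short shell
sequence (the paths here are illustrative; substitute your own train/model
directory):

```shell
# Decompress extract.o.sorted.gz up front, so lexical-reordering-score reads
# plain text instead of going through zlib's inflate path.
# gunzip -c writes to stdout and leaves the original .gz file untouched.
gunzip -c train/model/extract.o.sorted.gz > train/model/extract.o.sorted

# Re-run the scorer with the same arguments, pointing at the plain-text file.
bin/lexical-reordering-score train/model/extract.o.sorted 0.5 \
    train/model/reordering-table. --model "wbe msd wbe-msd-bidirectional-fe"
```

This keeps the compressed file around in case you want to retry after a zlib
upgrade; only the scorer's input changes.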
------------------------------
Message: 2
Date: Fri, 18 Dec 2015 12:05:48 +0530
From: Anoop <anoop.kunchukuttan@gmail.com>
Subject: [Moses-support] Doubts on Multiple Decoding Paths
To: moses-support <moses-support@mit.edu>
Message-ID:
<CADXxMYc1yPeaZbB9dD=8JqhmezMQR7g9mBviSpMg7yz5-ZQphg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
I am trying to understand the multiple decoding paths feature in Moses.
The documentation (http://www.statmt.org/moses/?n=Advanced.Models#ntoc7)
describes three methods: both, either and union.
The following is my understanding of the options. Please let me know if it
is correct:
   - With the *both* option, the constituent phrases of the target hypothesis
   come from both tables (since they are shared) and are scored with both
   tables.
   - With the *either* option, all the constituent phrases of a target
   hypothesis come from a single table, but different hypotheses can use
   different tables. Each hypothesis is scored using one table only. I did not
   understand the "additional options are collected from the other tables"
   bit in the documentation.
   - With the *union* option, the constituent phrases of a target hypothesis
   can come from different tables and are scored using scores from all the
   tables, with 0 used when an entry doesn't appear in some table, unless the
   *default-average-others=true* option is used.
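For context, the machinery behind these options is Moses's alternative
decoding paths: each phrase table is declared separately in moses.ini, and the
[mapping] section lists the decoding paths. A minimal sketch from memory, not
taken from this thread, where each line is `path-id T table-id` (double-check
against the documentation page linked above):

```ini
[mapping]
0 T 0
1 T 1
```

Here path 0 translates with table 0 and path 1 with table 1; the both/either/
union distinction then governs how entries from the tables are combined and
scored during decoding.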
Regards,
Anoop.
--
I claim to be a simple individual liable to err like any other fellow
mortal. I own, however, that I have humility enough to confess my errors
and to retrace my steps.
http://flightsofthought.blogspot.com
------------------------------
Message: 3
Date: Fri, 18 Dec 2015 11:09:02 +0100
From: Sarah Schulz <schulzsh@ims.uni-stuttgart.de>
Subject: [Moses-support] z-mert
To: moses-support@mit.edu
Message-ID: <5673DB3E.6050100@ims.uni-stuttgart.de>
Content-Type: text/plain; charset=utf-8
Hi,
I am using z-mert for the first time, since I had to implement my own
score for tuning.
But when I try to run it, I get the following error while parsing the
param.txt file:
Exception in thread "main" java.util.InputMismatchException
at java.util.Scanner.throwFor(Scanner.java:864)
at java.util.Scanner.next(Scanner.java:1485)
at java.util.Scanner.nextDouble(Scanner.java:2413)
at MertCore.processParamFile(MertCore.java:1537)
at MertCore.initialize(MertCore.java:310)
at MertCore.<init>(MertCore.java:239)
at ZMERT.main(ZMERT.java:44)
My param.txt looks like this:
lm_0 ||| 1.0 Opt 0.5 1.5 0.5 1.5
d_0 ||| 1.0 Opt 0.5 1.5 0.5 1.5
tm_0 ||| 0.3 Opt 0.25 0.75 0.25 0.75
tm_1 ||| 0.2 Opt 0.25 0.75 0.25 0.75
tm_2 ||| 0.2 Opt 0.25 0.75 0.25 0.75
tm_3 ||| 0.3 Opt 0.25 0.75 0.25 0.75
w_0 ||| 0.0 Opt -0.5 0.5 -0.5 0.5
normalization = none
I was wondering if a type cast to double is missing in the code, but
before changing the z-mert code I wanted to make sure I didn't get
anything else wrong.
Does anybody have experience with that?
Cheers,
Sarah
------------------------------
Message: 4
Date: Fri, 18 Dec 2015 11:12:17 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] z-mert
To: moses-support@mit.edu
Message-ID: <5673DC01.7040608@amu.edu.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Sarah,
try running the command with
LC_ALL=C java -jar ...
I think the problem is that Java assumes a German locale and expects
floating-point numbers with a comma instead of a dot. I spent some time
figuring that out myself while using ZMERT.
Best,
Marcin
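The LC_ALL=C prefix sets the locale for that single java process, so the
JVM's default Locale parses dot decimals such as "0.5". A sketch (the jar
name and argument are placeholders for your actual ZMERT command line):

```shell
# LC_ALL is set only for this invocation; the rest of the shell session
# keeps its own locale. Under the C locale, java.util.Scanner.nextDouble()
# accepts "0.5" rather than expecting the German-style "0,5".
LC_ALL=C java -jar zmert.jar params.txt
```

An alternative that avoids the environment entirely is passing
`-Duser.language=en -Duser.country=US` to java, which pins the JVM's default
Locale directly.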
------------------------------
Message: 5
Date: Fri, 18 Dec 2015 13:36:56 +0100
From: Carla Parra <carla.parra@hermestrans.com>
Subject: [Moses-support] 1st CfP: 2nd Workshop on Natural Language
Processing for Translation Memories (NLP4TM 2016) at LREC 2016
To: carla.parra@hermestrans.com
Message-ID: <dd3dc08ff2ee0f25189de75c0c54bd00@hermestrans.com>
Content-Type: text/plain; charset="utf-8"
(apologies for cross-posting)
2ND WORKSHOP ON NATURAL LANGUAGE PROCESSING FOR TRANSLATION MEMORIES
(NLP4TM 2016)
http://rgcl.wlv.ac.uk/nlp4tm2016/ [1]
to be held at LREC 2016 (Portorož, Slovenia), May 28, 2016
Submission deadline: February 10, 2016
1. CALL FOR PAPERS
Translation Memories (TM) are amongst the most used tools by
professional translators, if not the most used. The underlying idea of
TMs is that a translator should benefit as much as possible from
previous translations by being able to retrieve how a similar sentence
was translated before. Moreover, the usage of TMs aims at guaranteeing
that new translations follow the client's specified style and
terminology. Despite the fact that the core idea of these systems relies
on comparing segments (typically of sentence length) from the document
to be translated with segments from previous translations, most of the
existing TM systems hardly use any language processing for this. Instead
of addressing this issue, most of the work on translation memories has
focused on improving the user experience by supporting a variety of
document formats, intuitive user interfaces, etc.
The term "second generation translation memories" has been around for
more than ten years, promising translation memory software that
integrates linguistic processing in order to improve the translation
process. This linguistic processing can involve tasks such as matching
of subsentential chunks, edit distance operations between syntactic
trees, and the incorporation of semantic and discourse information in
the matching process. This workshop invites papers presenting second
generation translation memories and related initiatives.
Terminologies, glossaries and ontologies are also very useful for
translation memories, by facilitating the task of the translator and
ensuring a consistent translation. The field of Natural Language
Processing (NLP) has proposed numerous methods for terminology
extraction and ontology extraction. Researchers are encouraged to submit
papers to the workshop which show how these methods are being
successfully applied to Translation Memories. In addition, papers
discussing the integration of Machine Translation and Translation
Memories or studies about automatic building of translation memories
from corpora are also welcomed.
2. TOPICS OF INTEREST
This workshop invites original papers which show how language processing
can help translation memories. Topics of interest include but are not
limited to:
* Improving matching and retrieval of segments by using morphological,
syntactic, semantic and discourse information
* Automatic extraction of terminologies and ontologies for translation
memories
* Integration of named entity recognition and terminologies in matching
and retrieval
* Using natural language processing for automatic construction of
translation memories
* Extracting and aligning TM segments from a parallel or comparable corpus
* Construction of translation memories using the Internet
* Corpus-based studies about the usefulness of TMs for specific domains
* Development of hybrid TM and MT translation systems
* Study of NLP techniques used by TM tools available in the market
* Automatic methods for TM cleaning and maintenance
3. SHARED TASK
A shared task on cleaning translation memories will be organised. A
training set will be distributed to be used to develop and train the
participants' systems. The testing will be done on 500 segments
distributed during the testing phase.
* TASK: Automatically clean translation memories
* TRAINING SET: 1,500 TM segments annotated with information on whether
they are a valid translation of each other
* TEST SET: 500 TM segments
* LANGUAGE PAIRS: English-Italian, English-German, English-Spanish
* RELEASE OF THE TRAINING DATA: end of January 2016
Participants are encouraged to submit working notes of their systems to
be presented during the workshop. More details, including the shared
task schedule, will be announced soon in a dedicated Call for
Participation.
4. SUBMISSION INFORMATION
We invite contributions of either long papers (8 pages + 2 references),
which present unpublished original research, or short papers/demos of
systems, which present work in progress or working systems (4 pages + 2
references). The submissions do not need to be anonymised.
All the papers will have to be submitted in PDF format via the START
system by following this link: https://www.softconf.com/lrec2016/NLP4TM/
[2]
5. IDENTIFY, DESCRIBE AND SHARE YOUR LRs
As scientific work requires accurate citations of referenced work so as
to allow the community to understand the whole context and also
replicate the experiments conducted by other researchers, LREC 2016
endorses the need to uniquely identify LRs through the use of the
International Standard Language Resource Number (ISLRN, www.islrn.org
[3]), a Persistent Unique Identifier to be assigned to each Language
Resource. The assignment of ISLRNs to LRs cited in LREC papers will be
offered at submission time.
6. IMPORTANT DATES
* Submission deadline: 10th February 2016
* Acceptance notification: 7th March 2016
* Camera-ready versions: 31st March 2016
* Workshop date: 28th May 2016
7. ORGANISING COMMITTEE
For the workshop:
* Constantin Orasan, University of Wolverhampton, UK
* Carla Parra, Hermes, Spain
* Eduard Barbu, Translated, Italy
* Marcello Federico, FBK, Italy
For the shared task:
* Eduard Barbu, Translated, Italy
* Carla Parra, Hermes, Spain
* Luca Mastrostefano, Translated, Italy
* Matteo Negri, FBK, Italy
* Marco Turchi, FBK, Italy
* Constantin Orasan, University of Wolverhampton, UK
The organisers can be contacted by sending an email to
nlp4tm2016@gmail.com.
8. PROGRAM COMMITTEE
* Juanjo Arevalillo, Hermes, Spain
* Yves Champollion, WordFast, France
* Gloria Corpas, University of Malaga, Spain
* Maud Ehrmann, EPFL, Switzerland
* Kevin Flanagan, Swansea University, UK
* Corina Forascu, University "Al. I. Cuza", Romania
* Gabriela Gonzalez, eTrad, Argentina
* Rohit Gupta, University of Wolverhampton, UK
* Manuel Herranz, Pangeanic, Spain
* Samuel Läubli, Autodesk, Switzerland
* Liangyou Li, DCU, Ireland
* Qun Liu, DCU, Ireland
* Ruslan Mitkov, University of Wolverhampton, UK
* Aleksandros Poulis, Lionbridge, Sweden
* Gabor Proszeky, Morphologic, Hungary
* Uwe Reinke, Cologne University of Applied Sciences, Germany
* Michel Simard, NRC, Canada
* Mark Shuttleworth, UCL, UK
* Masao Utiyama, NICT, Japan
* Mihaela Vela, Saarland University, Germany
* Andy Way, DCU, Ireland
* Jörn Wübker, Lilt, USA
* Marcos Zampieri, Saarland University and DFKI, Germany
--
DR. CARLA PARRA ESCARTÍN
Applied Technology Engineer - Marie Curie Experienced Researcher -
EXPERT ITN [4]
www.hermestrans.com [5]
(+34) 91 640 7640 (Madrid)
(+34) 95 202 0525 (Málaga)
Links:
------
[1] http://rgcl.wlv.ac.uk/nlp4tm2016/
[2] https://www.softconf.com/lrec2016/NLP4TM/
[3] http://www.islrn.org
[4] http://expert-itn.eu/
[5] http://www.hermestrans.com
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 110, Issue 32
**********************************************