Moses-support Digest, Vol 160, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. get paid to help preserve the MT Archive (Matt Post)

----------------------------------------------------------------------

Message: 1
Date: Thu, 6 Feb 2020 15:00:09 -0500
From: Matt Post <post@cs.jhu.edu>
Subject: [Moses-support] get paid to help preserve the MT Archive
To: <moses-support@mit.edu>
Message-ID: <DB62A10F-4B6E-4FB0-8ABA-EFE9CAE08F0A@cs.jhu.edu>
Content-Type: text/plain; charset=utf-8

Hi everyone,

If you've been around a while, you are probably aware of how hard it can be to find and cite old MT papers. Many of these can only be found on the MT Archive, which has not been maintained for some years.

https://www.aclweb.org/anthology/
http://www.mt-archive.info

As Director of the ACL Anthology, I am looking for someone to help move the MT Archive into the ACL Anthology. This conversion is a paid position (with funding from IAMT) with a goal completion date of April 15, 2020, so that the results can be demonstrated at EAMT.

I am personally very excited about this conversion project. We've put a lot of work into the Anthology over the past year, and all of this could come together very quickly. It is satisfying to watch the ingestions and changes go live, and putting this wealth of data in a place where it can be easily searched, exported, and cited will be immensely satisfying!

If you are interested, please contact me. You can see more information in the job advertisement below.

# Seeking assistance to help in the conversion of the Machine Translation Archive

February 6, 2020

The Association for Computational Linguistics (ACL) is seeking assistance in the task of ingesting the Machine Translation Archive (www.mt-archive.info) into the ACL Anthology (www.aclweb.org/anthology). This job is funded by the International Association for Machine Translation (IAMT) with the goal of preserving and disseminating the wealth of information present in the Archive, much of which is available only there.

## Job Description

The Machine Translation Archive (hereafter, "Archive") was created by John Hutchins in 2004 and currently contains about 12,000 entries. All of the archive, including various portals and indexes, is hand-crafted HTML written using Microsoft Word, and all of the papers are stored as PDF files. It is the single most important source of papers about machine translation, with emphasis on historical MT papers.

The main task is to convert the information in the MT Archive into the XML format used by the Anthology. The steps, which will be done in close collaboration with the Anthology Director, are:

- Producing a spreadsheet of conference proceedings and journals in the MT Archive, and obtaining identifiers for each of them from the Anthology team.
- Semi-automatically transforming each of these proceedings into the XML metadata format used by the Anthology. This will only include abstracts when they have already been extracted from the PDFs in the Archive.
- Renaming all the PDFs into the format required by the Anthology.
- Where not already extant, incorporating the conference program into the frontmatter (for example, for AMTA 2008).
- (Time-permitting) Converting the following additional manually-curated metadata from the Archive into a structured object that refers to the new Anthology identifiers:
  - Languages and language pairs
  - System and project names
  - Organizations and Affiliations
  - Methods, techniques, applications, and uses
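As a rough illustration of the spreadsheet and renaming steps, here is a minimal Python sketch that turns spreadsheet rows into a rename plan without touching any files. Everything specific in it is an assumption made for illustration: the column names (`archive_pdf`, `collection`, `volume`, `paper`) are hypothetical, and the file name pattern is only inferred from the identifiers in the P19 sample in the appendix (`P19-1001` for paper 1 of volume 1, `P19-1000` for the frontmatter). The Anthology team may well require a different scheme for some venues.

```python
import csv
import io


def anthology_pdf_name(collection_id: str, volume_id: int, paper_id: int) -> str:
    """Build a name such as 'P19-1001.pdf'.

    Assumption: one-digit volume followed by a three-digit paper number,
    as seen in the P19 sample. Paper 0 stands for the frontmatter.
    """
    if not (0 < volume_id < 10 and 0 <= paper_id < 1000):
        raise ValueError("sketch only covers volumes 1-9 and papers 0-999")
    return f"{collection_id}-{volume_id}{paper_id:03d}.pdf"


def rename_plan(spreadsheet_csv: str) -> list:
    """Return (old_name, new_name) pairs; no files are renamed here."""
    rows = csv.DictReader(io.StringIO(spreadsheet_csv))
    return [
        (
            row["archive_pdf"],
            anthology_pdf_name(row["collection"], int(row["volume"]), int(row["paper"])),
        )
        for row in rows
    ]
```

Producing the plan first and reviewing it before any renaming keeps the step semi-automatic: the pairs can be written back to the spreadsheet and checked by hand.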

We hope to complete the conversion by April 15, 2020. Hourly salary will be negotiated at time of hiring. Timesheets will be signed and approved by the Anthology Director and paid biweekly by the ACL.

To apply, please send an email to anthology@aclweb.org, with a subject of "Application for the MT Archive Ingestion Position". In the body of the email, please provide the following information:

- Personal Information: A curriculum vitae.
- Job Times: When you are able to start working; hours available per week; estimated completion date.
- Qualifications: A paragraph describing your qualifications; an email address for one or two references.
- Plan: A paragraph or two summarizing your intended technical approach.

## Appendix: Detailed Information

### Main XML format

The Anthology repository is open-sourced and is hosted online at https://github.com/acl-org/acl-anthology. The paper metadata for the Anthology is hosted in the data/xml directory, with XML files roughly corresponding to events. For example, the proceedings of ACL 2019 are in data/xml/P19.xml, and look like this:

<?xml version='1.0' encoding='UTF-8'?>
<collection id="P19">
<volume id="1" ingest-date="2019-07-28">
<meta>
<booktitle>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</booktitle>
<url>P19-1</url>
<editor><first>Anna</first><last>Korhonen</last></editor>
<editor><first>David</first><last>Traum</last></editor>
<editor><first>Lluís</first><last>Màrquez</last></editor>
<publisher>Association for Computational Linguistics</publisher>
<address>Florence, Italy</address>
<month>July</month>
<year>2019</year>
</meta>
<frontmatter>
<url>P19-1000</url>
</frontmatter>
<paper id="1">
<title>One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues</title>
<author><first>Chongyang</first><last>Tao</last></author>
<author><first>Wei</first><last>Wu</last></author>
<author><first>Can</first><last>Xu</last></author>
<author><first>Wenpeng</first><last>Hu</last></author>
<author><first>Dongyan</first><last>Zhao</last></author>
<author><first>Rui</first><last>Yan</last></author>
<pages>1–11</pages>
<abstract>Currently, researchers have paid great attention to retrieval-based dialogues in open-domain. In particular, people study the problem by investigating context-response matching for multi-turn response selection based on publicly recognized benchmark data sets. State-of-the-art methods require a response to interact with each utterance in a context from the beginning, but the interaction is performed in a shallow way. In this work, we let utterance-response interaction go deep by proposing an interaction-over-interaction network (IoI). The model performs matching by stacking multiple interaction blocks in which residual information from one time of interaction initiates the interaction process again. Thus, matching information within an utterance-response pair is extracted from the interaction of the pair in an iterative fashion, and the information flows along the chain of the blocks via representations. Evaluation results on three benchmark data sets indicate that IoI can significantly outperform state-of-the-art methods in terms of various matching metrics. Through further analysis, we also unveil how the depth of interaction affects the performance of IoI.</abstract>
<url>P19-1001</url>
<doi>10.18653/v1/P19-1001</doi>
</paper>
</volume>
</collection>

Events are typically assigned a collection identifier, e.g., "P19". Within a collection are volumes (e.g., "1" for main papers, "2" for a demo session, and so on). Finally, individual papers within a volume are numbered, starting at "1". For each volume within a collection, there is a top-level <meta> section containing volume information, followed by an XML entry for each paper.
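To make the collection / volume / paper nesting concrete, the following Python sketch builds a small tree of that shape with the standard library's `xml.etree.ElementTree`. It is only a sketch under stated assumptions: the function names and the input dictionary layout are hypothetical, only a few of the `<meta>` fields from the sample are emitted, and the `<url>` pattern (volume digit plus three-digit paper number) is inferred from the P19 sample rather than taken from any Anthology specification.

```python
import xml.etree.ElementTree as ET


def add_name(parent, tag, first, last):
    """Append <tag><first>..</first><last>..</last></tag> to parent."""
    person = ET.SubElement(parent, tag)
    ET.SubElement(person, "first").text = first
    ET.SubElement(person, "last").text = last
    return person


def build_collection(collection_id, volume_id, ingest_date, booktitle, year, papers):
    """Build a <collection> holding one <volume> and its numbered <paper> entries.

    `papers` is a list of dicts with "title" and "authors" keys, where
    "authors" is a list of (first, last) tuples (a hypothetical input layout).
    """
    collection = ET.Element("collection", id=collection_id)
    volume = ET.SubElement(
        collection, "volume", {"id": volume_id, "ingest-date": ingest_date}
    )
    meta = ET.SubElement(volume, "meta")
    ET.SubElement(meta, "booktitle").text = booktitle
    ET.SubElement(meta, "year").text = year

    # Papers within a volume are numbered starting at 1.
    for number, paper in enumerate(papers, start=1):
        node = ET.SubElement(volume, "paper", id=str(number))
        ET.SubElement(node, "title").text = paper["title"]
        for first, last in paper["authors"]:
            add_name(node, "author", first, last)
        # Assumed pattern, inferred from the sample: P19 + volume 1 + paper 1 -> P19-1001
        ET.SubElement(node, "url").text = f"{collection_id}-{volume_id}{number:03d}"
    return collection
```

Calling `ET.tostring(build_collection(...), encoding="unicode")` on the result gives the serialized XML, which can then be compared against an existing file in data/xml before anything is submitted.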

### Metadata formats

The Archive contains additional information beyond proceedings volumes, which we would also like converted. This includes the four index pages listed above: Languages and language pairs, System and project names, Organizations and Affiliations, and Methods, techniques, applications, and uses. Each of these sections should be converted into valid YAML. For example, "Index of languages: A–D: publications since 2010" should be converted to look something like the following:

afr-dut:
- [paperid]
- [paperid]

Each entry lists all paper IDs that were originally recorded for that language pair, and likewise for the other metadata.
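One possible way to produce output of that shape is sketched below in Python. It assumes the index pages have already been scraped into (language pair, paper ID) tuples, which is a hypothetical intermediate format, and it uses a small hand-written emitter only to stay dependency-free. That emitter is safe only when keys and IDs are plain strings without special YAML characters; a real conversion would more likely rely on a YAML library and a validation pass.

```python
from collections import defaultdict


def group_by_key(records):
    """Group (key, paper_id) tuples into {key: [paper_id, ...]}.

    Order of first appearance is kept and duplicate IDs under the
    same key are dropped.
    """
    index = defaultdict(list)
    for key, paper_id in records:
        if paper_id not in index[key]:
            index[key].append(paper_id)
    return dict(index)


def to_yaml(index):
    """Emit a mapping of block sequences, with keys sorted alphabetically.

    Assumption: keys and paper IDs need no quoting or escaping.
    """
    lines = []
    for key in sorted(index):
        lines.append(f"{key}:")
        lines.extend(f"- {paper_id}" for paper_id in index[key])
    return "\n".join(lines) + "\n"
```

The same two functions would apply unchanged to the other three indexes (systems and projects, organizations, methods), since each is a mapping from a label to a list of paper IDs.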

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 160, Issue 1
*********************************************
