Moses-support Digest, Vol 112, Issue 41

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Incremental Training (Maria Braga)
2. Re: Segmentation Fault (Matthias Huck)
3. WMT 2016 Shared Task on Bilingual Document Alignment - Call
for Participation (Philipp Koehn)
4. Seg Fault when Binarizing Phrase Tables (Jake Ballinger)
5. Re: Seg Fault when Binarizing Phrase Tables
(Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Feb 2016 17:02:59 +0000
From: Maria Braga <maria@unbabel.com>
Subject: [Moses-support] Incremental Training
To: moses-support@mit.edu
Message-ID:
<CALA-XXyPpszKWVVmwvaOSpEbWS0G-UHKeePPqiZR24ygDKiN9Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I am currently trying to use the incremental training feature of Moses.
I am following the tutorial at
http://www.statmt.org/moses/?n=Advanced.Incremental#ntoc6
and in step 3 (Build binary files), when I run the following command:

~/mosesdecoder/bin/mmlex-build corpus en es -o corpus.en-es.lex

it throws the following error:

terminate called after throwing an instance of 'util::Exception'
  what(): moses/TranslationModel/UG/mm/mmlex-build.cc:247 in void Counter::processSentence(tpt::id_type) threw util::Exception because `r >= check1.size()'.
out of bounds at line 0
Aborted (core dumped)

How can I solve this?

Regards,
--
Maria Braga




------------------------------

Message: 2
Date: Mon, 22 Feb 2016 18:36:40 +0000
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] Segmentation Fault
To: Jasneet Sabharwal <jasneet.sabharwal@sfu.ca>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <1456166200.3234.46.camel@portedgar>
Content-Type: text/plain; charset="UTF-8"

Hi,

Which object doesn't exist?

You can just protect access to your cache container with mutexes.
I believe the Model1Feature does something similar.
https://github.com/moses-smt/mosesdecoder/blob/master/moses/FF/Model1Feature.cpp

There might be more beautiful solutions, though. Independent
thread-specific caches would be useful.
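
The two options mentioned above (one shared cache protected by a mutex, or an independent cache per thread) can be sketched as follows. This is only an illustration of the idea in Python, not Moses code: Moses feature functions are written in C++, and the class names `LockedCache` and `PerThreadCache` are hypothetical. One point that may be relevant to the original crash: a thread-local object starts out empty in every thread, so it has to be created on first use in each thread, as `_data()` does here.

```python
import threading
from typing import Callable, Dict


class LockedCache:
    """A single cache shared by all threads, guarded by one mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, float] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], float]) -> float:
        # Hold the lock for both the lookup and the insertion.
        with self._lock:
            if key not in self._data:
                self._data[key] = compute(key)
            return self._data[key]


class PerThreadCache:
    """An independent cache per thread, so no locking is needed."""

    def __init__(self) -> None:
        self._local = threading.local()

    def _data(self) -> Dict[str, float]:
        # Create the map lazily the first time each thread asks for it.
        if not hasattr(self._local, "data"):
            self._local.data = {}
        return self._local.data

    def get_or_compute(self, key: str, compute: Callable[[str], float]) -> float:
        data = self._data()
        if key not in data:
            data[key] = compute(key)
        return data[key]
```

The shared variant computes each value once across all threads at the cost of lock contention; the per-thread variant avoids locking but may recompute the same value once in every thread.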

Cheers,
Matthias


On Sun, 2016-02-21 at 20:23 -0800, Jasneet Sabharwal wrote:
> Is it possible to cache some data when decoding a source sentence? I
> was trying to use boost's thread_specific_ptr to cache a map which I
> want to update in my evaluation function, but when I try to access the
> map
> (https://github.com/KonceptGeek/mosesdecoder/blob/RELEASE-3.0-CombinedFeature-Caching/moses/FF/CoarseBiLM.cpp#L145-L154)
> I get a segmentation fault because the object doesn't exist.
>
> Is there any other way to do some caching?






------------------------------

Message: 3
Date: Mon, 22 Feb 2016 15:11:19 -0500
From: Philipp Koehn <phi@jhu.edu>
Subject: [Moses-support] WMT 2016 Shared Task on Bilingual Document
Alignment - Call for Participation
To: "moses-support@mit.edu" <moses-support@mit.edu>, "corpora@uib.no"
<CORPORA@uib.no>, Multiple recipients of list <mt_list@nist.gov>
Message-ID:
<CAAFADDCUG9EAEtJQwhTi-6Xihr2jDBmkhxNVutrJwzc0bkZcCA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

WMT 2016 Shared Task on Bilingual Document Alignment

CALL FOR PARTICIPATION

========================================================
WMT 2016 Shared Task on Bilingual Document Alignment
========================================================

Website: http://www.statmt.org/wmt16/bilingual-task.html
At WMT 2016 (co-located with ACL 2016)

Parallel corpora are especially important for statistical machine
translation, but so far the collection of such data within the
academic research community has been ad hoc and limited
in scale. To promote research on this problem within the community,
we organize a shared task on aligning bilingual documents from
crawled web sites.

More details can be found below, and on our website:
http://www.statmt.org/wmt16/bilingual-task.html

Important Dates:

Release of training data: February 12, 2016
Release of test data: April 11, 2016
Results submission deadline: May 2, 2016
Paper submission deadline: May 8, 2016
Notification of acceptance: June 5, 2016
Camera-ready deadline: June 22, 2016

=========================
Detailed Task Description
=========================

The task is to align French web pages to English web pages
for a given crawled webdomain (a set of web pages under a fully
qualified domain name, FQDN).

TRAINING DATA:
For the crawled data we provide one file per webdomain in .lett format
adapted from Bitextor. This is a plain text format with one line per page.
Each line consists of 6 tab-separated values:

Language ID (e.g. en)
Mime type (always text/html)
Encoding (always charset=utf-8)
URL
HTML in Base64 encoding
Text in Base64 encoding
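
As an illustration of the six-field layout listed above, here is a minimal parsing sketch. It is not the reader class provided by the organizers. It assumes an uncompressed, UTF-8 encoded file, and the names `LettPage`, `parse_lett_line`, and `read_lett` are hypothetical.

```python
import base64
from typing import Iterator, NamedTuple


class LettPage(NamedTuple):
    """One crawled page, i.e. one line of a .lett file."""
    lang: str
    mime: str
    encoding: str
    url: str
    html: str
    text: str


def _b64_to_str(field: str) -> str:
    # Decode a Base64 field; replace undecodable bytes instead of failing.
    return base64.b64decode(field).decode("utf-8", errors="replace")


def parse_lett_line(line: str) -> LettPage:
    """Split one line into its 6 tab-separated values and decode the last two."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    lang, mime, encoding, url, html_b64, text_b64 = fields
    return LettPage(lang, mime, encoding, url,
                    _b64_to_str(html_b64), _b64_to_str(text_b64))


def read_lett(path: str) -> Iterator[LettPage]:
    """Yield one LettPage per non-empty line (assumes a plain-text file)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_lett_line(line)
```

If the distributed files turn out to be compressed, `open` would need to be swapped for the matching reader (for example `gzip.open` in text mode).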

We make sure that the language id is reliable, at least for the
documents in the train and test pairs. We also ensure that all
known pairs have been crawled and no URLs are missing
from the crawls.

Text extraction was performed using an HTML5 parser. As the
original HTML pages are available, participants are welcome
to implement their own text extraction, for example to remove
boilerplate.

To facilitate use of the .lett files we provide a simple reader
class in Python.

Additionally, we have identified spans of French text for which
we produced English translations using MT. These translations
are not part of the .lett files but are provided separately.

As part of the training data we provide a set of 1,624 correctly
aligned EN-FR pairs from 49 webdomains. The number of pairs per
webdomain varies between 4 and over 200. All pairs are from within
a single webdomain; possible matches between two different
webdomains, e.g. siemens.de and siemens.com, are not considered
in this task.

Answer keys are given in the format
Source_URL<TAB>Target_URL

TEST SET:
For testing, we will provide additional crawls of new webdomains,
distinct from the ones in the training data, in the same format. For
these, no known pairs will be provided. Because the full
set of valid document pairs is unknown, evaluation will be based
entirely on precision on an annotated subset of correctly aligned
pairs.

Participants are expected to produce a list of possible pairings in
the format of the training data. Each source URL may be matched
with at most one target URL and vice versa. Should a URL occur
repeatedly, later occurrences are ignored. We provide an evaluation
script to assess performance during development.
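
The one-to-one constraint above can be checked before submitting with a small filter like the one below. This is a sketch, not the official evaluation script. It assumes one reading of "later occurrences are ignored": a pair is dropped when either of its URLs already appears in an earlier kept pair. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_one_to_one(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep a pair only if neither URL was used by an earlier kept pair."""
    seen_sources = set()
    seen_targets = set()
    kept: List[Pair] = []
    for source_url, target_url in pairs:
        if source_url in seen_sources or target_url in seen_targets:
            continue  # later occurrence of a URL: ignore this pair
        seen_sources.add(source_url)
        seen_targets.add(target_url)
        kept.append((source_url, target_url))
    return kept


def format_pairs(pairs: Iterable[Pair]) -> str:
    """Render pairs as Source_URL<TAB>Target_URL, one pair per line."""
    return "".join(f"{source}\t{target}\n" for source, target in pairs)
```

Because earlier pairs win, it makes sense to sort candidate pairs by confidence before filtering.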

BASELINE:
We provide a simple baseline method based on URL matching.

Training data and baseline method are available at
http://www.statmt.org/wmt16/bilingual-task.html


ORGANIZERS:
Christian Buck, University of Edinburgh
Philipp Koehn, Johns Hopkins University


ACKNOWLEDGMENT:
This shared task received support from a Google Faculty Research Award.


------------------------------

Message: 4
Date: Mon, 22 Feb 2016 20:11:55 -0500
From: Jake Ballinger <ballingerj@allegheny.edu>
Subject: [Moses-support] Seg Fault when Binarizing Phrase Tables
To: moses-support@mit.edu
Message-ID:
<CA+5d++jqPFXpMQHOeT8nXV3=Kb8bXr5_ityfvKPZQ7ECye2W6A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello everyone,

I'm trying to set up the baseline system, as described here
<http://www.statmt.org/moses/?n=Moses.Baseline>. When I try to binarize the
phrase table (see the "Testing" section), I get a segmentation fault.

I used the following command:

~/mosesdecoder/bin/processPhraseTableMin -in
working/train/model/phrase-table.gz -nscores 4 -out
working/binarised-model/phrase-table &> binarize.out


Thank you,
Jake

--
Jake Ballinger
Major: Computer Science
Minors: Chinese, French, Spanish, & Math
443-974-6184
ballingerj@allegheny.edu
Box 582
-------------- next part --------------
A non-text attachment was scrubbed...
Name: binarize.out
Type: application/octet-stream
Size: 710 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20160222/e363b646/attachment-0001.obj

------------------------------

Message: 5
Date: Tue, 23 Feb 2016 16:06:03 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Seg Fault when Binarizing Phrase Tables
To: moses-support@mit.edu
Message-ID: <56CC755B.5070201@amu.edu.pl>
Content-Type: text/plain; charset="windows-1252"

Hi,
Can you send me the phrase table you are binarizing? It seems to be
small enough.
Best,
Marcin

On 23.02.2016 at 02:11, Jake Ballinger wrote:
> Hello everyone,
>
> I'm trying to set up the baseline system, as described here
> <http://www.statmt.org/moses/?n=Moses.Baseline>. When I try to
> binarize the phrase table (see the "Testing" section), I get a
> segmentation fault.
>
> I used the following command:
>
> ~/mosesdecoder/bin/processPhraseTableMin -in
> working/train/model/phrase-table.gz -nscores 4 -out
> working/binarised-model/phrase-table &> binarize.out
>
>
> Thank you,
> Jake


------------------------------



End of Moses-support Digest, Vol 112, Issue 41
**********************************************
