Moses-support Digest, Vol 111, Issue 67

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Contributing to documentation (Jonathan Chen)
2. Multilingually Sentence-Aligned Corpora (Graham Neubig)
3. Re: Multilingually Sentence-Aligned Corpora
(Marcin Junczys-Dowmunt)
4. Re: Multilingually Sentence-Aligned Corpora (Lane Schwartz)
5. Re: Multilingually Sentence-Aligned Corpora (Lane Schwartz)


----------------------------------------------------------------------

Message: 1
Date: Fri, 22 Jan 2016 08:54:53 -0600
From: Jonathan Chen <jchen45@hotmail.com>
Subject: [Moses-support] Contributing to documentation
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <BAY182-W10D0AF97548AFECF73DF8EA4C40@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

There are some extra whitespace characters in the code snippets at
http://www.statmt.org/moses/?n=Moses.Baseline

(How) can I log in to fix them?

Thanks,
Jonathan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160122/0a7f6091/attachment-0001.html

------------------------------

Message: 2
Date: Fri, 22 Jan 2016 10:26:23 -0500
From: Graham Neubig <neubig@is.naist.jp>
Subject: [Moses-support] Multilingually Sentence-Aligned Corpora
To: "<moses-support@mit.edu>" <moses-support@mit.edu>
Message-ID:
<CADkjOCMtCUdaB1UvKSCsftt+CkTF0A9-ahi-Ebaq9BBf8P1M6Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses Mailing List,

This is not directly related to Moses, but I was wondering if there are any
high-quality, multilingually sentence-aligned corpora available (i.e. three or
more languages with aligned sentences). We're aware of the Europarl and
Bible corpora, but Europarl only covers European languages, and the Bible
corpus is quite small in MT terms.

TED and MULTI-UN are options, but as far as I know their data is only
bilingually aligned at the moment, and it can be a bit hard to extract a clean
multilingual corpus from them. If anyone has experience with this, or has a
resource available, I'd love some info.

Thanks in advance,
Graham
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160122/6944773b/attachment-0001.html

------------------------------

Message: 3
Date: Fri, 22 Jan 2016 16:36:52 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Multilingually Sentence-Aligned Corpora
To: moses-support@mit.edu
Message-ID: <56A24C94.3030803@amu.edu.pl>
Content-Type: text/plain; charset="windows-1252"

Hi Graham,
At the UN we are now working to release an official version of our data.
In addition to the pairwise alignments, it will contain a 6-way fully
aligned subcorpus for English, French, Spanish, Russian, Chinese and
Arabic, with about 13M segments per language. We are waiting for some LREC
feedback and the official green light from UN officials, but that should
be a matter of a couple of weeks now (maybe one, maybe two, maybe four).
Once it is ready I can make an announcement here.
Best,
Marcin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160122/740bea46/attachment-0001.html

------------------------------

Message: 4
Date: Fri, 22 Jan 2016 09:55:09 -0600
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] Multilingually Sentence-Aligned Corpora
To: Graham Neubig <neubig@is.naist.jp>
Cc: "<moses-support@mit.edu>" <moses-support@mit.edu>
Message-ID:
<CABv3vZkJJ9anb-avx3F0bxfxHABL2J6vuF547qMJrwf56mW7sQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Graham,

I'm in the process of developing a multilingual sentence aligner. I'm
planning to use it on Europarl, which is currently NOT sentence-aligned in
any multilingual way.

Lane


--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160122/ce6ce2e8/attachment-0001.html

------------------------------

Message: 5
Date: Fri, 22 Jan 2016 09:55:53 -0600
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] Multilingually Sentence-Aligned Corpora
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZmSDph3nBgHT8jVG2_w6N5gqGJuOq5CETfAWRCVLNWOww@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Marcin,

That sounds great! Yes, please do make an announcement. I would definitely
make use of such a multi-aligned corpus.

Lane


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160122/4c002990/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 111, Issue 67
**********************************************
