Moses-support Digest, Vol 93, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Ugandan languages MT (Adam Lopez)
2. Re: Ugandan languages MT (Chris Dyer)
3. special characters in the cleaned data (Arefeh Kazemi)
4. Re: Ugandan languages MT (Lane Schwartz)
5. CfP: First International Workshop on Computational
Linguistics for Uralic Languages (IWCLUL) (Tommi A Pirinen)


----------------------------------------------------------------------

Message: 1
Date: Mon, 30 Jun 2014 17:38:07 -0400
From: Adam Lopez <alopez@cs.jhu.edu>
Subject: [Moses-support] Ugandan languages MT
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAE-Scgt+CtgTKc_fbiW7z=E0twAPDxUAWAuHU95Fq=JeJ6GSUg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi -- Asking on behalf of a colleague: does anyone know of MT systems and/
or parallel datasets for the languages of Uganda? (Swahili, Luganda, Soga,
Karomojong, Alur, etc.)
-Adam

------------------------------

Message: 2
Date: Mon, 30 Jun 2014 19:03:59 -0400
From: Chris Dyer <cdyer@cs.cmu.edu>
Subject: Re: [Moses-support] Ugandan languages MT
To: Adam Lopez <alopez@cs.jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEHEvxN+LZ9WpFwVMyJq89CeAabq14C2zUnpNUjTw9BXh5Qe5g@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

We've put together a small corpus (15k sentences training, 2k each of
dev/test) of Swahili-English which you can get here:
http://demo.clab.cs.cmu.edu/cdyer/gv.sw-en.tar.gz

It's roughly equivalent to the data used for the experiments reported
in this paper:
http://anthology.aclweb.org//D/D13/D13-1174.pdf

On Mon, Jun 30, 2014 at 5:38 PM, Adam Lopez <alopez@cs.jhu.edu> wrote:
> Hi -- Asking on behalf of a colleague: does anyone know of MT systems and/
> or parallel datasets for the languages of Uganda? (Swahili, Luganda, Soga,
> Karomojong, Alur, etc.)
> -Adam
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


------------------------------

Message: 3
Date: Mon, 30 Jun 2014 18:48:45 -0700
From: Arefeh Kazemi <arefeh_kazemi@yahoo.com>
Subject: [Moses-support] special characters in the cleaned data
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1404179325.39104.YahooMailNeo@web121706.mail.ne1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi everyone,


I am working with Moses and need to use the cleaned corpus and the alignment file. I need to parse the source sentences in the cleaned corpus and use the alignment file to project the parse tree onto the target sentence. I have some problems with special characters such as &apos; and &quot; in the cleaned data: they cause the parser to work incorrectly. Is it possible to change the Moses settings so that these characters are not produced in the cleaned data?


Regards
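
One possible workaround, sketched under stated assumptions: the entities in the cleaned data look like the escapes written by the Moses tokenizer (&apos;, &quot;, &amp;, &lt;, &gt;, &#124;, &#91;, &#93;), so they can be mapped back to plain characters before the source side is handed to the parser. The entity table and the function name deescape below are illustrative assumptions, not Moses code. To my knowledge Moses also ships scripts/tokenizer/deescape-special-chars.perl for the same purpose, and some versions of tokenizer.perl accept a -no-escape option; check your own checkout before relying on either.

```python
import re

# Assumed escape table: the entities the Moses tokenizer is believed to emit.
# Verify it against your own cleaned corpus before relying on it.
MOSES_ENTITIES = {
    "&amp;": "&",
    "&#124;": "|",
    "&lt;": "<",
    "&gt;": ">",
    "&apos;": "'",
    "&quot;": '"',
    "&#91;": "[",
    "&#93;": "]",
}

# One alternation applied in a single left-to-right pass, so that an escaped
# ampersand such as "&amp;lt;" becomes "&lt;" and is not unescaped twice.
_ENTITY_RE = re.compile("|".join(re.escape(e) for e in MOSES_ENTITIES))


def deescape(line: str) -> str:
    """Replace Moses-style entities in one tokenized line with plain characters."""
    return _ENTITY_RE.sub(lambda m: MOSES_ENTITIES[m.group(0)], line)


if __name__ == "__main__":
    escaped = "he said &quot; I don &apos;t know &quot; &amp; left"
    plain = deescape(escaped)
    print(plain)
    # The number of whitespace-separated tokens is unchanged, so the word
    # indices in the alignment file still refer to the same tokens.
    assert len(plain.split()) == len(escaped.split())
```

Each entity sits inside a single token and no replacement adds whitespace, so the token positions used by the alignment file should remain valid after this step. Running it on a copy of the cleaned source file leaves the data used for training untouched.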

------------------------------

Message: 4
Date: Tue, 1 Jul 2014 09:54:59 -0400
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] Ugandan languages MT
To: Adam Lopez <alopez@cs.jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZk+MbWT3KNhVb3KUw3oXaP02xqpvPJc+YWkabgnWZZbdA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Adam,

I just spoke with a professor from Kenya at an ACL workshop this week.
He was working on a constrained domain system for Swahili. It may have
been rule-based. Contact me for details if you'd like his contact
info.

Lane



On Mon, Jun 30, 2014 at 5:38 PM, Adam Lopez <alopez@cs.jhu.edu> wrote:
> Hi -- Asking on behalf of a colleague: does anyone know of MT systems and/
> or parallel datasets for the languages of Uganda? (Swahili, Luganda, Soga,
> Karomojong, Alur, etc.)
> -Adam



--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


------------------------------

Message: 5
Date: Tue, 1 Jul 2014 15:04:03 +0100
From: Tommi A Pirinen <tommi.pirinen@computing.dcu.ie>
Subject: [Moses-support] CfP: First International Workshop on
Computational Linguistics for Uralic Languages (IWCLUL)
To: moses-support@MIT.EDU
Message-ID: <20140701150403.68ac2571@zmey>
Content-Type: text/plain; charset=UTF-8

[Apologies for multi-posting and any duplicates you may receive. AFAIK
there hasn't been much work on SMT for Uralic languages, especially
apart from the national languages, so it would be excellent to see
more.]


First International Workshop on Computational Linguistics for Uralic
Languages

[ [1]in English | [2]in Russian ]

Tromsø / Romsa / Tromsa / Тромсё

16th January, 2015, Tromsø, Norway

[3]http://gtweb.uit.no/iwclul2015/

Call for papers

The purpose of the First International Workshop on Computational
Linguistics for Uralic Languages is to bring together researchers
working on computational approaches to these languages.
We accept papers and tutorial proposals addressing the following
languages: Finnish, Hungarian, Estonian, Võro, the Sámi languages,
Komi (Zyrian, Permyak), Mordvin (Erzya, Moksha), Mari (Hill, Meadow),
Udmurt, Nenets (Tundra, Forest), Enets, Nganasan, Selkup, Mansi,
Khanty, Veps, Karelian (Olonets), Karelian, Ingrian (Izhorian),
Votic, Livonian, Ludic, and other related languages.

All Uralic languages exhibit rich morphological structure, which
makes processing them challenging for state-of-the-art computational
linguistic approaches; the majority also suffer from a lack of
resources, and many are endangered.

Research papers should present original, substantial and unpublished
research, and may describe work-in-progress systems, frameworks,
standards and evaluation schemes. Demos and tutorials will present
systems and standards towards the goal of interoperability and
unification of different projects, applications and research groups.
Appropriate topics include (but are not limited to):
* Parsers, analysers and processing pipelines of Uralic languages
* Lexical databases, electronic dictionaries
* Finished end-user applications aimed at Uralic languages, such as
spelling or grammar checkers, machine translation or speech
processing
* Evaluation methods and gold standards, tagged corpora, treebanks
* Reports on language-independent or unsupervised methods as
applied to Uralic languages
* Surveys and review articles on subjects related to computational
linguistics for one or more Uralic languages
* Any work that aims at combining efforts and reducing duplication
of work
* How to elicit activity from the language community, agitation
campaigns, games with a purpose

To maximise the possibility of reproducibility, replication and
reuse, we particularly encourage submissions which present
free/open-source language resources and make use of free/open-source
software.

One of the aims of this gathering is to avoid unnecessary duplicated
work in the field of Uralistics by establishing connections and
interoperability standards between researchers and research groups
working at different sites. We have also identified a serious lack of
gold standards and evaluation metrics for all Uralic languages,
including those with national support; any work towards better
resources in these fields will be greatly appreciated. To further
these goals we propose to start discussions on forming an ACL special
interest group (or similar) on Uralistics at the event.

Important Dates

* 1st July 2014: Call for papers announced
* 1st November 2014: Paper submission deadline
* 1st December 2014: Paper notification
* 14th December 2014: Camera-ready deadline
* 16th January 2015: Workshop held in Tromsø

Submission of papers

Language of submission: Submissions should be made in English or
Russian with an optional abstract in Finnish.

Submission format: There are multiple submission types: research
papers, and demonstrations and tutorials. Research papers should be
up to 10 pages in length excluding references; the descriptions for
demonstrations and tutorials, up to 5 pages. Submissions should be
formatted using the default LaTeX article style with the b5paper option.
Citations should be managed with bibtex and, e.g., the unsrt bibliography
style. Linguistic glosses should follow the Leipzig glossing rules and
use the expex LaTeX package (make sure to update expex regularly as it is
developed actively). The preferred LaTeX version is XeLaTeX, and therefore
you should use UTF-8 encoded Unicode in your sources rather than TeX
encoded characters where possible. You will find the workshop
template [4]here. Submissions can be made [5]here using the
EasyChair conference management system.

Conflicts of interest: The reviewing process will not be anonymous;
authors should state in their submission all conflicts of interest
with the programme committee. The members of the programme committee are
expected to state their conflicts of interest during review bidding.
If the programme committee finds itself unable to review some of
the submissions, external reviewers may be used.

Double submission: To maximise the impact of work in the field of
computational linguistics for the Uralic languages we are open to the
possibility of double submission, or submission of work which has
been partially published elsewhere. Any double submission should
however be reported to the programme committee at the time of
submission. In the event of double acceptance the authors should
choose in which venue to publish.

Venue

The workshop will be held at the HSL-fakultetet at UiT The Arctic
University of Norway in Tromsø, Norway.

Organisers

* Tommi A. Pirinen, Dublin City University
* Francis M. Tyers, UiT Norgga árktalaš universitehta
* Trond Trosterud, UiT Norgga árktalaš universitehta

Programme committee

* ??????? ?????????????, ???????????? ????????????????? ???????????
"?????? ????? ?????????"
* Lars Borin, Göteborgs universitet
* ?????? ??????????? ??????, ????-????? ???????? ??????????? ?????
???? ???????????? ???????????
* Mark Fishel, Tartu Ülikool
* Mikel L. Forcada, Universitat d'Alacant
* Mans Hulden, University of Colorado at Boulder
* Heiki-Jaan Kaalep, Tartu Ülikool
* András Kornai, Budapesti Műszaki és Gazdaságtudományi Egyetem
* Krister Lindén, Helsingin yliopisto
* Tommi A. Pirinen, Dublin City University
* Gábor Prószéky, Pázmány Péter Katolikus Egyetem
* Aarne Ranta, Chalmers tekniska högskola
* Jack Rueter, Helsingin yliopisto
* Trond Trosterud, UiT Norgga árktalaš universitehta
* Francis M. Tyers, UiT Norgga árktalaš universitehta
* Sami Virpioja, Aalto-yliopisto
* Anssi Yli-Jyrä, Helsingin yliopisto
__________________________________________________________________


References

1. http://gtweb.uit.no/iwclul2015/index.en.html
2. http://gtweb.uit.no/iwclul2015/index.ru.html
3. http://gtweb.uit.no/iwclul2015/
4. http://gtweb.uit.no/iwclul2015/2015-fiwclul.tar.gz
5. https://www.easychair.org/conferences/?conf=iwclul2015


--
Dr Tommi A Pirinen <http://www.computing.dcu.ie/~tpirinen/>.
Computational Linguist, Dublin City University / CNGL, Abu-matran
<http://abumatran.eu>. This official disclaimer may be automatically
inserted, sorry:
<https://iss.servicedesk.dcu.ie/index.php?/News/NewsItem/View/37/dcu-email-disclaimer-information>


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 93, Issue 1
********************************************
