Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. sending my email to the mailing list (sofiane bouzaher)
2. Re: Polysynthetic languages? (Michael Joyner)
----------------------------------------------------------------------
Message: 1
Date: Sat, 6 Feb 2016 19:40:52 +0100
From: sofiane bouzaher <bouzaher.sofiane@gmail.com>
Subject: [Moses-support] sending my email to the mailing list
To: moses-support@mit.edu
Message-ID:
<CAOVxcXkb8H3TvK_hoJE39m7zh3RUESLyhF4o1-_toajDjDbQ3Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
bouzaher.sofiane@gmail.com
------------------------------
Message: 2
Date: Sat, 6 Feb 2016 17:14:05 -0500
From: Michael Joyner <mjoyner@vbservices.net>
Subject: Re: [Moses-support] Polysynthetic languages?
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAdxTGi7yiQD90R9sKfZ1B7=EGCGv=x_DwJYbOBK=OvACykp8A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I tried both... very poor results. Cherokee morphology works partly at the
level of sounds, and it is written in a syllabary, which obscures some of
these things from the machine.
But I came up with something that will hopefully help: a small program that
does some simple infix guessing and splitting for the most relevant infixes
(pronouns, benefactive, etc.).
In case someone might find what I have done useful:
https://github.com/mjoyner-vbservices-net/CherokeeAffixSplitter
It would be better if I were to take some known valid verb entries and
generate the needed permutations to split against, but hopefully this will
be enough to help.
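For readers who have not opened the repository, here is a minimal sketch of
what simple affix guessing and splitting can look like. It is not the code
from CherokeeAffixSplitter: the names SLOTS, split_word and split_line, the
"@@" marker, and the affix lists are all assumptions made for illustration,
and the affixes are Latin-letter placeholders rather than real Cherokee
morphemes. It also only peels affixes from the front of a word, one per
slot, which is a simplification of real infix handling.

```python
# Hypothetical placeholder affixes, NOT real Cherokee morphemes and NOT
# taken from CherokeeAffixSplitter. Each inner list is one "slot".
SLOTS = [
    ["ga", "hi", "a"],  # slot 1: stand-ins for pronoun-like affixes
    ["el", "s"],        # slot 2: stand-ins for benefactive-like affixes
]


def split_word(word, slots=SLOTS, marker="@@"):
    """Peel at most one known affix per slot from the front of the word."""
    pieces = []
    rest = word
    for affixes in slots:
        # Try longer affixes first so a short one cannot shadow a longer one.
        for affix in sorted(affixes, key=len, reverse=True):
            # Only split if a non-empty stem would remain.
            if rest.startswith(affix) and len(rest) > len(affix):
                pieces.append(affix + marker)
                rest = rest[len(affix):]
                break
    pieces.append(rest)
    return " ".join(pieces)


def split_line(line):
    """Split every whitespace-separated token of a corpus line."""
    return " ".join(split_word(token) for token in line.split())
```

With these placeholder lists, split_line("gaelwoni woni") gives
"ga@@ el@@ woni woni", and a word with no known affix is left unchanged.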
On Mon, Feb 1, 2016 at 3:07 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
wrote:
> Plain tokenized text is good enough. It may even work as a tokenizer(?)
> if none is available. There is no specific notion of "infix themes"
> though. The segmentation is purely frequency-based, no linguistic
> motivation there, but it may just work.
>
> It's easy enough, just run it and take a look at the results. Even if it
> looks strange to you, it may be worth doing a test training anyway. As I
> said, for Russian->English I get a nice improvement for patent data.
>
> On 01.02.2016 19:30, Michael Joyner wrote:
> > So how does that work?
> >
> > Does it just take all the words from the corpus and guess "infix
> > themes"? Or do I have to supply pre-tagged data?
> >
> > On Mon, Feb 1, 2016 at 9:04 AM, Rico Sennrich <rico.sennrich@gmx.ch
> > <mailto:rico.sennrich@gmx.ch>> wrote:
> >
> > Hi Mike,
> >
> > here's a link to the tool Marcin mentioned:
> > https://github.com/rsennrich/subword-nmt
> >
> > I haven't tried it on phrase-based MT myself, but feel free to
> > give it a try.
> >
> > You could also try other unsupervised morpheme segmenters like
> > morfessor: https://github.com/aalto-speech/morfessor
> >
> > I don't know if there are any segmentation methods specific to
> > Cherokee.
> >
> > best wishes,
> > Rico
> >
> >
> > On 01.02.2016 13:31, Marcin Junczys-Dowmunt wrote:
> >>
> >> Hi Mike,
> >>
> >> Maybe take a look at Rico's tool for handling unknown words in
> >> neural machine translation. I have been playing around with that
> >> for Russian-English and standard phrase-based SMT with some
> >> success. I am just not sure if your small corpora will be enough
> >> to learn useful segmentations though.
> >>
> >> It's an unsupervised method for word segmentation. For
> >> Russian-English I created a code dictionary of the 100,000
> >> most-frequent segments per language. Unseen tokens will get
> >> segmented. The segmentation is not necessarily similar to a
> >> linguistically correct segmentation, though. You will probably want
> >> to try smaller numbers.
> >>
> >> Best,
> >>
> >> Marcin
> >>
> >> On 2016-02-01 14:12, Michael Joyner wrote:
> >>
> >>> I am trying to use Moses with Cherokee using the New Testament
> >>> and Genesis as the primary corpus. I am feeding it the WEB and BBE
> >>> as source English texts at the moment.
> >>>
> >>> As Cherokee uses bound pronouns, has no articles, and has almost
> >>> no preposition analogues (these features are mostly verb
> >>> infixes), is there a technique for corpus adjustment that can be
> >>> done to improve the phrase mapping between Cherokee and English?
> >>>
> >>> I am currently doing Cherokee => English.
> >>> Thanks, Mike
> >>> --
> >>>
> >>> WEB: World English Bible (Public Domain)
> >>> BBE: Basic English Bible (Public Domain)
> >>>
> >>> * Learn to the Cherokee language: http://jalagigawoni.gnomio.com/
> >>>
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > --
> >
> > * Learn to the Cherokee language: http://jalagigawoni.gnomio.com/
> >
>
--
- Learn to the Cherokee language: http://jalagigawoni.gnomio.com/
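
The frequency-based segmentation described in the quoted thread can be
illustrated with a small byte-pair-encoding style sketch. This is a toy
under stated assumptions, not the subword-nmt implementation: the function
names, the "</w>" end-of-word marker, and the tie-breaking rule are choices
made here for illustration. The idea is to repeatedly merge the most
frequent adjacent pair of symbols in the training words, then replay those
merges on any word, seen or unseen.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one joined symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge operations from an iterable of words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

The number of merges plays the role of the dictionary size mentioned in the
thread: fewer merges leave words in smaller pieces, which is why a small
corpus probably calls for a smaller number. No linguistic knowledge is
involved, so the pieces need not line up with real morphemes.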
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 112, Issue 16
**********************************************