Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: filter parallel corpus (Amin Farajian)
----------------------------------------------------------------------
Message: 1
Date: Thu, 16 Jan 2014 16:58:11 +0100
From: Amin Farajian <ma.farajian@gmail.com>
Subject: Re: [Moses-support] filter parallel corpus
To: Saeed Farzi <saeedfarzi@gmail.com>
Cc: moses-support <moses-support@mit.edu>, "corpora@uib.no"
<corpora@uib.no>
Message-ID:
<CAA+Df5VkLKgCn8Mq3fAhwaYA75e7FiH7OQV3D5qHSuuPehDTYw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Dear Saeed,
You can do the data selection with IRSTLM; I think it fits your needs. Take
a look at the following page:
http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Data_selection
It helps you find the subset of sentences in your large training corpus
that best matches your test corpus.
Note that it was originally designed for the monolingual scenario. But if
you want to filter a parallel corpus, you can do the following:
1. Add a line number to the beginning of each line on the source side of
your training corpus.
2. Do the data selection as described in the manual.
3. Use the line numbers to extract the corresponding translations of the
selected source lines from the target side.
4. Enjoy life
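The four steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not part of IRSTLM: the "N<TAB>sentence" numbering format and the helper names `number_lines` and `extract_pairs` are hypothetical, step 2 (the IRSTLM data selection itself) is not reproduced and is replaced by a hand-picked list, and it assumes the selection output keeps the line number as the first token of each selected line.

```python
from typing import List, Sequence, Tuple


def number_lines(src_lines: Sequence[str]) -> List[str]:
    """Step 1: prefix each source sentence with its 0-based line number and a tab.

    The "N<TAB>sentence" format is a hypothetical choice, not an IRSTLM requirement.
    """
    return [f"{i}\t{line}" for i, line in enumerate(src_lines)]


def extract_pairs(selected: Sequence[str],
                  src_lines: Sequence[str],
                  tgt_lines: Sequence[str]) -> List[Tuple[str, str]]:
    """Step 3: map each selected (numbered) source line back to its aligned pair.

    Assumes the selection output keeps the line number as the first
    whitespace-separated token, so a tab turned into a space still works.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target sides must have the same number of lines")
    pairs = []
    for entry in selected:
        tokens = entry.split(maxsplit=1)
        if not tokens:  # skip blank lines in the selection output
            continue
        idx = int(tokens[0])
        pairs.append((src_lines[idx], tgt_lines[idx]))
    return pairs


if __name__ == "__main__":
    src = ["the house is small", "the dog barks", "prices rose sharply"]
    tgt = ["das Haus ist klein", "der Hund bellt", "die Preise stiegen stark"]
    numbered = number_lines(src)
    # Stand-in for step 2: pretend the data selection kept lines 2 and 0.
    selected = [numbered[2], numbered[0]]
    for s, t in extract_pairs(selected, src, tgt):
        print(s, "|||", t)
```

On real data you would read each side line by line and strip only the trailing newline (`line.rstrip("\n")`), then write the two halves of the returned pairs to separate files so the filtered corpus stays line-aligned. `str.splitlines()` is best avoided here, because it also splits on Unicode separators such as U+2028 and can misalign the two sides.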
Best,
Amin
On Thu, Jan 16, 2014 at 4:43 PM, Saeed Farzi <saeedfarzi@gmail.com> wrote:
> Dear all,
>
> I am working on a translation task with a very large parallel corpus.
> Because of the computational cost of training on such a parallel corpus, I
> am going to filter it with respect to the test set (of course, the
> evaluation must still be fair after the filtering).
>
> I am looking for a solution or a tool for filtering parallel corpus
> sentences.
>
> Note that I do not need to filter the phrase table. I know that the
> filter_ moses tool reduces the phrase table size.
>
> cheers
> --
> S.Farzi, Ph.D. Student
> Natural Language Processing Lab,
> School of Electrical and Computer Eng.,
> Tehran University
> Tel: +9821-6111-9719
------------------------------
End of Moses-support Digest, Vol 87, Issue 41
*********************************************