Moses-support Digest, Vol 98, Issue 15

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: how to clean the UN corpus (joerg)
2. Re: how to clean the UN corpus (Rajen Chatterjee)


----------------------------------------------------------------------

Message: 1
Date: Mon, 1 Dec 2014 17:51:26 +0100
From: joerg <tiedeman@gmail.com>
Subject: Re: [Moses-support] how to clean the UN corpus
To: emna hkiri <emna.hkiri@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <C539FF55-7080-49E4-BB06-4E1DB853BE19@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"


You could use the word-aligned version (or even the phrase-tables) from OPUS:
http://opus.lingfil.uu.se/MultiUN/wordalign/ar-en/
http://opus.lingfil.uu.se/UN/wordalign/ar-en/

Best,
Jörg

**********************************************************************************
Jörg Tiedemann http://stp.lingfil.uu.se/~joerg/



On Dec 1, 2014, at 4:56 PM, emna hkiri wrote:

>
> Dear friends, thank you very much for your earlier help; I hope you can help me
> again.
> I am trying to build an Arabic-English SMT system with Moses,
> but during training GIZA does not produce the alignment. The cause is that the UN ar-en corpus is not well cleaned: the two sides are not parallel, because they do not have the same number of lines. I am working with the 2000 directory (2000ar and 2000en). Has anyone worked with the UN ar-en corpus?
> How can I give the ar and en sides of 2000 the same number of lines, so that the corpus passes the cleaning step?
>
> Thank you in advance; I hope you will answer my question.


------------------------------

Message: 2
Date: Mon, 1 Dec 2014 18:04:45 +0100
From: Rajen Chatterjee <rajen.k.chatterjee@gmail.com>
Subject: Re: [Moses-support] how to clean the UN corpus
To: emna hkiri <emna.hkiri@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAC4-+NyeM8QkkFFAoraJsW78RTkmAwL+SeK7-09aL3F1CGFzKg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

If your parallel corpus is not sentence-aligned, you could try a
sentence aligner tool, which extracts parallel sentence pairs with some
level of confidence.
One example is the Microsoft Bilingual Sentence Aligner:
http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
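
Before and after running an aligner, it can help to confirm whether the two sides really differ in line count, and then to drop empty or badly mismatched pairs. The following is only a minimal Python sketch under stated assumptions: the function names and the thresholds (1 to 80 tokens, length ratio at most 9) are illustrative choices, not taken from Moses or from the aligner above, and tokens are approximated by whitespace splitting. Moses ships its own cleaning script, clean-corpus-n.perl, which should be preferred for real training runs.

```python
def count_lines(path):
    """Count the lines of a text file without loading it into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def is_line_parallel(src_path, tgt_path):
    """Return (same_count, src_count, tgt_count) for the two corpus sides."""
    src_count = count_lines(src_path)
    tgt_count = count_lines(tgt_path)
    return src_count == tgt_count, src_count, tgt_count


def filter_pairs(src_lines, tgt_lines, min_len=1, max_len=80, max_ratio=9.0):
    """Keep sentence pairs whose token counts are plausible on both sides.

    The thresholds are illustrative assumptions, not values from the thread.
    """
    if len(src_lines) != len(tgt_lines):
        # Line i of one side must correspond to line i of the other.
        raise ValueError("line counts differ; run a sentence aligner first")
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop pairs where either side is too short (e.g. empty) or too long.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Drop pairs whose lengths differ too much to be translations.
        if max(n_src, n_tgt) / max(1, min(n_src, n_tgt)) > max_ratio:
            continue
        kept.append((src.strip(), tgt.strip()))
    return kept
```

For instance, is_line_parallel("2000.ar", "2000.en") (hypothetical file names) reports whether the two files can be paired line by line at all. If they cannot, filtering alone will not help, and a sentence aligner is the right first step.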




--
-Regards,
Rajen Chatterjee.

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 98, Issue 15
*********************************************
