Moses-support Digest, Vol 86, Issue 78

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Warning during tokenizing Urdu Corpus (Asad)
2. Re: Warning during tokenizing Urdu Corpus (Hieu Hoang)
3. Re: Does Moses support C++11 compilation? (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Fri, 27 Dec 2013 23:21:47 +0500
From: Asad <asad_12204@yahoo.com>
Subject: Re: [Moses-support] Warning during tokenizing Urdu Corpus
To: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <2AE1C1C1-EF79-4F86-9B57-03CDBB3DFC68@yahoo.com>
Content-Type: text/plain; charset="us-ascii"

And what about the truecaser and cleaning? Will I have to create those for Urdu as well?

Regards
Asad A.Malik

On Dec 27, 2013, at 9:07 PM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:

> The output will be tokenized, but probably very badly. If you know Urdu and can create a better tokenizer, please add it to Moses.
>
> You can start by looking at the configuration file for the English tokenizer in
> scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
> You can copy that and change it specifically for Urdu.
>
>
>
> On 26 December 2013 16:35, Asad A.Malik <asad_12204@yahoo.com> wrote:
> Hi All,
>
> I am trying to develop an Urdu SMT system using Moses. I have an Urdu parallel corpus, and the first step in the manual is to tokenize the corpus, but when I enter the following command:
>
> ~/SMT/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ur < ~/SMT/corpus/training/mycorpus.ur-en.ur > ~/SMT/corpus/mycorpus.ur-en.tok.ur
>
> it gives me this warning:
>
> WARNING: No known abbreviations for language 'ur', attempting fall-back to English version...
>
> It also generates the output file, but I don't know whether this output is tokenized or not.
>
>
> Regards
>
> Asad A.Malik
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
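A minimal shell sketch of the first step Hieu suggests, assuming the prefix directory he names (scripts/share/nonbreaking_prefixes inside the Moses checkout): it copies nonbreaking_prefix.en to nonbreaking_prefix.ur so there is a file to edit, and refuses to overwrite an Urdu file that already exists. The helper name make_ur_prefix_file is hypothetical, not part of Moses, and the copy still lists English abbreviations until someone who knows Urdu replaces them.

```shell
# make_ur_prefix_file DIR  (hypothetical helper, not shipped with Moses)
# Copies DIR/nonbreaking_prefix.en to DIR/nonbreaking_prefix.ur as a starting
# point for an Urdu prefix list. An existing Urdu file is never overwritten.
# The copy still contains English abbreviations; edit it by hand afterwards.
make_ur_prefix_file() {
  local dir=$1
  local src="$dir/nonbreaking_prefix.en"
  local dst="$dir/nonbreaking_prefix.ur"

  # Without the English file there is nothing to start from.
  if [ ! -f "$src" ]; then
    echo "missing $src" >&2
    return 1
  fi

  # Keep any Urdu list someone has already started editing.
  if [ -e "$dst" ]; then
    echo "$dst already exists; leaving it untouched"
    return 0
  fi

  cp "$src" "$dst" && echo "created $dst"
}
```

Usage, assuming the checkout lives at ~/SMT/mosesdecoder as in Asad's command: make_ur_prefix_file ~/SMT/mosesdecoder/scripts/share/nonbreaking_prefixes. Once nonbreaking_prefix.ur exists, rerunning tokenizer.perl with -l ur should load it instead of printing the English fall-back warning.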

------------------------------

Message: 2
Date: Fri, 27 Dec 2013 19:10:54 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Warning during tokenizing Urdu Corpus
To: Asad <asad_12204@yahoo.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAEKMkbjVbhDN6i-8r2kUYjqtX+11DRYb4YCg_-tdpXVp-QFoSw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Nope, just the tokenizer.




------------------------------

Message: 3
Date: Sat, 28 Dec 2013 01:04:44 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Does Moses support C++11 compilation?
To: Li Xiang <lixiang.ict@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgohODKeTsFxs+x4DjPiWmM5eJ7shWkuSWpforZsu7ZyQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

From Stack Overflow:

http://stackoverflow.com/questions/2887707/how-to-build-boost-with-c0x-support

http://stackoverflow.com/questions/18452723/change-boost-build-jamfile-for-c11-support

./bjam ... cxxflags=-std=gnu++0x
or

./bjam ... cxxflags="-std=c++11"
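Before rebuilding Moses with either flag, it may help to confirm that the compiler itself accepts it. The sketch below uses a hypothetical helper, check_cxx11 (not part of Moses or Boost), that syntax-checks a tiny C++11 program with -fsyntax-only, an option both g++ and clang++ support. It only tests the compiler; it says nothing about whether Boost or the rest of Moses builds cleanly in C++11 mode.

```shell
# check_cxx11 [COMPILER]  (hypothetical helper, not part of Moses or Boost)
# Syntax-checks a tiny C++11 program (nullptr, static_assert) read from stdin
# with COMPILER (default: g++) and reports whether -std=c++11 was accepted.
# Returns 0 if accepted, 1 otherwise (including when COMPILER is not installed).
check_cxx11() {
  local cxx=${1:-g++}
  if printf '%s\n' \
      'int main() { int *p = nullptr; static_assert(sizeof(p) > 0, "c++11"); return p == nullptr ? 0 : 1; }' \
      | "$cxx" -std=c++11 -fsyntax-only -x c++ - 2>/dev/null; then
    echo "$cxx accepts -std=c++11"
    return 0
  fi
  echo "$cxx does not accept -std=c++11 (older GCC may need -std=gnu++0x)"
  return 1
}
```

For example, run check_cxx11 g++ or check_cxx11 clang++. GCC releases before 4.7 only know the C++11 dialect as -std=c++0x or -std=gnu++0x, which is why the older spelling also appears in the bjam commands.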




On 27 December 2013 09:46, Li Xiang <lixiang.ict@gmail.com> wrote:

> Hi,
>
> Does Moses support C++11 compilation?
> I want to integrate my code, which is based on C++11, into Moses.
> How do I modify the bjam config file to compile Moses using C++11?
> Thanks.
>
> --
> Xiang Li
>
>
>



------------------------------



End of Moses-support Digest, Vol 86, Issue 78
*********************************************
