Moses-support Digest, Vol 97, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Help (Maria Marpaung)
2. Re: Warning: Too many arguments while IRSTLM language model
Training (Hieu Hoang)
3. Re: Warning: Too many arguments while IRSTLM language model
Training (Tom Hoar)


----------------------------------------------------------------------

Message: 1
Date: Sun, 9 Nov 2014 12:50:12 +0000 (UTC)
From: Maria Marpaung <maria_marpaung@yahoo.co.id>
Subject: [Moses-support] Help
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<1300475814.89497.1415537412462.JavaMail.yahoo@jws10943.mail.sg3.yahoo.com>

Content-Type: text/plain; charset="utf-8"

Hello, please help me.
I have been able to run translation using Moses. First I used 1000 words; next I added a document to reach 2000 words. But the translations generated are worse than with the 1000-word document.
Can you give me advice on what I should do?
Best regards,
Maria Marpaung

------------------------------

Message: 2
Date: Sun, 9 Nov 2014 13:08:26 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Warning: Too many arguments while IRSTLM
language model Training
To: Sovath-MITE-319 <sovath.mite319@rupp.edu.kh>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbiYAZic6HK4OTNss5ENkDzvrWz8cubk0+2wKtpjNjKa5A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

There is no specific Khmer tokenizer in Moses, so the tokenizer uses the
English scheme.

Each language tokenizer needs a file in
scripts/share/nonbreaking_prefixes
You should create your own for Khmer. If you do, please share it with us.

If this is still not good enough, you should write your own program to
tokenize Khmer.
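
As a starting point for such a program, here is a minimal sketch in Python.
It is not part of Moses: the function name tokenize_khmer, the punctuation
set, and the choice to treat zero-width spaces (U+200B) as word boundaries
are all assumptions made for illustration. Khmer is written without spaces
between words, so this only separates punctuation and any word breaks already
present in the text; real word segmentation would need a dictionary or a
trained segmenter.

```python
import re

# Assumption: Khmer punctuation marks that should become separate tokens
# (U+17D4 khan, U+17D5 bariyoosan, U+17D6 camnuc pii kuuh,
#  U+17D8 beyyal, U+17D9 phnaek muan, U+17DA koomuut).
KHMER_PUNCT = "\u17d4\u17d5\u17d6\u17d8\u17d9\u17da"

# Assumption: a small set of ASCII punctuation that is safe to split off.
ASCII_PUNCT = "!?()\""

# Zero-width space, which some Khmer text uses between words.
ZWSP = "\u200b"

_PUNCT_RE = re.compile("([" + re.escape(KHMER_PUNCT + ASCII_PUNCT) + "])")


def tokenize_khmer(line: str) -> str:
    """Return the line with tokens separated by single spaces."""
    # Treat zero-width spaces as ordinary word separators.
    line = line.replace(ZWSP, " ")
    # Surround each punctuation mark with spaces so it is its own token.
    line = _PUNCT_RE.sub(r" \1 ", line)
    # Collapse whitespace runs and strip the ends.
    return " ".join(line.split())
```

Applying tokenize_khmer to every line of both the training corpus and the
input to be translated keeps the two consistent, which matters more than the
exact punctuation set chosen here.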


On 9 November 2014 02:29, Sovath-MITE-319 <sovath.mite319@rupp.edu.kh>
wrote:

> Dear Mr. Hieu Hoang,
>
> Thank you very much for your quick reply. I got it working with your tips.
>
> However, I have been working with Khmer Unicode (UTF-8), and I seem to have
> a problem with the tokenizer: the text is not rendered properly.
>
>
> Do you have any tips on how to get Moses to work with Unicode (UTF-8; I
> mean Khmer Unicode)?
>
>
> My Best Regards,
>
> Sovath Chhinh
>
> On Tue, Nov 4, 2014 at 1:10 AM, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
> > I think there are differences between versions of IRSTLM. Maybe try
> > --text yes
> > --text
> > -text yes
> > -text
> > Also, Moses comes with the script
> > scripts/generic/trainlm-irst2.perl
> > which runs IRSTLM for you. You just need to give it the text file.
> >
> > Also, you might want to look at KenLM's lmplz command, which also
> > creates an LM.
> >
> > On 30 October 2014 15:19, Sovath-MITE-319 <sovath.mite319@rupp.edu.kh>
> > wrote:
> >>
> >> Dear Sir,
> >>
> >> I am a student from Royal University of Phnom Penh, Cambodia.
> >>
> >> I am pursuing a Master's degree in Computer Science, and my thesis is
> >> on a parallel corpus from Khmer to English.
> >>
> >> I had no problems installing Moses or the other
> >> tools.
> >>
> >> At step number 5, however, I get stuck and cannot find any resource
> >> to fix the problem.
> >> I have found one article describing the same problem
> >> (http://comments.gmane.org/gmane.comp.nlp.moses.user/9924),
> >> but it seems to have no solution. I am not sure whether there is
> >> something that needs to be configured before running step number 5.
> >>
> >> PS: The step where I have the issue:
> >>
> >> mkdir ~/lm
> >> cd ~/lm
> >> ~/irstlm/bin/add-start-end.sh \
> >> < ~/corpus/news-commentary-v8.fr-en.true.en \
> >> > news-commentary-v8.fr-en.sb.en
> >> export IRSTLM=$HOME/irstlm; ~/irstlm/bin/build-lm.sh \
> >> -i news-commentary-v8.fr-en.sb.en \
> >> -t ./tmp -p -s improved-kneser-ney -o news-commentary-v8.fr-en.lm.en
> >> ~/irstlm/bin/compile-lm \
> >> --text yes \
> >> news-commentary-v8.fr-en.lm.en.gz \
> >> news-commentary-v8.fr-en.arpa.en
> >>
> >> I look forward to your support.
> >>
> >> Best Regards,
> >> Sovath
> >> _______________________________________________
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> >
> > --
> > Hieu Hoang
> > Research Associate
> > University of Edinburgh
> > http://www.hoang.co.uk/hieu
> >
>
>
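
The four flag spellings suggested in the quoted message can be tried
mechanically. The sketch below is an illustration only: the helper names
candidate_commands and first_working are hypothetical, and it assumes that a
zero exit status means compile-lm accepted the arguments, which may not hold
for every IRSTLM version.

```python
import subprocess

# The four spellings from the quoted message. Which one a given IRSTLM
# build accepts is not known in advance.
TEXT_FLAG_VARIANTS = [["--text", "yes"], ["--text"], ["-text", "yes"], ["-text"]]


def candidate_commands(compile_lm, lm_in, arpa_out):
    """Build one argument list per flag spelling."""
    return [[compile_lm, *flags, lm_in, arpa_out] for flags in TEXT_FLAG_VARIANTS]


def first_working(commands, run=subprocess.run):
    """Run each command in turn and return the first that exits with 0.

    Returns None if every command fails or cannot be started.
    """
    for argv in commands:
        try:
            result = run(argv)
        except OSError:
            # The binary is missing or not executable; try the next one.
            continue
        if result.returncode == 0:
            return argv
    return None
```

No shell is involved, so a path such as ~/irstlm/bin/compile-lm needs to be
expanded first, for example with os.path.expanduser. It is also worth opening
the output file afterwards to confirm it really is in ARPA text format, since
the exit status alone may not prove that.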


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 3
Date: Sun, 09 Nov 2014 20:19:11 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] Warning: Too many arguments while IRSTLM
language model Training
To: moses-support@mit.edu
Message-ID: <545F69CF.2080603@precisiontranslationtools.com>
Content-Type: text/plain; charset="windows-1252"

Are you familiar with the KhmerOS project on Sourceforge.net?

http://sourceforge.net/projects/khmer/?source=directory

At one time, it included an implementation of Moses through our DoMY
distribution. There was a parallel corpus and -- if I'm not mistaken --
a tokenizer. A quick look at the project shows it has changed,
so you might have to dig deeper. Let me know if you can't find
anything, and I'll try again.

Tom





------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 97, Issue 11
*********************************************
