Moses-support Digest, Vol 107, Issue 33

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Performance issue with Neural LM for English-Hindi SMT
(Raj Dabre)
2. Re: Performance issue with Neural LM for English-Hindi SMT
(Rico Sennrich)
3. Re: sgm generation for personalized test sets (Tom Hoar)
4. How to Add dictionary to Moses (Asad A.Malik)


----------------------------------------------------------------------

Message: 1
Date: Mon, 14 Sep 2015 01:56:14 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] Performance issue with Neural LM for
English-Hindi SMT
To: Rajnath Patel <patelrajnath@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAB3gfjCGapWtYTheh6mKHhica7v7d=q81iA5L7jiZS4kKjudfA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,
I have had a similar experience with NPLM.
Do you perhaps have a small corpus?

On Sun, Sep 13, 2015 at 6:51 PM, Rajnath Patel <patelrajnath@gmail.com>
wrote:

> Hi all,
>
> I have tried a neural LM (NPLM) with phrase-based English-Hindi SMT, but
> translation quality is much worse than with an n-gram LM (scores are
> given below). I trained 3-gram and 5-gram neural LMs with the default
> settings (as described on statmt.org/moses). If anyone has tried this for
> English-Hindi SMT and obtained improved results, kindly advise. What might
> be the probable cause of the degraded results?
>
> BLEU scores:
> n-gram(5-gram)=24.40
> neural-lm(5-gram)=11.30
> neural-lm(3-gram)=12.10
>
> Thank you.
>
> --
> Regards:
> Raj Nath Patel
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


--
Raj Dabre.
Doctoral Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014

------------------------------

Message: 2
Date: Sun, 13 Sep 2015 23:19:19 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] Performance issue with Neural LM for
English-Hindi SMT
To: moses-support@mit.edu
Message-ID: <55F5F667.9030704@gmx.ch>
Content-Type: text/plain; charset="windows-1252"

Hello Raj,

Usually, nplm is used in addition to a back-off LM for best results.
That being said, your results indicate that nplm is performing poorly.
If you have little training data, a smaller vocabulary size and more
training epochs may be appropriate. I would advise providing a
development set to the nplm training program so that you can track the
training progress, and compare perplexity with back-off models.
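
For the perplexity comparison, a minimal sketch follows. It assumes both models can report a log10 probability per token on the same tokenised development set; the function name and the example scores are made up for illustration and are not output from nplm or any back-off toolkit. The numbers are only comparable if both models treat the vocabulary and unknown words the same way.

```python
def perplexity(log10_probs):
    """Perplexity from per-token log10 probabilities on a held-out set."""
    if not log10_probs:
        raise ValueError("need at least one token score")
    # Average log10 probability per token; perplexity is 10 to the negative of it.
    avg_log10 = sum(log10_probs) / len(log10_probs)
    return 10 ** (-avg_log10)


# Hypothetical per-token scores for the same three-token dev sentence.
backoff_scores = [-1.2, -0.8, -2.0]
neural_scores = [-1.9, -1.5, -2.6]

print("back-off LM perplexity:", perplexity(backoff_scores))
print("neural LM perplexity:  ", perplexity(neural_scores))
```

A neural LM whose development-set perplexity stays well above the back-off model's would point to a training problem rather than a decoding problem.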

best wishes,
Rico


------------------------------

Message: 3
Date: Mon, 14 Sep 2015 09:57:30 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: Re: [Moses-support] sgm generation for personalized test sets
To: moses-support@mit.edu
Message-ID: <55F6379A.1050001@precisiontranslationtools.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

Thanks Vincent,

Good catch about Python's Unicode processing. This script uses Python's
`codecs` library, which treats characters according to their Unicode
definitions. So, the function fh.splitlines() splits the string into a
list as expected with traditional ASCII cr/lf sequences. In addition,
however, it also splits on three Unicode characters. They are:

\u2028 or \xe2\x80\xa8 - line separator; LSEP
\u2029 or \xe2\x80\xa9 - paragraph separator; PSEP
\u2063 or \xe2\x81\xa3 - invisible separator; ISEP

We discovered this after contributing this script to Moses. In our
experience, Asian-language text editors create these characters more
often, and European editors typically don't. This means you can end
up with a line-count mismatch between the two languages.
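
A quick way to check which characters trigger a split is to compare str.splitlines() with a plain split on "\n". The sketch below assumes Python 3; the sample strings and the helper name count_lines are made up for illustration. The Python documentation lists U+2028 and U+2029 among the line boundaries for str.splitlines(), but not U+2063, so that one is worth verifying in your own interpreter.

```python
# Characters discussed in this thread, plus the classic ASCII line endings.
samples = {
    "LF": "a\nb",
    "CRLF": "a\r\nb",
    "LSEP U+2028": "a\u2028b",
    "PSEP U+2029": "a\u2029b",
    "ISEP U+2063": "a\u2063b",
}


def count_lines(text):
    """Return (lines seen by str.splitlines, lines seen by splitting on LF only)."""
    return len(text.splitlines()), len(text.split("\n"))


for name, text in samples.items():
    unicode_aware, lf_only = count_lines(text)
    print(f"{name}: splitlines={unicode_aware} split('\\n')={lf_only}")
```

Any row where the two counts differ marks a character that can make the source and target files disagree on the number of segments.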

Do you think we should update this script, or should users be
responsible for how they handle these cases?
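
If the script were updated, one option might look like the sketch below. This is not the code in makemteval.py: read_segments and to_sgm_text are hypothetical helper names, and the sketch assumes Python 3, one LF-terminated line per segment, and that a plain space is an acceptable stand-in for a non-breaking space. It also covers the entity round trip raised in the quoted message below by decoding once with html.unescape and escaping once with xml.sax.saxutils.escape.

```python
import html
from xml.sax.saxutils import escape


def read_segments(raw_text):
    """Split on LF only, so U+2028/U+2029 stay inside their segment."""
    # Normalise CRLF and bare CR to LF first, then drop one trailing newline.
    text = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n")


def to_sgm_text(segment):
    """Decode any existing entities, then escape exactly once for the sgm file."""
    decoded = html.unescape(segment)          # &apos;, &quot;, &amp;, &nbsp;, ...
    decoded = decoded.replace("\u00a0", " ")  # assumption: plain space for &nbsp;
    # escape() handles &, < and > first; quotes are mapped via the extra dict.
    return escape(decoded, {"'": "&apos;", '"': "&quot;"})


raw = "Tom&apos;s test\u2028still line 1\r\nA &amp; B&nbsp;C\n"
for segment in read_segments(raw):
    print(to_sgm_text(segment))
```

Keeping U+2028 inside the segment preserves the line count; stripping it instead, as suggested in the quoted message, would be an equally simple change inside read_segments.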



On 9/13/2015 11:01 PM, moses-support-request@mit.edu wrote:
> Date: Sun, 13 Sep 2015 10:44:02 +0200
> From: Vincent Nguyen<vnguyen@neuf.fr>
> Subject: Re: [Moses-support] sgm generation for personalized test sets
> To: moses-support<moses-support@mit.edu>
> Message-ID:<55F53752.9060603@neuf.fr>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
>
> To use makemteval.py, we need to remove the bytes 0D (CR) and
> E2 80 A8 (U+2028) from the txt files.
> Python treats them as additional line breaks.
>
> On 12/09/2015 22:07, Vincent Nguyen wrote:
>> >Hi,
>> >
>> >What script do you guys use to generate sgm sets based on txt file ?
>> >
>> >I have tried makemteval.py in contrib
>> >but there are a few issues.
>> >
>> >I think these lines:
>> >lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
>> >         for l in filein.read().splitlines()]
>> >filein.close()
>> >lines = [l.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('\'','&apos;').replace('\"','&quot;')
>> >         for l in lines]
>> >
>> >are not 100% bullet proof.
>> >
>> >in the output I still get &apos; and such
>> >it does not handle the &nbsp;
>> >it does not handle the \r\n sequence I think since the output has more
>> >lines than in the txt file.
>> >
>> >Maybe there is another script.
>> >
>> >thanks.



------------------------------

Message: 4
Date: Mon, 14 Sep 2015 08:23:47 +0500
From: "Asad A.Malik" <asad_12204@yahoo.com>
Subject: [Moses-support] How to Add dictionary to Moses
To: moses-support@mit.edu, Philipp Koehn <phi@jhu.edu>, Hieu Hoang
<Hieu.Hoang@ed.ac.uk>
Message-ID: <367ACBE3-CE88-404D-96F8-9E2E6163F53D@yahoo.com>
Content-Type: text/plain; charset="us-ascii"

Hi All,

I've trained a system, but because my corpus is very small, some words are not translated into the target language. Is it possible to add a dictionary to Moses that translates the words the SMT system leaves untranslated?

--

Kind Regards,

Mr. Asad Abdul Malik
Sent from my iPhone

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 107, Issue 33
**********************************************
