Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Performance issue with Neural LM for English-Hindi SMT
(Raj Dabre)
2. Re: sgm generation for personalized test sets (Vincent Nguyen)
3. Re: Performance issue with Neural LM for English-Hindi SMT
(Rajnath Patel)
4. Problem compiling Moses ( ??? )
5. Re: Problem when compiling moses (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Mon, 14 Sep 2015 06:18:11 +0000
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] Performance issue with Neural LM for
English-Hindi SMT
To: Rajnath Patel <patelrajnath@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAB3gfjBOD8dnirPqohL-jwgAaimKcuxk2w3GWoa9QU7+3HkWrw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi, I think you misinterpreted what Rico said.
He said: NPLM is used in addition to a back-off LM for best results.
What he meant is that NPLM is not a back-off LM itself but an additional LM used alongside one.
A KenLM model, which has back-off weights, is an example of a back-off LM. To sum
up, Rico says: use KenLM as LM0 and NPLM as LM1.
Regards.
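As a rough illustration of the setup described above, a moses.ini fragment with a back-off KenLM model as LM0 and an NPLM model as LM1 might look like the sketch below. The paths, n-gram orders and starting weights are placeholder assumptions, not values from this thread. The NeuralLM feature assumes Moses was built with NPLM support, and the weights would normally be set by tuning.

```ini
[feature]
# Back-off n-gram LM (KenLM), e.g. built with lmplz; path is a placeholder
KENLM name=LM0 factor=0 order=5 path=/path/to/lm.arpa
# Neural LM (NPLM) as an additional feature; path is a placeholder
NeuralLM name=LM1 factor=0 order=5 path=/path/to/nplm.model

[weight]
# Starting weights only; tune them on a dev set
LM0= 0.5
LM1= 0.5
```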
------------------------------
Message: 2
Date: Mon, 14 Sep 2015 08:55:28 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] sgm generation for personalized test sets
To: moses-support@mit.edu
Message-ID: <55F66F60.40807@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Tom,
If this script is intended only to generate sgm test/dev
files from a txt file, then yes, it needs to be amended.
1) Line breaks other than 0A need to be removed before the Python
script runs (a byte-stream replace).
2) Even though the XML standard is to replace ' with &apos; (and so on for
the other special characters), I have noticed that the existing test/dev sets do not
contain XML entities like &apos;,
so I removed the second string replace in your code.
however I added 2 others replaces in the first sequence : => " "
and   => " "
3) Even though they are standard for XML, I removed the first 3 lines of
the document
(the XML declaration, the DOCTYPE and the opening MTEVAL tag)
and also the last line (the closing MTEVAL tag),
all of this to match the format expected for test sets.
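A minimal sketch of the byte-level clean-up from point 1, assuming the input is UTF-8. The function name `strip_extra_line_breaks` is hypothetical and not part of makemteval.py, and replacing the Unicode separators with a space (rather than deleting them outright) is an assumption made so that adjacent words are not glued together:

```python
def strip_extra_line_breaks(data: bytes) -> bytes:
    """Leave LF (0A) as the only line break in UTF-8 encoded bytes."""
    data = data.replace(b"\r\n", b"\n")          # CR LF -> LF
    data = data.replace(b"\r", b"")              # drop stray CR (0D)
    data = data.replace(b"\xe2\x80\xa8", b" ")   # U+2028 LINE SEPARATOR
    data = data.replace(b"\xe2\x80\xa9", b" ")   # U+2029 PARAGRAPH SEPARATOR
    return data


raw = "first\r\nsecond\u2028still second\n".encode("utf-8")
clean = strip_extra_line_breaks(raw)

# Before cleaning, splitlines() sees three lines; afterwards it sees two.
print(len(raw.decode("utf-8").splitlines()))
print(len(clean.decode("utf-8").splitlines()))
```

Python's str.splitlines() also treats a few other characters (such as form feed and U+0085) as line boundaries, so splitting on "\n" explicitly is another option if only LF should count.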
If you have the chance, you could add 2 options:
- nb = number of lines to take from the file
- selection = either the first nb lines or nb random lines from the txt file
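The two options suggested above could be sketched as follows. This is only an illustration: `select_indices` and its `seed` parameter are hypothetical and not part of makemteval.py. Returning line indices instead of lines lets the same selection be applied to the source and reference files so that they stay aligned:

```python
import random


def select_indices(total, nb, selection="first", seed=None):
    """Return sorted indices of the nb lines to keep out of total lines."""
    nb = min(nb, total)
    if selection == "first":
        return list(range(nb))
    if selection == "random":
        rng = random.Random(seed)  # seeded for a reproducible test set
        return sorted(rng.sample(range(total), nb))
    raise ValueError("selection must be 'first' or 'random'")


source = ["src 0", "src 1", "src 2", "src 3", "src 4"]
reference = ["ref 0", "ref 1", "ref 2", "ref 3", "ref 4"]

# Apply one selection to both sides of the parallel data.
keep = select_indices(len(source), 3, selection="random", seed=1)
source_subset = [source[i] for i in keep]
reference_subset = [reference[i] for i in keep]
```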
I am just wondering whether someone has already developed a Perl script
for this. How were the sets generated to start with?
cheers,
Vincent
On 14/09/2015 04:57, Tom Hoar wrote:
> Thanks Vincent,
>
> Good catch about Python's Unicode processing. This script uses Python's
> `codecs` library, which treats characters according to their Unicode
> definitions. So, the function fh.splitlines() splits the string into a
> list as expected with traditional ASCII cr/lf sequences. In addition,
> however, it also splits on three Unicode characters. They are:
>
> \u2028 or \xe2\x80\xa8 - line separator; LSEP
> \u2029 or \xe2\x80\xa9 - paragraph separator; PSEP
> \u2063 or \xe2\x81\xa3 - invisible separator; ISEP
>
> We discovered this after contributing this script to Moses. In our
> experience, Asian-language text editors create these
> characters more often, and European editors typically don't. This means you can end
> up with a line-count mismatch between the two languages.
>
> Do you think we should update this script, or should users be
> responsible for how they handle these cases?
>
>
>
> On 9/13/2015 11:01 PM, moses-support-request@mit.edu wrote:
>> Date: Sun, 13 Sep 2015 10:44:02 +0200
>> From: Vincent Nguyen<vnguyen@neuf.fr>
>> Subject: Re: [Moses-support] sgm generation for personalized test sets
>> To: moses-support<moses-support@mit.edu>
>> Message-ID:<55F53752.9060603@neuf.fr>
>> Content-Type: text/plain; charset=windows-1252; format=flowed
>>
>>
>> in order to use makemteval.py we need to remove 0D and E2 80 A8 from txt
>> files.
>> python handles them as additional line breakers.
>>
>> On 12/09/2015 22:07, Vincent Nguyen wrote:
>>>> Hi,
>>>>
>>>> What script do you guys use to generate sgm sets based on txt file ?
>>>>
>>>> I have tried makemteval.py in contrib
>>>> but there are a few issues.
>>>>
>>>> I think these lines:
>>>> lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
>>>>          for l in filein.read().splitlines()]
>>>> filein.close()
>>>> lines = [l.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('\'','&apos;').replace('\"','&quot;')
>>>>          for l in lines]
>>>>
>>>> are not 100% bullet proof.
>>>>
>>>> In the output I still get &apos; and such.
>>>> It does not seem to handle the \r\n sequence, since the output has more
>>>> lines than the txt file.
>>>>
>>>> Maybe there is another script.
>>>>
>>>> thanks.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
------------------------------
Message: 3
Date: Mon, 14 Sep 2015 12:31:32 +0530
From: Rajnath Patel <patelrajnath@gmail.com>
Subject: Re: [Moses-support] Performance issue with Neural LM for
English-Hindi SMT
To: Raj Dabre <prajdabre@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAE-r4umQckaFXBsTCoETnDPonFXw7j-1xDZ7hM5RsdyJYQ4SPw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Thanks,
I guess I misinterpreted Rico's response. :(
I tried the configuration you suggested and got improved results (12.10 to
19.37). However, I used a 5-gram ARPA LM built with KenLM (using the "lmplz" command). Is
that OK, or should it be trained some other way?
--
Regards
On Mon, Sep 14, 2015 at 11:48 AM, Raj Dabre <prajdabre@gmail.com> wrote:
> Hi.... I think that you misinterpreted what Rico said.
> He said : nplm is used in addition to a back-off LM for best results
>
> What he meant is that nplm is not a backoff but actually an additional LM.
> A kenlm which has backoff weights is an example of backoff LM. To sum
> up.... Rico says: Use Kenlm as LM0 and NPLM as LM1.
>
> Regards.
>
--
Regards:
Raj Nath Patel
KBCS dept.
CDAC Mumbai.
http://kbcs.in/
------------------------------
Message: 4
Date: Mon, 14 Sep 2015 15:25:15 +0800
From: " ??? " <545308066@qq.com>
Subject: [Moses-support] Problem compiling Moses
To: " moses-support " <moses-support@mit.edu>
Message-ID: <tencent_2A46380914D42DB01CED3FB9@qq.com>
Content-Type: text/plain; charset="gb18030"
Hi,
I tried to compile Moses and got an error.
The command I executed:
./bjam --with-irstlm=/home/daisy/MT/IRSTLM --with-giza=/home/daisy/MT/GIZA/tools -j4
(I am sure that IRSTLM and GIZA are installed correctly.)
Could you help me figure out what the problem is?
Thanks,
Daisy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.log.gz
Type: application/octet-stream
Size: 2250 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20150914/d0137149/attachment-0001.obj
------------------------------
Message: 5
Date: Mon, 14 Sep 2015 11:29:59 +0200
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Problem when compiling moses
To: "Danqing Huang (MSR Student-Person Consulting)"
<v-danhua@microsoft.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <55F69397.2080105@gmail.com>
Content-Type: text/plain; charset="windows-1252"
There seems to be a casting problem, or a template instantiation problem,
with some compilers. It works with my compiler (clang 6.1.0), but I can
believe there may be a problem with yours. Do you know which compiler
you're using?
I've slightly changed the problem code:
https://github.com/moses-smt/mosesdecoder/commit/f6853ee37d634b1d841bc7f39cc2d82a1885e90e
Do a git pull and recompile. Let me know if it works for you.
If not, play around with that code; if you get it to work, please send
me the working code.
On 14/09/2015 08:21, Danqing Huang (MSR Student-Person Consulting) wrote:
>
> Hi,
>
> I am using Moses on Ubuntu 12.04.
>
> When I tried to compile the Moses, it failed. I use boost 1.46,
> IRSTLM, and the above tools are well installed.
>
> My command to install Moses: ./bjam --with-irstlm=/home/MT/IRSTLM -j4
>
> And the error seems to be "moses/parameters/ServerOptions.cpp:51:69:
> error: no matching function for call to
> 'Moses::Parameter::SetParameter(size_t&, const char [19], long
> unsigned int) const'" in the log.
>
> "failed gcc.compile.c++
> moses/bin/gcc-4.6/release/link-static/threading-multi/parameters/ServerOptions.o..."
>
> Any idea how to solve this problem?
>
> Thanks,
>
> Danqing Huang (MSR Student-Person Consulting)
>
--
Hieu Hoang
http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 107, Issue 36
**********************************************