Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Problems with segmentation mismatch and many unknown
words for Chinese translation (Gideon Wenniger)
2. Re: Importing Eclipse projects (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Tue, 3 Jun 2014 14:26:12 +0200
From: Gideon Wenniger <gemdbw@hotmail.com>
Subject: Re: [Moses-support] Problems with segmentation mismatch and
many unknown words for Chinese translation
To: Hieu Hoang <hieuhoang@gmail.com>, "Vincentwang0229@hotmail.com"
<vincentwang0229@hotmail.com>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <DUB118-W7C4F09CE3731BC4D22B9AD1230@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
Hey Vincent and Hieu,
Thanks a lot for your replies.
This is in line with my own expectation that something in the preprocessing of the data might not be going
right.
But in my case I wrote my own pipeline for preprocessing the LDC Hong Kong Hansards, Laws and News
data, so this is not a problem with Moses itself.
I am still running some new experiments using the MultiUN data from http://opus.lingfil.uu.se/.
Looking up some of the unknown words from my old test output, it turns out that many of them occur in the
new segmented MultiUN data. So while I have no results yet, I am hopeful that the segmentation mismatch problem
will be resolved by using the MultiUN data (which is already in simplified Chinese and more thoroughly preprocessed than the LDC
Hong Kong data, so there is less chance for errors in segmentation/preprocessing to slip in).
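The check described above (looking up unknown test words in the new training data) can be automated. The following is a minimal Python sketch, not part of Moses, assuming both corpora are already segmented with spaces between words; the function names and toy sentences are illustrative only. A segmentation mismatch shows up as OOV tokens such as an unsplit compound that the training side only contains in split form.

```python
from collections import Counter


def vocabulary(lines):
    """Collect the set of whitespace-separated tokens in segmented text."""
    vocab = set()
    for line in lines:
        vocab.update(line.split())
    return vocab


def unknown_words(train_lines, test_lines):
    """Count test tokens that never occur in the training data (OOV tokens)."""
    vocab = vocabulary(train_lines)
    oov = Counter()
    for line in test_lines:
        for tok in line.split():
            if tok not in vocab:
                oov[tok] += 1
    return oov


if __name__ == "__main__":
    # Toy data; in practice read the segmented .zh files with
    # open(path, encoding="utf-8"). The test side keeps a compound unsplit.
    train = ["我们 支持 这 项 决议", "联合国 大会 通过 决议"]
    test = ["联合国大会 支持 这 项 提案", "我们 支持 联合国"]
    oov = unknown_words(train, test)
    total = sum(len(line.split()) for line in test)
    print(oov.most_common())
    print(f"OOV rate: {sum(oov.values()) / total:.2%}")
```

A high OOV rate driven by compounds like this points at inconsistent segmentation rather than genuinely rare words.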
If experiments with this data do give "normal" results, I'll be sure that it is my preparation of the
Hong Kong Hansards data that goes wrong.
This still does not completely answer the question of which step(s) in my (pre)processing of the Chinese data are
insufficient, but I think Vincent's suggestion, that these special characters could be the cause and that running the escaping script might
resolve it, is a very promising one. I will try this out as soon as I have finished my pending experiments with the MultiUN data.
I will update you as soon as I know more.
Cheers.
Gideon
CC: Vincentwang0229@hotmail.com; gemdbw@hotmail.com
From: hieuhoang@gmail.com
Subject: Re: [Moses-support] Problems with segmentation mismatch and many unknown words for Chinese translation
Date: Tue, 3 Jun 2014 01:35:44 +0100
To: Hieu.Hoang@ed.ac.uk
Hey Vincent and Gideon
Did you have any details of how it fails on the new Moses but runs on the old Moses? Or is it speculation? It's really important that I know, so I can try to fix it.
Sent from my flying horse
On 30 May 2014, at 05:11, Hieu Hoang <Hieu.Hoang@ed.ac.uk> wrote:
Was it due to the new version of Moses? It shouldn't be. If this is the cause, please tell me urgently.
On 30 May 2014 03:47, Vincent_hotmail <Vincentwang0229@hotmail.com> wrote:
Hi Gideon,
I recently came across a similar problem when training Chinese-to-other-language pairs. I wonder whether you are using the latest version of Moses. I first use the Stanford segmenter, NLPIR or my own segmenter to tokenize the sentences, and then run escape-special-chars.perl < input.seg.zh > out.zh to escape the special characters in them. After that, the problem seems to be solved. But I never came across this problem when using the old version of Moses. If you have not solved it yet, please try this.
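For readers without a Moses checkout at hand, here is a rough Python approximation of what escape-special-chars.perl (shipped with Moses under scripts/tokenizer/) does to each line. It is a sketch, assuming the character mapping below matches the Perl script; the script name escape_chars.py is hypothetical. For real pipelines, run the Perl script itself so training, tuning and test data are escaped identically.

```python
import re
import sys

# Mapping assumed to mirror escape-special-chars.perl. "&" must come first so
# the entities introduced by later rules are not escaped a second time.
_ESCAPES = [
    ("&", "&amp;"),   # escape the escape character itself
    ("|", "&#124;"),  # Moses factor separator
    ("<", "&lt;"),    # XML markup
    (">", "&gt;"),    # XML markup
    ("'", "&apos;"),  # XML markup
    ('"', "&quot;"),  # XML markup
    ("[", "&#91;"),   # syntax-model non-terminal brackets
    ("]", "&#93;"),
]


def escape_line(line: str) -> str:
    """Clean one line and replace characters that are special to Moses."""
    line = re.sub(r"[\x00-\x1f]", "", line)   # delete control chars (incl. tab, newline)
    line = re.sub(r"\s+", " ", line).strip()  # collapse and trim whitespace
    for ch, entity in _ESCAPES:
        line = line.replace(ch, entity)
    return line


def escape_file(in_path: str, out_path: str) -> None:
    """Escape every line of a UTF-8 file, one output line per input line."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for raw in src:
            dst.write(escape_line(raw) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python escape_chars.py input.seg.zh out.zh
    escape_file(sys.argv[1], sys.argv[2])
```

Escaping after segmentation matters because a stray "|" would otherwise be read as a factor separator, and "<", ">", "[" or "]" can be misinterpreted as markup or non-terminals.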
Best,
Vincent
May 30, 2014
----------------
Longyue WANG, Vincent
Research Assistant @ NLP2CT
Postgraduate @ University of Macau
Tel: (+853) 8397-8051
Homepage: http://nlp2ct.cis.umac.mo/~vincent/
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
Message: 2
Date: Tue, 3 Jun 2014 13:29:06 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Importing Eclipse projects
To: Lars Bungum <lars.bungum@idi.ntnu.no>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhQ2dYun6ggfwSEgjUv1L-Rrc3TAtNKNdD53BwLR=cvAg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Lars,
glad you're trying to use the Eclipse files. I use them every day, but no-one
else seems to use them. I can help you. Please follow the instructions below
and don't diverge from them.
On 3 June 2014 12:39, Lars Bungum <lars.bungum@idi.ntnu.no> wrote:
> Hi Hieu,
>
> thanks!
>
> It's a lot better now. I'm going to do some feature development (for DA
> purposes), so I would like to get into the code, kind of to get a better
> feel of what I'm working on.
>
> It mostly works now, after adding the directory mosesdecoder/lib to the
> g++ linker includes in Eclipse (or doing it manually at the command line).
>
Don't copy or add mosesdecoder/lib to Eclipse. The libraries in there are
compiled by bjam. You need the libraries compiled by Eclipse to be able to
debug it.
>
> What remains now are several errors from DALM when building moses itself.
> To me this sounds like a version error, as it seems to pertain to arity
> changes, e.g.
>
> mosesdecoder/moses/LM/DALMWrapper.cpp: In member function ‘virtual void
> Moses::LanguageModelDALM::Load()’:
> mosesdecoder/moses/LM/DALMWrapper.cpp:213:48: error: no matching function
> for call to ‘DALM::LM::LM(std::string&, DALM::Vocabulary&, DALM::Logger&)’
> mosesdecoder/moses/LM/DALMWrapper.cpp:213:48: note: candidates are:
> mosesdecoder/contrib/other-builds/../../DALM/include/dalm.h:23:4: note:
> DALM::LM::LM(std::string, DALM::Vocabulary&, unsigned char, DALM::Logger&)
> mosesdecoder/contrib/other-builds/../../DALM/include/dalm.h:23:4: note:
> candidate expects 4 arguments, 3 provided
> mosesdecoder/contrib/other-builds/../../DALM/include/dalm.h:22:4: note:
> DALM::LM::LM(std::string, std::string, DALM::Vocabulary&, size_t, unsigned
> int, DALM::Logger&)
>
> I just downloaded the latest DALM version and compiled it. Is this the
> correct one?
>
I use DALM directly from GitHub:
https://github.com/hieuhoang/DALM
I've just updated my own copy, then cleaned and compiled DALM, the bjam build
of Moses and the Eclipse build of Moses. They all work.
>
> //LB
>
>
> On 03. juni 2014 10:49, Hieu Hoang wrote:
>
> Some advice:
> 1. Only use the Eclipse build if you intend to debug & change the C++
> code. There's no other reason to use it.
> 2. You must switch your workspace to
> {MOSES}/contrib/other-builds
> 3. There is no flexibility in what external libraries you need to link
> to. You must link to
> i. boost
> ii. DALM
> iii. irstlm
> iv. randlm
> v. srilm
> If you've never used these libraries before, I suggest you use the
> Moses bjam build first and make sure the external libs are properly compiled.
> 4. The external libs must be in the root of the moses folder, with
> specific names. I softlink them to my moses folder. For example, this is
> the listing of my moses folder:
> #ls -l
> drwxr-xr-x 3 hieu hieu 4096 Jan 15 11:05 biconcor
> drwxrwxr-x 2 hieu hieu 4096 Jun 2 18:17 bin
> -rwxr-xr-x 1 hieu hieu 780 Jan 15 10:58 bjam
> lrwxrwxrwx 1 hieu hieu 39 Jan 20 16:23 boost -> /home/hieu/workspace/boost/boost_1_55_0
> -rw-rw-r-- 1 hieu hieu 119 May 30 09:30 BUILD-INSTRUCTIONS.txt
> drwxr-xr-x 24 hieu hieu 4096 Apr 30 16:50 contrib
> drwxr-xr-x 3 hieu hieu 4096 Jan 15 10:58 cruise-control
> lrwxrwxrwx 1 hieu hieu 33 Jan 20 16:23 DALM -> /home/hieu/workspace/github/DALM/
> drwxr-xr-x 2 hieu hieu 4096 Mar 12 19:27 defer
> -rw-r--r-- 1 hieu hieu 3399 May 12 19:43 err
> -rw-r--r-- 1 hieu hieu 2473076 Mar 21 21:17 err.ubuntu13.10
> lrwxrwxrwx 1 hieu hieu 27 Jan 20 16:23 irstlm -> /home/hieu/workspace/irstlm
> drwxr-xr-x 5 hieu hieu 4096 Mar 16 15:44 jam-files
> -rw-rw-r-- 1 hieu hieu 6835 Apr 29 16:42 Jamroot
> -rw-r--r-- 1 hieu hieu 5848 Mar 3 14:51 Jamroot~
> drwxrwxr-x 2 hieu hieu 4096 May 30 09:37 lib
> drwxr-xr-x 5 hieu hieu 4096 May 8 17:09 lm
> drwxr-xr-x 6 hieu hieu 4096 May 16 15:01 mert
> drwxr-xr-x 3 hieu hieu 4096 Mar 13 17:27 mingw
> drwxr-xr-x 4 hieu hieu 4096 May 16 14:26 mira
> drwxr-xr-x 5 hieu hieu 4096 Mar 12 19:27 misc
> drwxr-xr-x 7 hieu hieu 12288 May 20 19:52 moses
> drwxr-xr-x 3 hieu hieu 4096 May 20 19:52 moses-chart-cmd
> drwxr-xr-x 3 hieu hieu 4096 May 7 09:53 moses-cmd
> -rw------- 1 hieu hieu 0 May 1 10:24 nohup.out
> -rw-r--r-- 1 hieu hieu 159 Jan 15 10:58 NOTICE
> drwxr-xr-x 3 hieu hieu 4096 Apr 17 19:13 OnDiskPt
> drwxr-xr-x 8 hieu hieu 4096 Mar 16 15:44 phrase-extract
> -rwxr-xr-x 1 hieu hieu 260 Jun 2 18:15 previous.sh
> lrwxrwxrwx 1 hieu hieu 36 Jan 22 16:08 probingPT -> /home/hieu/workspace/github/proj4.hh
> lrwxrwxrwx 1 hieu hieu 28 Jan 20 16:29 randlm -> /home/hieu/workspace/randlm/
> drwxr-xr-x 3 hieu hieu 4096 May 12 19:16 regression-testing
> drwxr-xr-x 16 hieu hieu 4096 Apr 29 16:42 scripts
> drwxr-xr-x 3 hieu hieu 4096 Apr 17 19:13 search
> lrwxrwxrwx 1 hieu hieu 26 Jan 20 16:23 srilm -> /home/hieu/workspace/srilm
> drwxr-xr-x 3 hieu hieu 4096 Jan 15 11:04 symal
> drwxr-xr-x 5 hieu hieu 4096 May 8 17:08 util
>
>
>
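The symlink layout in item 4 can be sanity-checked before importing the Eclipse projects. This is a minimal Python sketch, not part of Moses, assuming the five library names from the list (boost, DALM, irstlm, randlm, srilm) are exactly the directory names the projects expect in the Moses root, as in the listing. It only checks that each name resolves to a directory; it does not verify that the libraries inside were compiled.

```python
import os
import sys
from pathlib import Path

# Names taken from the advice above; assumed to be the exact directory
# names the Eclipse projects look for under the Moses root.
REQUIRED_LIBS = ["boost", "DALM", "irstlm", "randlm", "srilm"]


def check_external_libs(moses_root):
    """Return {name: problem} for each required library that is missing,
    is not a directory, or is a symlink whose target does not exist."""
    problems = {}
    root = Path(moses_root)
    for name in REQUIRED_LIBS:
        path = root / name
        if path.is_symlink() and not path.exists():
            problems[name] = f"dangling symlink -> {os.readlink(path)}"
        elif not path.is_dir():  # is_dir() follows symlinks
            problems[name] = "missing or not a directory"
    return problems


if __name__ == "__main__":
    # Usage: python check_libs.py /path/to/mosesdecoder  (hypothetical name)
    moses_root = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_external_libs(moses_root)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("all external library links found")
```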
> On 3 June 2014 08:44, Lars Bungum <lars.bungum@idi.ntnu.no> wrote:
>
>> Hi,
>>
>> I am trying to import the Eclipse projects in the contrib/other-builds
>> directory, but I am having some problems. Despite following the steps
>> exactly, the projects fail to build with this error message:
>>
>> /usr/bin/ld: cannot find -lmoses
>>
>> (and similar errors for the other modules I haven't symlinked in yet). When I look
>> into the other-builds/moses directory, it is empty. What do I have to
>> do to get this working?
>>
>> //LB
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
>
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 92, Issue 8
********************************************