Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: How to compile moses with -fPIC (Hieu Hoang)
2. Re: How to compile moses with -fPIC (Dingyuan Wang)
3. Re: German compound splitter (Tom Hoar)
----------------------------------------------------------------------
Message: 1
Date: Wed, 1 Feb 2017 11:36:40 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] How to compile moses with -fPIC
To: Dingyuan Wang <abcdoyle888@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjriF+s2KSdvuBYa3y7G_i5B5iD9zTKGds3Y7GSgBnGhg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Which Debian do you use? Will try and replicate your issue in a VM.
Hieu Hoang
http://moses-smt.org/
On 1 February 2017 at 11:26, Hieu Hoang <hieuhoang@gmail.com> wrote:
> in the Jamroot file, you can add
>
> requirements += <cxxflags>-fPIC ;
>
>
> Hieu Hoang
> http://moses-smt.org/
>
> On 1 February 2017 at 09:27, Dingyuan Wang <abcdoyle888@gmail.com> wrote:
>
>> Dear all,
>>
>> I would like to compile moses on Debian testing, but a linking
>> problem requires me to recompile it with -fPIC:
>>
>> /usr/bin/ld:
>> /usr/lib/x86_64-linux-gnu/liblzma.a(liblzma_la-common.o): relocation
>> R_X86_64_32 against `.rodata.str1.1' can not be used when making a
>> shared object; recompile with -fPIC
>>
>> I tried ./bjam cxxflags=-fPIC cflags=-fPIC but it doesn't work. The
>> actual commands are like:
>>
>> "g++" -L"/usr/lib" -L"/usr/lib/x86_64-linux-gnu" -Wl,-rpath-link
>> -Wl,"/usr/lib" -o "mert/sentence-bleu" -Wl,--start-group
>> "mert/bin/gcc-6.3.0/release/link-static/threading-multi/sentence-bleu.o"
>> "mert/bin/gcc-6.3.0/release/link-static/threading-multi/libmert_lib.a"
>> -Wl,-Bstatic -lboost_filesystem -lz -lbz2 -llzma -lm -lxmlrpc_xmltok
>> -lxmlrpc_xmlparse -lxmlrpc_util -lxmlrpc_server_abyss++
>> -lxmlrpc_server_abyss -lboost_program_options -lboost_serialization
>> -lboost_thread -lboost_system -ltcmalloc_minimal -lxmlrpc -lxmlrpc++
>> -lxmlrpc_abyss -lxmlrpc_server -lxmlrpc_server++ -Wl,-Bdynamic
>> -lSegFault -lrt -Wl,--end-group -pthread
>>
>>
>> --
>> Dingyuan Wang
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
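The linker message quoted above names a member of the system archive liblzma.a rather than a Moses object file. One reading (an interpretation, not confirmed in this thread) is that flags applied only to the Moses sources may not be enough, because the non-PIC object comes from a prebuilt static library. A minimal Python sketch for pulling the offending archive and member out of such a message; the function name and the regular expression are illustrative assumptions:

```python
import re

# Matches "<archive>.a(<member>): relocation <TYPE>" as printed by GNU ld.
LD_RELOC = re.compile(
    r"(?P<archive>\S+\.a)\((?P<member>[^)]+)\):\s+relocation\s+(?P<reloc>\S+)"
)


def find_non_pic_members(ld_output):
    """Return (archive, member, relocation) tuples for '-fPIC' linker errors."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    text = " ".join(ld_output.split())
    if "recompile with -fPIC" not in text:
        return []
    return [
        (m.group("archive"), m.group("member"), m.group("reloc"))
        for m in LD_RELOC.finditer(text)
    ]
```

Applied to the message in this thread, the sketch reports /usr/lib/x86_64-linux-gnu/liblzma.a and its member liblzma_la-common.o, which is the object that would need to be built as position-independent code.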
------------------------------
Message: 2
Date: Wed, 1 Feb 2017 20:06:15 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: Re: [Moses-support] How to compile moses with -fPIC
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <76942018-c1e6-8d3f-c8d6-08cd9e869c7f@gmail.com>
Content-Type: text/plain; charset=utf-8
Debian "testing" (aka. "stretch") again.
2017-02-01 19:36, Hieu Hoang:
> which debian do you use? Will try and replicate your issue in a VM
>
> Hieu Hoang
> http://moses-smt.org/
--
Dingyuan Wang
------------------------------
Message: 3
Date: Wed, 1 Feb 2017 20:27:23 +0700
From: Tom Hoar <tahoar@pttools.net>
Subject: Re: [Moses-support] German compound splitter
To: moses-support@mit.edu
Message-ID: <baf196e6-24b4-69b5-0034-84b5f1541304@pttools.net>
Content-Type: text/plain; charset="windows-1252"
Thanks, Rico. Very helpful.
I already started walking through the code and gutting what I don't
need. When I'm done, it'll be a pure unsupervised splitter without
syntax or SMOR support, probably similar to the existing
compound-splitter.perl script, but in Python.
I'm refactoring the code so a Splitter() class does all the work. Users
can import the class into other Python scripts. A call to
`Splitter.split_compounds(line)` splits a single line. From a
command-line executable perspective, it'll continue to function like the
existing script with all the same arguments (less the ones supporting
SMOR and syntax). A `-train` argument creates the model. The `-corpus`
argument reads either a path to a UTF-8 file or piped STDIN. The script
iterates lines and prints UTF-8 to STDOUT. There's one difference: this
version writes and reads only raw JSON model files, not JSON saved as a
Python module to import. That extra code seemed unnecessary since json
is part of the standard library.
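A minimal sketch of what such a Splitter class might look like, assuming a purely frequency-based approach that keeps the two-part split with the highest geometric mean of part counts. Only the names Splitter and split_compounds come from the description above; train, save, load, split_word, FILLERS and min_len are hypothetical, and this is not the actual refactored script:

```python
import json
import math
from collections import Counter


class Splitter:
    """Hypothetical unsupervised compound splitter (illustration only)."""

    # Assumed subset of Fugenelemente that may be dropped at a split point.
    FILLERS = ("", "s", "es", "n", "e")

    def __init__(self, counts=None, min_len=3):
        self.counts = Counter(counts or {})
        self.min_len = min_len  # shortest part allowed on either side

    def train(self, lines):
        """Count lowercased token frequencies from tokenized lines."""
        for line in lines:
            self.counts.update(tok.lower() for tok in line.split())

    def save(self, path):
        """Write the model as a raw JSON file."""
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.counts, fh, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        """Read a model previously written by save()."""
        with open(path, encoding="utf-8") as fh:
            return cls(json.load(fh))

    def split_word(self, word):
        """Return [left, right] for the best two-part split, else [word]."""
        lower = word.lower()
        best, best_score = [word], float(self.counts.get(lower, 0))
        for i in range(self.min_len, len(lower) - self.min_len + 1):
            right = lower[i:]
            for filler in self.FILLERS:
                if filler and not lower[:i].endswith(filler):
                    continue
                left = lower[: i - len(filler)]
                if len(left) < self.min_len:
                    continue
                # Geometric mean of the two part frequencies.
                score = math.sqrt(self.counts.get(left, 0) * self.counts.get(right, 0))
                if score > best_score:
                    # Slice the original word so its casing is preserved.
                    best, best_score = [word[: len(left)], word[i:]], score
        return best

    def split_compounds(self, line):
        """Split every token of one tokenized line."""
        return " ".join(p for tok in line.split() for p in self.split_word(tok))
```

With the toy counts {"geburt": 50, "tag": 100, "geburtstag": 5}, calling split_compounds("Mein Geburtstag") returns "Mein Geburt tag", the same shape as the default output in Rico's example quoted below. A real model would also need recursive splitting and a count threshold, which this sketch leaves out.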
I'll return the update to you. If the Moses team wants, I'm happy to
contribute it. I have a few questions about your comments.
RE "with '-no-truecase', the original case is kept," it looks like
"original case" means the original text being split, not the original
text in the model?
RE "-write-filler," I'll have to play with it to see how it works.
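For reference, the filler markers and the re-joining step described in Rico's quoted reply below can be sketched in Python. The two regular expressions mirror the quoted sed commands; write_filler is a hypothetical helper that only reproduces the marker format shown in the "Geburtstag" example, not the splitter's own code:

```python
import re


def write_filler(left, filler, right):
    """Format a split as in the quoted -write-filler examples."""
    # An empty filler yields "left @@ right".
    return f"{left} @{filler}@ {right}"


def rejoin(text):
    """Undo filler-marked splits, mirroring the two quoted sed substitutions."""
    # " @s@ " -> "s" and " @@ " -> "" (the -write-filler format)
    text = re.sub(r" @(\S*?)@ ", r"\1", text)
    # "Geburts@@ tag" -> "Geburtstag" (the -merge-filler format)
    return re.sub(r"@@ ", "", text)
```

For example, rejoin("Geburt @s@ tag") and rejoin("Geburts@@ tag") both give "Geburtstag", while rejoin("Geburt @@ tag") gives "Geburttag".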
RE "we never tested this on a phrase-based system," I'm betting that on
a phrase-based system, any splitting is better than none. Our customers
use their own translation memories (80K to 150K pairs) to create SMT
models for their private work. In non-DE (RU, FR, ES and others)
language use cases, those SMT models create 40% (or more) suggestions
that are exactly correct in double-blind tests. That's more stringent
than edit-distance zero because the translators aren't influenced by
post-editing. The same source text goes to a human and the engine. When
comparing the totally independent results, the number of "exactly the same"
translated segments is very high. Note that these results are a
testament to (a) the Moses team for making such a great tool, and (b)
the translators for having superb TMs. Of course, we take a little
credit for making Moses accessible to the customers with Slate Desktop
on Windows and Linux. Sadly, our DE customers experience less than 25%
correct. So like I said, any DE splitting is bound to improve the
results. I'll let you guys know.
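The "exactly the same" measure described above can be sketched as follows, assuming the human and engine outputs are aligned lists of segments and that only leading and trailing whitespace is ignored. The function name and that normalization are assumptions; the actual comparison procedure is not described in this message:

```python
def exact_match_rate(human_segments, engine_segments):
    """Share of segments where the engine output equals the human translation."""
    if len(human_segments) != len(engine_segments):
        raise ValueError("segment lists must be aligned one-to-one")
    if not human_segments:
        return 0.0
    # Assumption: surrounding whitespace does not count as a difference.
    matches = sum(
        h.strip() == e.strip() for h, e in zip(human_segments, engine_segments)
    )
    return matches / len(human_segments)
```

Two identical segments out of five would give 0.4, which corresponds to the 40% figure quoted for the non-DE language pairs.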
Tom
On 2/1/2017 6:26 PM, moses-support-request@mit.edu wrote:
> Date: Wed, 1 Feb 2017 11:19:51 +0000
> From: Rico Sennrich <rico.sennrich@gmx.ch>
> Subject: Re: [Moses-support] German compound splitter
> To: moses-support@mit.edu
>
> Hello Tom,
>
> 1. no stemming is applied, only splitting - we used it on the target
>    side for our English->German system, and no information is lost.
>
> 2. the truecasing model will make each segment upper-/lowercased
>    depending on which is more frequent in the training data. with
>    '-no-truecase', the original case is kept.
>
> 3. the exact string depends on whether the word has a "Fugenelement"
>    like "-n", "-e", "-es", "-s", and "-". Here's an example of how
>    "Geburtstag" (birthday) is split (if -max-count is high enough):
>
>    default: Geburt tag
>    -write-filler: Geburt @s@ tag
>    -merge-filler: Geburts@@ tag
>
>    if there is no Fugenelement, then yes, @@ is inserted with
>    -write-filler: Geburttag -> Geburt @@ tag
>
> 4. the input should be tokenized, but not lowercased. If you want to
>    apply lowercasing, you can do this after splitting.
>
> For re-joining the splits for the final system, we simply used a regex
> on the filler elements:
>
>    sed -r 's/ \@(\S*?)\@ /\1/g' | sed -r 's/\@\@ //g'
>
> Note that we never tested this on a phrase-based system, and there
> might be more spurious reorderings in a phrase-based system than in
> our string-to-tree system in which we used this.
>
> best wishes,
> Rico
>
> On 01/02/17 01:36, Tom Hoar wrote:
>> I'm sharing some feedback and asking a new question.
>>
>> I tried the SoMaJo German tokenizer. After considerable work with some
>> customers, we concluded it does not work as well for SMT as the
>> built-in Moses tokenizer.perl with German. So, back to the drawing board.
>>
>> Rico, I'm revisiting your hybrid splitter and have some questions.
>>
>> 1. Are stemmed tokens in the output or only original tokens simply
>> split? It seems that, for SMT support, no stemming is applied. I just
>> want to verify because I cannot use stemmed output.
>>
>> 2. I need the split output to be natural cased, i.e. not lower-cased.
>> Is this the purpose of the `-no-truecase` argument?
>>
>> 3. Can you confirm that the `-write-filler` argument marks the split
>> using " @@ "?
>>
>> 4. The command to train a model is simple enough:
>>
>> `hybrid_compound_splitter.py -train -syntax -corpus INPUT_FILE
>> -model MODEL_FILE`
>>
>> What state is German INPUT_FILE ? i.e. tokenized or not?
>> lower-cased or not?
>>
>> In a separate but similar line, what is the current state of the art
>> in using compound-split corpus in the target language and then
>> re-joining the splits with proper casing for a final rendering?
>>
>>
>> Thanks!
>> Tom
>>
>>
>> On 8/26/2016 9:15 AM, moses-support-request@mit.edu wrote:
>>> Date: Thu, 25 Aug 2016 09:05:13 -0700
>>> From: Tom Hoar <tahoar@pttools.net>
>>> Subject: Re: [Moses-support] German compound splitter
>>> To: "moses-support@mit.edu" <moses-support@mit.edu>
>>>
>>> Thank you, Rico! Looks promising.
>>>
>>> I found this one on Python's PyPI repository: https://pypi.python.org/pypi/SoMaJo/1.1.2
>>>
>>> Does anyone have any experience with it?
>>>
>>> Tom
>>>
>>>
>>>
>>> On 8/25/2016 11:01 PM, moses-support-request@mit.edu wrote:
>>>
>>>> Date: Wed, 24 Aug 2016 17:23:22 +0100
>>>> From: Rico Sennrich <rico.sennr...@gmx.ch>
>>>> Subject: Re: [Moses-support] German compound splitter
>>>> To: moses-support@mit.edu
>>>>
>>>> Hi Tom,
>>>>
>>>> I've been using this one for the Edinburgh WMT submission (EN-DE
>>>> syntax-based) in the last 3 years:
>>>> https://github.com/rsennrich/wmt2014-scripts/blob/master/hybrid_compound_splitter.py
>>>>
>>>> It implements the hybrid (frequency-based and FST-based) algorithm by
>>>> Fritzinger & Fraser 2010: "How to Avoid Burning Ducks: Combining
>>>> Linguistic Analysis and Corpus Statistics for German Compound Processing"
>>>>
>>>> best wishes,
>>>> Rico
>>>>
>>>> On 24 August 2016 at 09:14, Tom Hoar <tahoar@pttools.net> wrote:
>>>>
>>>>> Does anyone recommend a German compound splitter? I know it's been
>>>>> discussed here before. Thanks.
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
--
Best regards,
Tom Hoar
Chief Executive Officer
*/Precision Translation Tools Pte Ltd/*
Singapore/Thailand
Web: www.precisiontranslationtools.com
<http://www.precisiontranslationtools.com>
Thailand Mobile: +66 87 345-1875
Skype call: tahoar <skype:tahoar?call>
Skype chat: tahoar <skype:tahoar>
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 124, Issue 2
*********************************************