Moses-support Digest, Vol 124, Issue 1

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. How to compile moses with -fPIC (Dingyuan Wang)
2. Re: German compound splitter (Rico Sennrich)
3. Re: How to compile moses with -fPIC (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Wed, 1 Feb 2017 17:27:52 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: [Moses-support] How to compile moses with -fPIC
To: moses-support <moses-support@mit.edu>
Message-ID: <c71e3ca4-aead-4e5e-03e6-7f75c1de4e12@gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear all,

I would like to compile moses on Debian testing, but there is a
linking problem that requires me to recompile it with -fPIC:

/usr/bin/ld:
/usr/lib/x86_64-linux-gnu/liblzma.a(liblzma_la-common.o): relocation
R_X86_64_32 against `.rodata.str1.1' can not be used when making a
shared object; recompile with -fPIC

I tried ./bjam cxxflags=-fPIC cflags=-fPIC but it doesn't work. The
actual commands are like:

"g++" -L"/usr/lib" -L"/usr/lib/x86_64-linux-gnu" -Wl,-rpath-link
-Wl,"/usr/lib" -o "mert/sentence-bleu" -Wl,--start-group
"mert/bin/gcc-6.3.0/release/link-static/threading-multi/sentence-bleu.o"
"mert/bin/gcc-6.3.0/release/link-static/threading-multi/libmert_lib.a"
-Wl,-Bstatic -lboost_filesystem -lz -lbz2 -llzma -lm -lxmlrpc_xmltok
-lxmlrpc_xmlparse -lxmlrpc_util -lxmlrpc_server_abyss++
-lxmlrpc_server_abyss -lboost_program_options -lboost_serialization
-lboost_thread -lboost_system -ltcmalloc_minimal -lxmlrpc -lxmlrpc++
-lxmlrpc_abyss -lxmlrpc_server -lxmlrpc_server++ -Wl,-Bdynamic
-lSegFault -lrt -Wl,--end-group -pthread


--
Dingyuan Wang


------------------------------

Message: 2
Date: Wed, 1 Feb 2017 11:19:51 +0000
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] German compound splitter
To: moses-support@mit.edu
Message-ID: <5408b8a3-b912-35ec-4bb5-57bd1d81242a@gmx.ch>
Content-Type: text/plain; charset="windows-1252"

Hello Tom,

1. no stemming is applied, only splitting - we used it on the target
side for our English->German system, and no information is lost.

2. the truecasing model will make each segment upper-/lowercased
depending on which is more frequent in the training data. with
'-no-truecase', the original case is kept.

3. the exact string depends on whether the word has a "Fugenelement"
like "-n", "-e", "-es", "-s", and "-". Here's an example of how
"Geburtstag" (birthday) is split (if -max-count is high enough):

default: Geburt tag
-write-filler: Geburt @s@ tag
-merge-filler: Geburts@@ tag

if there is no Fugenelement, then yes, @@ is inserted with -write-filler:
Geburttag -> Geburt @@ tag
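
To make the three output styles above concrete, here is a minimal Python
sketch. It does not call hybrid_compound_splitter.py; render_split and its
arguments (head, filler, tail) are hypothetical names used only to reproduce
the strings shown in the example. The merge-filler output for an empty
filler is an assumption, since the example above does not cover that case.

```python
def render_split(head, filler, tail, mode="default"):
    """Format one two-part compound split in the styles listed above.

    head, filler and tail are illustrative inputs, e.g. "Geburt", "s", "tag".
    An empty filler means the compound has no Fugenelement.
    """
    if mode == "default":
        # The Fugenelement is dropped entirely.
        return f"{head} {tail}"
    if mode == "write-filler":
        # The Fugenelement becomes its own token, wrapped in @...@.
        # With an empty filler this yields the bare "@@" token.
        return f"{head} @{filler}@ {tail}"
    if mode == "merge-filler":
        # The Fugenelement stays attached to the head, followed by "@@".
        # Behaviour for an empty filler is assumed, not taken from the thread.
        return f"{head}{filler}@@ {tail}"
    raise ValueError(f"unknown mode: {mode}")


print(render_split("Geburt", "s", "tag"))                  # Geburt tag
print(render_split("Geburt", "s", "tag", "write-filler"))  # Geburt @s@ tag
print(render_split("Geburt", "s", "tag", "merge-filler"))  # Geburts@@ tag
print(render_split("Geburt", "", "tag", "write-filler"))   # Geburt @@ tag
```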

4. the input should be tokenized, but not lowercased. If you want to
apply lowercasing, you can do this after splitting.

For re-joining the splits for the final system, we simply used a regex
on the filler elements:
sed -r 's/ \@(\S*?)\@ /\1/g' | sed -r 's/\@\@ //g'
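
For readers who prefer to do the re-joining inside a Python pipeline, the two
sed substitutions above can be sketched as follows. This is an illustrative
port, not part of the splitter script; the function name rejoin_compounds is
an assumption.

```python
import re

# " @s@ " (or the empty " @@ ") between two segments: keep only the filler.
FILLER_TOKEN = re.compile(r" @(\S*?)@ ")
# "Geburts@@ tag" style: drop the "@@ " marker left by -merge-filler.
MERGED_MARKER = re.compile(r"@@ ")


def rejoin_compounds(line):
    """Undo compound splitting on one tokenized line of output."""
    line = FILLER_TOKEN.sub(r"\1", line)
    return MERGED_MARKER.sub("", line)


print(rejoin_compounds("Geburt @s@ tag"))  # Geburtstag
print(rejoin_compounds("Geburts@@ tag"))   # Geburtstag
print(rejoin_compounds("Geburt @@ tag"))   # Geburttag
```

As with the sed version, casing is left untouched, so any recasing of the
joined word has to happen in a separate step.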

Note that we never tested this on a phrase-based system, and there might
be more spurious reorderings in a phrase-based system than in our
string-to-tree system in which we used this.

best wishes,
Rico

On 01/02/17 01:36, Tom Hoar wrote:
>
> I'm sharing some feedback and asking a new question.
>
> I tried the SoMaJo German tokenizer. After considerable work with some
> customers, we concluded it does not work as well for SMT as the
> built-in Moses tokenizer.perl with German. So, back to the drawing board.
>
> Rico, I'm revisiting your hybrid splitter and have some questions.
>
> 1. Are stemmed tokens in the output or only original tokens simply
> split? It seems for SMT support, no stemming is applied. I just
> want to verify because I cannot use stemmed output.
>
> 2. I need the split output to be natural cased, i.e. not lower-cased.
> Is this the purpose of the `-no-truecase` argument?
>
> 3. Can you confirm that the `-write-filler` argument marks the split
> using " @@ "?
>
> 4. The command to train a model is simple enough:
>
> `hybrid_compound_splitter.py -train -syntax -corpus INPUT_FILE
> -model MODEL_FILE`
>
> What state is German INPUT_FILE ? i.e. tokenized or not?
> lower-cased or not?
>
> In a separate but similar line, what is the current state of the art
> in using compound-split corpus in the target language and then
> re-joining the splits with proper casing for a final rendering?
>
>
> Thanks!
> Tom
>
>
> On 8/26/2016 9:15 AM, moses-support-request@mit.edu wrote:
>> Date: Thu, 25 Aug 2016 09:05:13 -0700
>> From: Tom Hoar <tahoar@pttools.net>
>> Subject: Re: [Moses-support] German compound splitter
>> To: "moses-support@mit.edu" <moses-support@mit.edu>
>>
>> Thank you, Rico! Looks promising.
>>
>> I found this one on Python's PyPI repository: https://pypi.python.org/pypi/SoMaJo/1.1.2
>>
>> Does anyone have any experience with it?
>>
>> Tom
>>
>>
>>
>> On 8/25/2016 11:01 PM, moses-support-request@mit.edu wrote:
>>
>>> Date: Wed, 24 Aug 2016 17:23:22 +0100
>>> From: Rico Sennrich <rico.sennr...@gmx.ch>
>>> Subject: Re: [Moses-support] German compound splitter
>>> To: moses-support@mit.edu
>>>
>>> Hi Tom,
>>>
>>> I've been using this one for the Edinburgh WMT submission (EN-DE
>>> syntax-based) in the last 3 years:
>>> https://github.com/rsennrich/wmt2014-scripts/blob/master/hybrid_compound_splitter.py
>>>
>>> It implements the hybrid (frequency-based and FST-based) algorithm by
>>> Fritzinger & Fraser 2010: "How to Avoid Burning Ducks: Combining
>>> Linguistic Analysis and Corpus Statistics for German Compound Processing"
>>>
>>> best wishes,
>>> Rico
>>>
>>> On 24 August 2016 at 09:14, Tom Hoar <tahoar@pttools.net> wrote:
>>>
>>>> Does anyone recommend a German compound splitter? I know it's been
>>>> discussed here before. Thanks.
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 3
Date: Wed, 1 Feb 2017 11:26:40 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] How to compile moses with -fPIC
To: Dingyuan Wang <abcdoyle888@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbirjaqg8cMFRV+DPgqkkfN63SG-qzWFBJpvZ6et9wSgnw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

In the Jamroot file, you can add:

requirements += <cxxflags>-fPIC ;


Hieu Hoang
http://moses-smt.org/

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 124, Issue 1
*********************************************
