Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: apostrophe: detokenization or corpus issue ? (Vincent Nguyen)
2. Compilation problem (Pratik Mehta)
3. Re: Compilation problem (Kenneth Heafield)
----------------------------------------------------------------------
Message: 1
Date: Mon, 14 Mar 2016 21:18:04 +0100
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] apostrophe: detokenization or corpus
issue ?
To: moses-support@mit.edu
Message-ID: <56E71C7C.9060908@neuf.fr>
Content-Type: text/plain; charset="windows-1252"
After a full retrain, I can confirm what I said earlier. For anyone who
needs French as one of the languages, the adjustment to
normalize-punctuation.perl is really needed.
On 14/03/2016 10:01, Vincent Nguyen wrote:
>
> I think I found the culprit.
> This is very tricky: it's not a detokenizer issue but a
> "normalize-punctuation | tokenizer" issue.
>
> The normalize-punctuation script converts the special apostrophe (UTF-8
> sequence E2 80 99) to a plain ' only
> when it is surrounded by [a-z] on both sides.
>
> s/([a-z])‘([a-z])/$1\'$2/gi;
> s/([a-z])’([a-z])/$1\'$2/gi;
>
> The problem is that when the apostrophe is followed by an accented
> character like é or â (UTF-8 sequences C3 A9 and C3 A2),
> this rule does not match.
> The script then converts these apostrophes to double quotes ":
> s/‘/\"/g;
> s/‚/\"/g;
> s/’/\"/g;
>
> Either we need to correct the [a-z] class, or maybe change the last 3
> conversions to convert to the regular ' no matter what.
>
> Hope this is clear.
>
>
>
> On 10/03/2016 13:00, Philipp Koehn wrote:
>> Hi,
>>
>> I do not think that the detokenizer would cause conversion of ' to ".
>> You can check the raw output of the decoder, and see how it is
>> changed by the detokenizer.
>>
>> -phi
>>
>> On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>>
>> Hi,
>>
>> I got the following situation:
>>
>> This group age
>> is sometimes translated as:
>> ce groupe d'âge (correct)
>> ce groupe d" âge (incorrect)
>> ce groupe d "âge (incorrect)
>>
>> I am wondering whether this is more a detokenizer issue, a corpus
>> issue, or both.
>>
>> Technically, in French there shouldn't be any space before or after
>> the apostrophe.
>> In the Europarl corpus, as well as in the News2014 one, there are some
>> instances with a space before or after.
>>
>> So I have the feeling that the decoder gets a ' with surrounding
>> spaces, which leads the detokenizer to transform it into "
>>
>> Anyone with a similar issue ?
>>
>> thanks.
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
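The behaviour described in this message can be reproduced outside the Perl script. What follows is only a minimal Python sketch, not the actual normalize-punctuation.perl code: the function names are hypothetical, only the two curly single quotes U+2018 and U+2019 are handled, and the "fix" shown is just one of the two options suggested in the thread (widening [a-z] to any Unicode letter).

```python
import re

# Assumption: only U+2018 and U+2019 are considered here; the real script
# handles more characters than this sketch does.
CURLY = "[\u2018\u2019]"


def ascii_only_rule(text: str) -> str:
    """Mimic the reported behaviour: a curly quote becomes ' only between
    ASCII letters; any remaining curly quote becomes a double quote."""
    text = re.sub(r"([a-z])" + CURLY + r"([a-z])", r"\1'\2", text,
                  flags=re.IGNORECASE | re.ASCII)
    return re.sub(CURLY, '"', text)


def unicode_letter_rule(text: str) -> str:
    """Variant: accept any Unicode letter on both sides of the quote.
    [^\\W\\d_] matches word characters that are neither digits nor '_'."""
    text = re.sub(r"([^\W\d_])" + CURLY + r"([^\W\d_])", r"\1'\2", text)
    return re.sub(CURLY, '"', text)


sample = "ce groupe d\u2019\u00e2ge"   # d, U+2019, then a-circumflex
print(ascii_only_rule(sample))      # ce groupe d"âge
print(unicode_letter_rule(sample))  # ce groupe d'âge
```

With the ASCII-only class, the accented letter after the apostrophe prevents the first substitution from matching, so the fallback rule turns the apostrophe into a double quote. The other option raised in the thread, mapping these characters to ' unconditionally, would avoid the letter test altogether but would also change how genuine single-quoted spans are normalized.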
------------------------------
Message: 2
Date: Tue, 15 Mar 2016 15:16:17 +0530
From: Pratik Mehta <pratikmehta1494@gmail.com>
Subject: [Moses-support] Compilation problem
To: moses-support@mit.edu
Message-ID:
<CADaECBVg8iqfNc9x7wMKt9LJaDrpBON_ayBh=sg1qV=dQDDcng@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello,
I tried to compile Moses with the following command:
./bjam -j4
The process ended with the following message:
...failed updating 58 targets...
...skipped 69 targets...
...updated 794 targets...
The build failed. If you need support, run:
/usr/bin/bjam -j4 --debug-configuration -d2 |gzip >build.log.gz
I have attached the build.log.gz file here. What went wrong?
--
Regards,
Pratik.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.log.gz
Type: application/x-gzip
Size: 8965 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20160315/8733b387/attachment-0001.gz
------------------------------
Message: 3
Date: Tue, 15 Mar 2016 10:08:49 +0000
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Compilation problem
To: moses-support@mit.edu
Message-ID: <56E7DF31.3050704@kheafield.com>
Content-Type: text/plain; charset=windows-1252
Smells like boost was compiled with a different version of gcc than the
one you're using to compile Moses, which can occasionally cause problems.
On 03/15/2016 09:46 AM, Pratik Mehta wrote:
> Hello,
> I tried to compile Moses with the following command:
> ./bjam -j4
>
> The process ended with the following message:
> ...failed updating 58 targets...
> ...skipped 69 targets...
> ...updated 794 targets...
> The build failed. If you need support, run:
> /usr/bin/bjam -j4 --debug-configuration -d2 |gzip >build.log.gz
>
> I have attached the build.log.gz file here. What went wrong?
>
> --
> Regards,
> Pratik.
------------------------------
End of Moses-support Digest, Vol 113, Issue 46
**********************************************