Moses-support Digest, Vol 101, Issue 75

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Problem with corpus preparation (Abdelfetah Boumerdas)
2. Re: Problem with corpus preparation (Rico Sennrich)

----------------------------------------------------------------------

Message: 1
Date: Sat, 28 Mar 2015 14:26:15 +0100
From: Abdelfetah Boumerdas <aa_boumerdas@esi.dz>
Subject: Re: [Moses-support] Problem with corpus preparation
To: Rico Sennrich <rico.sennrich@gmx.ch>, moses-support@mit.edu
Message-ID:
<CABJLC3cu+TnPZiKi=aToEfyZ+s9QEwFGRbRiKM2OBPL0TsCzrA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Rico,
Thank you so much for your help, the deescape-special-chars.perl script did
the job perfectly and converted all the XML escape sequences back to the
original characters.
Now I have another question. I followed the Moses manual and trained Moses
on the News Commentary corpus, and now I have the moses.ini file. Before
doing the tuning step, I tried to test the trained system by translating a
simple French sentence into English, but Moses consumed all the memory I
have, which caused my laptop to stop responding (I have an Intel i7-4702MQ
processor with 8GB of RAM and enough disk space). Can you please tell me
what the problem was? Do I have to binarise the translation table? Or is it
normal for the system to consume that much memory?

Thanks again.

2015-03-26 12:47 GMT+01:00 Rico Sennrich <rico.sennrich@gmx.ch>:

> Abdelfetah Boumerdas <aa_boumerdas@...> writes:
>
> > Hi All,
> > I'm trying to build a translation model using Moses, and to do that I'm
> > using two corpora (Europarl and the News Commentary corpus provided in
> > the manual), but when I reached the corpus preparation step I noticed the
> > following problem: in the prepared Europarl files, the apostrophe (') and
> > the quotation marks (") are replaced with &apos; and &quot; respectively,
> > but in the second corpus they're still unchanged.
> > Can anyone please tell me why? Is it a problem with the file encoding (I
> > checked and they're both UTF-8), or is it another problem that I don't
> > know about?
> > Thanks in advance.
>
>
> Hi Abdelfetah,
>
> Some special characters (<, >, [, ], ", ', |) are reserved because they
> have special meaning in the phrase table and/or to support XML input. The
> tokenizer.perl script automatically replaces them with escape sequences,
> and the detokenizer unescapes them again. There are also the scripts
> (de)escape-special-chars.perl to convert from one form to the other
> without (de)tokenizing.
>
> Consistency (between corpora and between training and test time) is
> important. Is it possible that you used different versions of the
> tokenizer.perl script? Older versions did not do escaping.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
BOUMERDAS Abdelfetah
5th Year, Computer Systems Option (SIQ)
Ecole nationale Supérieure d'Informatique ESI (ex INI)
BP 68 M Oued Smar 16309 - ALGER
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150328/eaea4fcf/attachment-0001.htm
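To make Rico's point about escaping concrete, here is a minimal Python sketch of the escape/de-escape round trip. The character-to-sequence table is an assumption based on our reading of escape-special-chars.perl and deescape-special-chars.perl (it also escapes '&' itself, which Rico's list leaves implicit); check it against the scripts in your own Moses checkout. The function names are hypothetical, not part of Moses. The important design point is ordering: '&' must be escaped first and restored last, otherwise the '&' inside the other escape sequences would be processed twice.

```python
# Sketch of Moses-style special-character escaping. ASSUMPTION: the table
# mirrors escape-special-chars.perl; verify against your Moses version.
ESCAPES = [
    ("&", "&amp;"),   # must come first: every escape sequence contains '&'
    ("|", "&#124;"),  # factor separator in the phrase table
    ("<", "&lt;"),    # XML input markup
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),   # non-terminal brackets in syntax models
    ("]", "&#93;"),
]


def escape_special_chars(line: str) -> str:
    """Replace reserved characters with escape sequences (hypothetical name)."""
    for char, seq in ESCAPES:
        line = line.replace(char, seq)
    return line


def deescape_special_chars(line: str) -> str:
    """Undo escape_special_chars; '&amp;' is restored last (hypothetical name)."""
    for char, seq in reversed(ESCAPES):
        line = line.replace(seq, char)
    return line


if __name__ == "__main__":
    raw = 'l\'Europe dit "oui" | <non>'
    escaped = escape_special_chars(raw)
    print(escaped)  # prints: l&apos;Europe dit &quot;oui&quot; &#124; &lt;non&gt;
    assert deescape_special_chars(escaped) == raw
```

If one corpus goes through the escaping step and another does not (for example because of different tokenizer.perl versions), the result is exactly the ' versus &apos; mismatch described in the original question.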

------------------------------

Message: 2
Date: Sat, 28 Mar 2015 15:44:16 +0000
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] Problem with corpus preparation
To: Abdelfetah Boumerdas <aa_boumerdas@esi.dz>, moses-support@mit.edu
Message-ID: <5516CC50.9070101@gmx.ch>
Content-Type: text/plain; charset="utf-8"

On 28/03/15 13:26, Abdelfetah Boumerdas wrote:
> Hi Rico,
> Thank you so much for your help, the deescape-special-chars.perl script
> did the job perfectly and converted all the XML escape sequences back to
> the original characters.
> Now I have another question. I followed the Moses manual and trained
> Moses on the News Commentary corpus, and now I have the moses.ini file.
> Before doing the tuning step, I tried to test the trained system by
> translating a simple French sentence into English, but Moses consumed
> all the memory I have, which caused my laptop to stop responding (I have
> an Intel i7-4702MQ processor with 8GB of RAM and enough disk space). Can
> you please tell me what the problem was? Do I have to binarise the
> translation table? Or is it normal for the system to consume that much
> memory?
>
> Thanks again.
Hi Abdelfetah,

It's not uncommon for Moses to use more than 8GB of RAM during decoding,
depending on the size of your models. The Moses documentation describes
some ways to reduce memory usage:
http://www.statmt.org/moses/?n=Moses.Optimize#ntoc19
You might also want to consider using a computer with more memory.

best wishes,
Rico

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150328/522d2489/attachment-0001.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 101, Issue 75
**********************************************