Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Fwd: Moses-support post from galaxyh@gmail.com requires
approval (Hieu Hoang)
2. Re: SMT resources for Indian languages (Rajnath Patel)
----------------------------------------------------------------------
Message: 1
Date: Tue, 25 Nov 2014 12:22:10 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: [Moses-support] Fwd: Moses-support post from
galaxyh@gmail.com requires approval
To: galaxyh@gmail.com, moses-support <moses-support@mit.edu>
Message-ID: <54747472.6080702@gmail.com>
Content-Type: text/plain; charset="utf-8"
hi steven
please subscribe to the Moses mailing list before posting to it. You can
subscribe here:
http://mailman.mit.edu/mailman/listinfo/moses-support
Have you tried using the hierarchical model yet? It uses the same
algorithms as the syntax model, without needing linguistic information.
It also requires less CPU and memory to run than many syntax models.
Once you manage to run the hierarchical model, you can think about
adding syntax.
You can set up hierarchical training, tuning and evaluation
by looking at the difference between these two EMS config files:
http://www.statmt.org/moses/RELEASE-2.1/models/de-en/config.pb.recase
http://www.statmt.org/moses/RELEASE-2.1/models/de-en/config.hiero.recase
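One possible way to inspect the difference between the two config files is a unified diff. The following is only a minimal Python sketch: the function name config_diff and the tiny sample configs are made up for illustration, and the setting names in them are placeholders, not taken from the real files.

```python
import difflib


def config_diff(phrase_based_text, hiero_text,
                from_name="config.pb.recase", to_name="config.hiero.recase"):
    """Return a unified diff (as a list of lines) between two EMS configs."""
    return list(
        difflib.unified_diff(
            phrase_based_text.splitlines(),
            hiero_text.splitlines(),
            fromfile=from_name,
            tofile=to_name,
            lineterm="",
        )
    )


# Tiny made-up configs, only to show the shape of the output;
# the setting names are placeholders, not copied from the real files.
pb = "[TRAINING]\nsetting-a = 1\n"
hiero = "[TRAINING]\nsetting-a = 1\nsetting-b = 2\n"

for line in config_diff(pb, hiero):
    print(line)
```

To compare the real files, download the two URLs above and pass their contents to config_diff; lines starting with "+" are the ones present only in the hierarchical config.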
-------- Forwarded Message --------
Subject: Moses-support post from galaxyh@gmail.com requires approval
Date: Mon, 24 Nov 2014 07:35:29 -0500
From: moses-support-owner@mit.edu
To: moses-support-owner@mit.edu
As list administrator, your authorization is requested for the
following mailing list posting:
List: Moses-support@mit.edu
From: galaxyh@gmail.com
Subject: How to train a tree-based model?
Reason: Post by non-member to a members-only list
At your convenience, visit:
http://mailman.mit.edu/mailman/admindb/moses-support
to approve or deny the request.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/447565fd/attachment-0001.htm
-------------- next part --------------
An embedded message was scrubbed...
From: Yu-chun Huang <galaxyh@gmail.com>
Subject: How to train a tree-based model?
Date: Mon, 24 Nov 2014 20:35:07 +0800
Size: 7582
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/447565fd/attachment-0002.eml
-------------- next part --------------
An embedded message was scrubbed...
From: moses-support-request@mit.edu
Subject: confirm 18297c7e545476a7a6159a13db23880b7aa4719f
Date: no date
Size: 631
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/447565fd/attachment-0003.eml
------------------------------
Message: 2
Date: Tue, 25 Nov 2014 19:02:17 +0530
From: Rajnath Patel <patelrajnath@gmail.com>
Subject: Re: [Moses-support] SMT resources for Indian languages
To: moses-support@mit.edu
Message-ID:
<CAE-r4umSipi+TvW=Fb-_74WrZwUT_qf7GqTd+Aj6OYjS1rsVtw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Very useful. Adding some more resources, available at -
http://kbcs.in/tools.html
On Tue, Nov 25, 2014 at 4:33 PM, <moses-support-request@mit.edu> wrote:
> Today's Topics:
>
> 1. SMT resources for Indian languages (Anoop (?????))
> 2. Re: (no subject) (Hieu Hoang)
> 3. CFP EAMT 2015: 18th Annual Conference of the European
> Association for Machine Translation (Felipe Sánchez Martínez)
> 4. Re: Too large language models - how to handle that? (Hoang Cuong)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Nov 2014 07:59:46 +0530
> From: Anoop (?????) <anoop.kunchukuttan@gmail.com>
> Subject: [Moses-support] SMT resources for Indian languages
> To: moses-support@mit.edu
> Message-ID:
> <CADXxMYdi98xs8kz6w8c0oEVZyGb9_FaxVB02bL9+-Wto9zzDgA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Sharing a few SMT resources for Indian languages.
>
> Center For Indian Language Technology <http://www.cfilt.iitb.ac.in>, IIT
> Bombay has hosted Shata-Anuvaadak (100 Translators), a Statistical Machine
> Translation system for Indian languages. It currently supports translation
> between 11 Indian languages:
>
>
> - Indo-Aryan languages: Hindi, Urdu, Bengali, Gujarati, Punjabi,
> Marathi, Konkani
> - Dravidian languages: Tamil, Telugu, Malayalam
> - English
>
>
> It is a Phrase-Based MT system with pre-processing and post-processing
> extensions. The pre-processing includes source-side reordering for English
> to Indian language translation. The post-processing includes
> transliteration between Indian languages for OOV words. The system can be
> accessed at:
>
> http://www.cfilt.iitb.ac.in/indic-translator
>
> For more details, see the following publication:
>
> Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak
> Bhattacharyya. 2014. *Shata-Anuvadak: Tackling Multiway Translation of
> Indian Languages*. Language Resources and Evaluation Conference (LREC
> 2014).
>
> We are also making available software and resources developed in the Center
> for the system and for ongoing research. These are available under an open
> source license for research use. These include:
>
> *Software*
>
> - Indian Language NLP tools: Common NLP tools for Indian languages that
> are useful for machine translation: Unicode normalizers, tokenizers,
> morphological analysers and transliteration systems.
> - Source-side reordering system for SMT
> - A simple experiment management system for Moses
>
> *Resources*
>
> - Translation models for phrase-based SMT systems for all language pairs in
> Shata-anuvaadak
> - Language models for all languages in Shata-anuvaadak
> - Transliteration models for some language pairs (Moses-based)
>
> You can access these resources at:
>
> http://www.cfilt.iitb.ac.in/static/download.html
>
> Regards,
> Anoop.
>
> http://www.cse.iitb.ac.in/~anoopk
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/63ea2e27/attachment-0001.htm
>
> ------------------------------
>
> Message: 2
> Date: Tue, 25 Nov 2014 09:10:06 +0000
> From: Hieu Hoang <hieuhoang@gmail.com>
> Subject: Re: [Moses-support] (no subject)
> To: Daramola Olaife <d3ripleo@gmail.com>, moses-support@mit.edu,
> user-irstlm@list.fbk.eu
> Message-ID: <5474476E.5090702@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> I'm getting a different error when compiling irstlm5.80.06 with the
> latest moses from github.
> moses/LM/IRST.cpp:60:21: error: invalid use of incomplete type
> 'class lmContainer'
> if (m_lmtb) m_lmtb->reset_mmap();
>
> Using irstlm5.80.03 works fine
> http://sourceforge.net/projects/irstlm/files/irstlm/irstlm-5.80/
>
>
> On 24/11/14 12:50, Daramola Olaife wrote:
> > After installing irstlm, I tried linking it to moses with
> > ./bjam --with-irstlm=/home/olaife/irstlm-5.80.06 -j8
> > but it gave me an error.
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/d2ea373d/attachment-0001.htm
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Nov 2014 10:12:27 +0100
> From: Felipe Sánchez Martínez <fsanchez@dlsi.ua.es>
> Subject: [Moses-support] CFP EAMT 2015: 18th Annual Conference of the
> European Association for Machine Translation
> To: mt-list@eamt.org, moses-support <moses-support@mit.edu>,
> corpora@uib.no, elsnet-list@elsnet.org
> Cc: "awa >> Andy Way" <away@computing.dcu.ie>, "Mikel L. Forcada"
> <mlf@dlsi.ua.es>
> Message-ID: <547447FB.5060209@dlsi.ua.es>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
>
> Apologies for cross-posting.
> -----------------------------------------------------------
>
> *18th Annual Conference of the European Association for Machine
> Translation (EAMT 2015; Antalya, Turkey)*
>
> The European Association for Machine Translation
> (EAMT, http://www.eamt.org) invites everyone interested in machine
> translation, translation-related tools and resources to participate in
> this conference: developers, researchers, users, professional
> translators and translation/localisation managers: anyone who has a
> stake in the vision of an information world in which language barriers
> and issues become less visible to the information consumer. We
> especially invite researchers to describe the state of the art and
> demonstrate their cutting-edge results, and professional MT users to
> share their experiences.
>
> EAMT 2015, the 18th Annual Conference of the European Association for
> Machine Translation, will be held in Antalya, Turkey from 11 to 13 May
> 2015.
>
> We expect to receive manuscripts in these three categories:
>
> ------------------------------------
> Research papers
> ------------------------------------
> Long-paper submissions (8 pages) are invited for reports of significant
> research results in any aspect of machine translation and related areas.
> Such reports should include a substantial evaluation component, or have
> a strong theoretical and/or methodological contribution where results
> and in-depth evaluations may not be appropriate. Papers are welcome on
> all topics in the area of Machine Translation or translation-related
> technologies, including:
>
> * Speech translation: speech to text, speech to speech
> * Translation aids (translation memory, terminology databases, etc.)
> * Translation environments (workflow, support tools, conversion tools
> for lexica, etc.)
> * Practical MT systems (MT for professionals, MT for multilingual
> eCommerce, MT for localization, etc.)
> * MT in multilingual public service (eGovernment etc.)
> * MT for the web
> * MT embedded in other services
> * MT evaluation techniques and evaluation results
> * Dictionaries and lexica for MT
> * Text and speech corpora for MT
> * Standards in text and lexicon encoding for MT
> * Human factors in MT and user interfaces
> * Related multilingual technologies (natural language generation,
> information retrieval, text categorization, text summarization,
> information extraction, etc.)
>
> Papers should describe original work. They should emphasize completed
> work rather than intended work, and should indicate clearly the state of
> completion of the reported results. Where appropriate, concrete
> evaluation results should be included.
>
> ------------------------------------
> User studies
> ------------------------------------
> Short-paper submissions (2-4 pages) are invited for reports on users'
> experiences with MT, be it in small or medium size business (SMB),
> enterprise, government, or NGOs. Contributions are welcome on:
>
> * Integrating MT and computer-assisted translation into a translation
> production workflow (e.g. transforming terminology glossaries into MT
> resources, optimizing TM/MT thresholds, mixing online and offline tools,
> using interactive MT, dealing with MT confidence scores);
> * Use of MT to improve translation or localization workflows (e.g.
> reducing turnaround times, improving translation consistency, increasing
> the scope of globalization projects);
> * Managing change when implementing and using MT (e.g. switching between
> multiple MT systems, limiting degradations when updating or upgrading an
> MT system);
> * Implementing open-source MT in the SMB or enterprise (e.g. strategies
> to get support, reports on taking pilot results into full deployment,
> examples of advanced customisation sought and obtained thanks to the
> open-source paradigm, collaboration within open-source MT projects);
> * Evaluation of MT in a real-world setting (e.g. error detection
> strategies employed, metrics used, productivity or translation quality
> gains achieved);
> * Post-editing strategies and tools (e.g. limitations of traditional
> translation quality assurance tools, challenges associated with
> post-editing guidelines);
> * Legal issues associated with MT, especially MT in the cloud (e.g.
> copyright, privacy);
> * Use of MT in social networking or real-time communication (e.g.
> enterprise support chat, multilingual content for social media);
> * Use of MT to process multilingual content for assimilation purposes
> (e.g. cross-lingual information retrieval, MT for e-discovery or spam
> detection, MT for highly dynamic content);
> * Use of standards for MT.
>
> Papers should highlight problems and solutions and not merely describe
> MT integration process or project settings. Where solutions do not seem
> to exist, suggestions for MT researchers and developers should be
> clearly emphasized. For user papers produced by academics, we require
> co-authorship with the actual users.
>
> ------------------------------------
> Project/Product description
> ------------------------------------
> Abstract submissions (1 page) are invited to report new, interesting:
>
> * Tools for machine translation, computer aided translation, and the
> like (including commercial products and open-source software). The
> authors should be ready to present the tools in the form of demos or
> posters during the conference.
> * Research projects related to machine translation. The authors should
> be ready to present the projects in the form of posters during the
> conference. This follows on from the successful "project villages" held
> at the last two EAMT conferences.
>
> ------------------------------------
> Programme
> ------------------------------------
> The programme will include oral presentations and poster sessions.
> Accepted papers may be assigned to an oral or poster session, but no
> differentiation will be made in the conference proceedings.
>
> ------------------------------------
> Important Dates
> ------------------------------------
> * Paper submission: February 5, 2015
> * Notification to authors: March 12, 2015
> * Camera-ready deadline: April 2, 2015
> * Conference: May 11-13, 2015
>
> ------------------------------------
> Conference website
> ------------------------------------
> http://www.eamt2015.org/
>
> For further information about this call for papers please contact the
> track chairs at eamt2015@dlsi.ua.es and put in the title "[user]" or
> "[research]" depending on which track your question is related to. For
> questions about the organisation (venue, registration, accommodation,
> etc.) please contact the local organisers at secretariat@eamt2015.org.
>
> Kind regards
> --
> Gema Ramírez-Sánchez, Fred Hollowood and Felipe Sánchez-Martínez
> on behalf of the EAMT 2015 Organising Committee
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 25 Nov 2014 12:02:32 +0100
> From: Hoang Cuong <hoangcuong2011@gmail.com>
> Subject: Re: [Moses-support] Too large language models - how to handle
> that?
> To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
> Cc: moses-support@mit.edu
> Message-ID:
> <CAG1fz7d=J22g1SG1iemAtN9-MvptaXir7eoehZzzSC1oVigFFw@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Raj, Tom and Marcin,
> I binarized the ARPA file last night, following your suggestion. In the
> end, it resulted in a binarized LM file of roughly *100GB* (@Marcin - it is
> not the 20-30GB you suggested; is this size okay?)
> Fortunately, the infrastructure at my university allows me to run
> experiments with that.
> Thanks a lot for your help.
> It is so great to play with such huge LMs :))
> Best,
>
>
> On Mon, Nov 24, 2014 at 3:19 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
> wrote:
>
> > The command
> >
> > moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
> >
> > will build a compressed binarized model with quantization. You can run
> >
> > moses/bin/build_binary lm.arpa
> >
> > without any parameters to get size estimates for different parameter
> > settings. I would guess you will get a binarized LM of roughly 20 to 30 GB,
> > which is manageable (provided the size you gave us is that of an
> > uncompressed text file). You can also use lmplz to build pruned models in
> > the first place; these will be much smaller.
> >
> > On 2014-11-24 15:11, Tom Hoar wrote:
> >
> > After binarizing such a large ARPA file with KenLM, you'll need to
> > configure your moses.ini file to "lazily load the model using mmap." This
> > involves using lmodel-file code "9" vs code "8." More details here:
> > https://kheafield.com/code/kenlm/moses/
> >
> > Performance improves significantly if you store the binarized file on an
> > SSD.
> >
> >
> >
> >
> > On 11/24/2014 07:00 PM, Raj Dabre wrote:
> >
> > Hey Hoang,
> > You should binarize the arpa file.
> > The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how.
> > Regards.
> >
> > On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <hoangcuong2011@gmail.com>
> > wrote:
> >
> >> Hi all,
> >> I have trained an (unpruned) 5-gram language model on a large corpus of
> >> 5 billion words, resulting in an ARPA-format file of roughly 300GB (is that a
> >> normal LM size for such a large monolingual corpus?). This is obviously too
> >> big for running an SMT system.
> >> I have read several papers whose systems use language models trained on
> >> similarly sized monolingual corpora. Could you give me some advice on how to
> >> handle this and make it feasible to run SMT systems?
> >> I appreciate your help a lot,
> >> Best,
> >> --
> >> Best Regards,
> >> Hoang Cuong
> >> SMTNerd
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>
> >>
> >
> >
> > --
> > Raj Dabre.
> > Research Student,
> > Graduate School of Informatics,
> > Kyoto University.
> > CSE MTech, IITB., 2011-2014
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
>
> --
>
> Best Regards,
> Hoang Cuong
> SMTNerd
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/439873f3/attachment.htm
>
> ------------------------------
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> End of Moses-support Digest, Vol 97, Issue 77
> *********************************************
>
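The build_binary call quoted in the thread above (from Marcin Junczys-Dowmunt's reply) could be scripted along these lines. This is only a sketch: the helper name build_binary_command and its parameter names are made up here, and the reading of -a, -b and -q as pointer-compression, backoff and probability bit widths is my understanding of the KenLM options, not something stated in the thread.

```python
import shlex


def build_binary_command(arpa_path, output_path,
                         build_binary="moses/bin/build_binary",
                         pointer_bits=22, backoff_bits=8, prob_bits=8):
    """Assemble the argument list for the build_binary call quoted above."""
    return [
        build_binary, "trie",
        "-a", str(pointer_bits),   # assumed meaning: pointer compression bits
        "-b", str(backoff_bits),   # assumed meaning: backoff quantization bits
        "-q", str(prob_bits),      # assumed meaning: probability quantization bits
        arpa_path, output_path,
    ]


cmd = build_binary_command("lm.arpa", "lm.kenlm")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (needs a built Moses/KenLM and a real ARPA file):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when the model paths contain spaces.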
--
Regards:
Raj Nath Patel
http://kbcs.in/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141125/37ab6c06/attachment.htm
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 97, Issue 79
*********************************************