Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: how much disk space for the Giga fr-en corpus ?
(Vincent Nguyen)
2. ParFDA WMT'15 Datasets (Ergun Bicici)
----------------------------------------------------------------------
Message: 1
Date: Sun, 9 Aug 2015 13:47:08 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] how much disk space for the Giga fr-en
corpus ?
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <55C73DBC.5050303@neuf.fr>
Content-Type: text/plain; charset="utf-8"
I think I was not far short at 400GB. 500GB was more than enough, even
without the -sort-compress gzip option.
It is now binarizing / compacting, which is also taking a very long time.
I will post timings once tuning is done.
On 08/08/2015 12:06, Hieu Hoang wrote:
> I don't think anyone has measured it. If you have any measurements,
> perhaps you can let us know.
>
> If you have a fairly recent version of Unix sort, you can also add
> [TRAINING]
> training-options = "-sort-compress gzip"
> to reduce the disk space requirement.
>
> However, I would say you need PLENTY of space. If you have just enough
> to do extraction and no more, you're going to have a hard time doing the
> rest of the experiments.
>
>
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
> On 8 August 2015 at 13:55, Vincent Nguyen <vnguyen@neuf.fr
> <mailto:vnguyen@neuf.fr>> wrote:
>
> Hi,
> I keep adding 100GB to my disk space; even at 400GB it crashed at
> sorting time, after the extract tables.
> Now trying 500GB.
> Will I need more?
> Is there a rule?
> Cheers,
> Vincent
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
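The thread above gives no general rule, only one data point: the Giga fr-en run failed during sorting at 400GB and completed extraction with 500GB, without -sort-compress gzip. A pre-flight check based on that figure could be sketched as follows. This is only a rough illustration, not part of Moses: the function names, the working-directory path, and the use of 500 as a threshold are assumptions made for the example, and later steps may need more.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_space_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(path: str, required_gib: float) -> bool:
    """Return True if at least `required_gib` GiB are free under `path`."""
    return free_space_gib(path) >= required_gib


if __name__ == "__main__":
    # Hypothetical working directory; replace with the real experiment directory.
    working_dir = "."
    # Assumed threshold, taken from the single report in this thread.
    required = 500
    if has_enough_space(working_dir, required):
        print(f"OK: {free_space_gib(working_dir):.0f} GiB free in {working_dir}")
    else:
        print(f"Only {free_space_gib(working_dir):.0f} GiB free in {working_dir}; "
              f"{required} GiB was needed in the reported run.")
```

Running such a check before launching training avoids discovering the shortfall only when the sort step fails.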
------------------------------
Message: 2
Date: Sun, 9 Aug 2015 14:27:19 +0100
From: Ergun Bicici <Ergun.Bicici@computing.dcu.ie>
Subject: [Moses-support] ParFDA WMT'15 Datasets
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAB2pGnfSBfu8EhS-g03Dg6npuQ+hT4YhFrwQRju8MoVUfqbdcQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
ParFDA WMT'15 Datasets
Dear moses-list,
We are making available, for research purposes, the English, Czech,
Finnish, French, German, and Russian datasets used when building the
ParFDA Moses SMT systems. They can be downloaded from:
https://drive.google.com/a/dcu.ie/folderview?id=0B6Jae6trZb1afjJ1T0ZOZlZFZUk0S2R3Z0U3eVdxN2tpQlVwTUgyX0tteVk4TnlhRHVJR2M&usp=sharing
Results are presented in the following paper from WMT'15
(http://www.statmt.org/wmt15/).
Citation:
Ergun Biçici, Qun Liu, and Andy Way. ParFDA for Fast Deployment of Accurate
Statistical Machine Translation Systems, Benchmarks, and Statistics. In
Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine
Translation, Lisbon, Portugal, September 2015.
The datasets and the SMT results can serve as a benchmark for SMT research
where further linguistic processing can be performed. The datasets allow
fast deployment of accurate SMT systems and can be used for benchmarking
the performance of SMT systems.
Language models were built using SRILM
(http://www.speech.sri.com/projects/srilm/). The language model corpora
contain 15M sentences, some of which were selected from the LDC Gigaword
corpora by the Parallel FDA5 algorithm:
[5 use the LDC English Gigaword 5th edition]
- Czech - English
- Finnish - English
- French - English
- German - English
- Russian - English
[1 use the LDC French Gigaword 3rd edition]
- English - French
LICENSE: Dublin City University License for Open Data allowing use for
research and academic purposes.
Best Regards,
Ergun
Ergun Biçici, School of Computing, DCU, www.cngl.ie
http://www.computing.dcu.ie/~ebicici/
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 106, Issue 22
**********************************************