Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Deadline extended CFP: MT Summit XV Workshop on Patent and
Scientific Literature Translation (PSLT 2015) (Takashi Tsunakawa)
2. Re: EMS results - makes sense ? (Hieu Hoang)
3. Re: EMS results - makes sense ? (Vincent Nguyen)
4. Re: EMS results - makes sense ? (Barry Haddow)
5. Re: Language model creation error (Hieu Hoang)
----------------------------------------------------------------------
Message: 1
Date: Tue, 4 Aug 2015 17:03:42 +0900
From: Takashi Tsunakawa <tuna@inf.shizuoka.ac.jp>
Subject: [Moses-support] Deadline extended CFP: MT Summit XV Workshop
on Patent and Scientific Literature Translation (PSLT 2015)
To: moses-support@mit.edu
Message-ID: <55C071DE.3020305@inf.shizuoka.ac.jp>
Content-Type: text/plain; charset=utf-8; format=flowed
Dear Moses-support ML members,
I would like to share the 2nd Call for Papers for the 6th Workshop on
Patent and Scientific Literature Translation (PSLT 2015), held at MT
Summit XV.
PSLT 2015 is giving authors an additional week to prepare submissions.
The new deadline for submission is August 17.
[Apologies for multiple copies]
-----------------------------------------------------------------------------------------------
2nd Call for Papers:
The 6th Workshop on Patent and Scientific Literature Translation (PSLT 2015)
-----------------------------------------------------------------------------------------------
Following the success of the MT Summit 2005, 2007, 2009, 2011, and 2013
Workshops on Patent Translation, we are organizing the 6th Workshop on
Patent and Scientific Literature Translation (PSLT 2015) as a part of MT
Summit 2015 in Miami, Florida, USA. While patent information is still
one of the major application areas of machine translation, the need for
translation of different kinds of scientific literature has also been
increasing rapidly. This edition extends the area of interest to the
translation of scientific literature, including patents, scientific
articles, and technical reports, which share common characteristics
while each also has characteristics of its own. Like the previous
editions, the workshop consists of invited talks, presentations of
submitted papers, and open discussion, and will be an opportunity for
researchers and practitioners to get together and exchange their ideas
and experiences.
We will have the following invited speakers at PSLT 2015: Stefan Riezler
(Heidelberg University), Bruno Pouliquen (World Intellectual Property
Organization), John Tinsley (Iconic Translation Machines Ltd.), and
Toshiaki Nakazawa (Japan Science and Technology Agency).
-----------------------
Topics of Interest
-----------------------
We solicit original research papers as well as survey papers and user
reports. Topics of interest include, but are not limited to:
* Machine translation of patents and scientific literature
* Domain adaptation of MT systems
* Translation aids for patents and scientific literature
* Language resources for patent and scientific literature translation
* Evaluation techniques for patent and scientific literature translation
* Controlled languages and machine translation
* Multilingual retrieval and classification of patents and scientific
literature
--------------------
Important Dates
--------------------
* August 17: Paper submission deadline
* August 31: Notifications to authors
* September 7: Camera-ready versions due
* October 30: Workshop
------------------------------
Submission Instructions
------------------------------
The format for papers is the same as for regular MT Summit 2015
submissions. Papers must be written in English and not exceed 12
(twelve) pages plus 4 (four) pages for references. All papers should
follow the formatting instructions included with the style files, and
should be submitted in PDF. LaTeX, PDF, and MS Word style files are
available at
http://www.amtaweb.org/mt-summit-xv-mt-researchers-call-for-papers/. To
allow for blind reviewing, please do not include author names and
affiliations within the paper and avoid obvious self-references.
Papers must be submitted to the START system
(https://www.softconf.com/mtsummit-xv/PSLT-2015/) by 11:59 pm PDT (GMT -
7 hours), Monday, August 17, 2015.
-----------------------------
Workshop Organization
-----------------------------
PC Co-Chairs:
Hiroyuki Kaji (Shizuoka University, Japan)
Katsuhito Sudoh (NTT, Japan)
PC Members:
Key-Sun Choi (KAIST, Korea)
Hiroshi Echizen-ya (Hokkai-Gakuen University, Japan)
Terumasa Ehara (Ehara NLP Research Laboratory, Japan)
Isao Goto (NHK, Japan)
Kinji Hanawa (Japan Patent Information Organization, Japan)
Takayuki Hayakawa (Japan Patent Information Organization, Japan)
Munpyo Hong (Sungkyunkwan University, Korea)
Eduard Hovy (Carnegie Mellon University, USA)
Kenji Imamura (National Institute of Information and Communications
Technology, Japan)
Hideki Isozaki (Okayama Prefectural University, Japan)
Hiroaki Kawai (Japan Patent Information Organization, Japan)
Philipp Koehn (Johns Hopkins University, USA)
Akira Kumano (Toshiba Solutions Corporation, Japan)
Sadao Kurohashi (Kyoto University, Japan)
Jong-Hyeok Lee (Pohang University of Science and Technology, Korea)
Bente Maegaard (University of Copenhagen, Denmark)
Toshimichi Moriya (Japan Patent Information Organization, Japan)
Toshiaki Nakazawa (Japan Science and Technology Agency, Japan)
Takashi Ninomiya (Ehime University, Japan)
Tadaaki Oshio (Japan Patent Information Organization, Japan)
Svetlana Sheremetyeva (Lanaconsult, Denmark)
Sayori Shimohata (Oki Electric Industry Co., Ltd., Japan)
Jun-ichi Tsujii (Artificial Intelligence Research Center, AIST, Japan)
Takashi Tsunakawa (Shizuoka University, Japan)
Takehito Utsuro (University of Tsukuba, Japan)
Andy Way (Dublin City University, Ireland)
Shoichi Yokoyama (Yamagata University, Japan)
Jiajun Zhang (Chinese Academy of Sciences, China)
Publications Chair:
Takashi Tsunakawa (Shizuoka University, Japan)
Workshop Web-site:
http://www.aamtjapio.com/pslt2015
--
Takashi Tsunakawa
Kaji Laboratory,
College of Informatics, Shizuoka University
tuna@inf.shizuoka.ac.jp
------------------------------
Message: 2
Date: Tue, 4 Aug 2015 18:02:48 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <55C0C608.4080507@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
On 03/08/2015 13:00, Vincent Nguyen wrote:
> Hi,
>
> Just a heads up on some EMS results, to get your experienced opinions.
>
> Corpus: Europarlv7 + NC2010
> fr => en
> Evaluation NC2011.
>
> 1) IRSTLM is much slower than KenLM for training / tuning.
That sounds right. KenLM is also multithreaded; IRSTLM can only be used
in single-threaded decoding.
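For example, a KenLM feature line in moses.ini and a multithreaded
decoding call look roughly like this (a sketch; the path and LM order
are placeholders, not your actual setup):

# moses.ini excerpt: a KenLM language model feature (placeholder path)
[feature]
KENLM name=LM0 factor=0 path=/path/to/lm.binary order=5

# KenLM is thread-safe, so the decoder can run with several threads
moses -f moses.ini -threads 8 < input.fr > output.en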
>
> 2) BLEU results are almost the same (25.7 with IRSTLM, 26.14 with KenLM)
true
>
> 3) Compact mode is faster than on-disk with a short test (77 segments:
> 96 seconds vs. 126 seconds)
true
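The compact tables can also be built offline with the conversion tools
that ship with Moses, roughly like this (a sketch; file names are
placeholders):

# compress a gzipped phrase table into the compact format
processPhraseTableMin -in phrase-table.gz -out phrase-table -nscores 4 -threads 4

# same for the lexicalized reordering table
processLexicalTableMin -in reordering-table.gz -out reordering-table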
>
> 4) One last thing I do not understand, though:
> As a sanity check, I replaced NC2011 with NC2010 in the evaluation (I
> know that since NC2010 is part of the training data, this is not a
> relevant test). I got roughly the same BLEU score. I would have
> expected a higher score with a test set included in the training corpus.
>
> Makes sense?
>
>
> Next steps:
> What path should I take to get better scores? I read the 'optimize'
> section of the website, which deals more with speed.
> Of course I will apply all of this, but I was interested in tips to
> get more quality if possible.
Look into domain adaptation if you have multiple training corpora, some
in-domain and some out-of-domain.
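For example, in EMS you can declare each training corpus in its own
section and let the language models be interpolated on an in-domain
tuning set; a rough sketch of the relevant config sections (corpus
names and paths are placeholders):

# experiment config excerpt (EMS)
[CORPUS:europarl]
raw-stem = $data-dir/training/europarl-v7.fr-en

[CORPUS:nc]
raw-stem = $data-dir/training/news-commentary.fr-en

[INTERPOLATED-LM]
# weights the per-corpus LMs to minimize perplexity on the tuning set
script = $moses-script-dir/ems/support/interpolate-lm.perl
tuning-sgm = $data-dir/dev/newstest2011-ref.en.sgm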
Other than that, getting a good BLEU score is an open research question.
Well done on getting this far.
>
>
> Thanks
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
------------------------------
Message: 3
Date: Tue, 4 Aug 2015 16:58:57 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: Hieu Hoang <hieuhoang@gmail.com>, moses-support
<moses-support@mit.edu>
Message-ID: <55C0D331.9090109@neuf.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed
Thanks for your insights.
I am just struck by the BLEU difference between my 26 and the 30 of
WMT11, and some WMT14 results close to 36 or even 39.
I am currently trying a hierarchical rule set instead of lexical
reordering, wondering if I will get better results, but I get a
"filesystem root low disk space" error message before it crashes.
Does this model take more disk space in some way?
I will next try to use more corpora, some of them in-domain, along with
my internal TMX files.
Thanks for your answers.
On 04/08/2015 16:02, Hieu Hoang wrote:
>
>
> On 03/08/2015 13:00, Vincent Nguyen wrote:
>> Hi,
>>
>> Just a heads up on some EMS results, to get your experienced opinions.
>>
>> Corpus: Europarlv7 + NC2010
>> fr => en
>> Evaluation NC2011.
>>
>> 1) IRSTLM is much slower than KenLM for training / tuning.
> That sounds right. KenLM is also multithreaded; IRSTLM can only be
> used in single-threaded decoding.
>>
>> 2) BLEU results are almost the same (25.7 with IRSTLM, 26.14 with KenLM)
> true
>>
>> 3) Compact mode is faster than on-disk with a short test (77 segments:
>> 96 seconds vs. 126 seconds)
> true
>>
>> 4) One last thing I do not understand, though:
>> As a sanity check, I replaced NC2011 with NC2010 in the evaluation (I
>> know that since NC2010 is part of the training data, this is not a
>> relevant test). I got roughly the same BLEU score. I would have
>> expected a higher score with a test set included in the training corpus.
>>
>> Makes sense?
>>
>>
>> Next steps:
>> What path should I take to get better scores? I read the 'optimize'
>> section of the website, which deals more with speed.
>> Of course I will apply all of this, but I was interested in tips to
>> get more quality if possible.
> Look into domain adaptation if you have multiple training corpora,
> some in-domain and some out-of-domain.
>
> Other than that, getting a good BLEU score is an open research question.
>
> Well done on getting this far.
>>
>>
>> Thanks
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
------------------------------
Message: 4
Date: Tue, 04 Aug 2015 16:28:26 +0100
From: Barry Haddow <bhaddow@inf.ed.ac.uk>
Subject: Re: [Moses-support] EMS results - makes sense ?
To: Vincent Nguyen <vnguyen@neuf.fr>, Hieu Hoang
<hieuhoang@gmail.com>, moses-support <moses-support@mit.edu>
Message-ID: <55C0DA1A.1090001@inf.ed.ac.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Vincent
If you are comparing to the results of WMT11, then you can look at the
system descriptions to see what the authors did. In fact it's worth
looking at the WMT14 descriptions (WMT15 will be available next month)
to see how state-of-the-art systems are built.
For fr-en or en-fr, the first thing to look at is the data. There are
some large data sets released for WMT, and you can get a good gain just
from crunching more data (monolingual and parallel). Unfortunately this
takes more resources (disk, CPU, etc.), so you may run into trouble here.
The hierarchical models are much bigger, so yes, you will need more disk.
For fr-en/en-fr they are probably not worth the extra effort.
cheers - Barry
On 04/08/15 15:58, Vincent Nguyen wrote:
> Thanks for your insights.
>
> I am just struck by the BLEU difference between my 26 and the 30 of
> WMT11, and some WMT14 results close to 36 or even 39.
>
> I am currently trying a hierarchical rule set instead of lexical
> reordering, wondering if I will get better results, but I get a
> "filesystem root low disk space" error message before it crashes.
> Does this model take more disk space in some way?
>
> I will next try to use more corpora, some of them in-domain, along
> with my internal TMX files.
>
> Thanks for your answers.
>
> On 04/08/2015 16:02, Hieu Hoang wrote:
>>
>> On 03/08/2015 13:00, Vincent Nguyen wrote:
>>> Hi,
>>>
>>> Just a heads up on some EMS results, to get your experienced opinions.
>>>
>>> Corpus: Europarlv7 + NC2010
>>> fr => en
>>> Evaluation NC2011.
>>>
>>> 1) IRSTLM is much slower than KenLM for training / tuning.
>> That sounds right. KenLM is also multithreaded; IRSTLM can only be
>> used in single-threaded decoding.
>>> 2) BLEU results are almost the same (25.7 with IRSTLM, 26.14 with KenLM)
>> true
>>> 3) Compact mode is faster than on-disk with a short test (77 segments:
>>> 96 seconds vs. 126 seconds)
>> true
>>> 4) One last thing I do not understand, though:
>>> As a sanity check, I replaced NC2011 with NC2010 in the evaluation (I
>>> know that since NC2010 is part of the training data, this is not a
>>> relevant test). I got roughly the same BLEU score. I would have
>>> expected a higher score with a test set included in the training corpus.
>>>
>>> Makes sense?
>>>
>>>
>>> Next steps:
>>> What path should I take to get better scores? I read the 'optimize'
>>> section of the website, which deals more with speed.
>>> Of course I will apply all of this, but I was interested in tips to
>>> get more quality if possible.
>> Look into domain adaptation if you have multiple training corpora,
>> some in-domain and some out-of-domain.
>>
>> Other than that, getting a good BLEU score is an open research question.
>>
>> Well done on getting this far.
>>>
>>> Thanks
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
------------------------------
Message: 5
Date: Tue, 4 Aug 2015 19:37:35 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Language model creation error
To: kalu mera <kalumera7@gmail.com>, moses-support@mit.edu
Message-ID: <55C0DC3F.2000303@gmail.com>
Content-Type: text/plain; charset="windows-1252"
When you compile with IRSTLM, you must get the latest version. The
latest version is 5.80.08, from
http://sourceforge.net/projects/irstlm/files/
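For example, roughly (a sketch; adjust the paths to wherever you
unpacked IRSTLM and Boost, and note the install path below is just a
placeholder):

# rebuild Moses, pointing bjam at the IRSTLM and Boost installs
cd ~/mosesdecoder
./bjam --with-irstlm=$HOME/irstlm-5.80.08 --with-boost=$HOME/workspace/temp/boost_1_55_0 -j4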
On 01/08/2015 12:17, kalu mera wrote:
> Dear Members,
> I am trying to create a language model. I entered this command:
> kalumera@kalumera-Satellite-C50-A534:~/mosesdecoder$ ./bjam
> --with-boost=~/workspace/temp/boost_1_55_0 -j4
>
> but the build failed
>
> Please check the attachment for the command I entered and the error,
> and advise me on how to rectify the problem.
>
> Christine
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150804/22a389d5/attachment.htm
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 106, Issue 7
*********************************************