Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: NPLM and BilingualNPLM not working as expected in Moses
(Raj Dabre)
2. DEADLINE for submission of ABSTRACTS FOR PRESENTATIONS AND
POSTERS EXTENDED TO 13 JULY 2015: TC37, 26-27 November 2015,
London (Rohit Gupta)
----------------------------------------------------------------------
Message: 1
Date: Tue, 7 Jul 2015 13:39:48 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] NPLM and BilingualNPLM not working as
expected in Moses
To: Rico Sennrich <rico.sennrich@gmx.ch>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAB3gfjCgcUpqehbnj7_anN9WW5Gq5C9JgcOnNxB8U+8D5KbqAQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello again,
I have checked the encoding of all my LM, TM, tune and test set files, and
they are all UTF-8.
So encoding does not seem to be the issue.
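For reference, this kind of encoding check can be scripted; a minimal sketch (file paths are placeholders) that reports which lines of a corpus file fail to decode as UTF-8:

```python
def find_bad_utf8_lines(path):
    """Return (line_number, raw_bytes) pairs for lines that fail to
    decode as UTF-8; an empty list means the file is clean."""
    bad = []
    with open(path, "rb") as f:
        for i, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((i, raw))
    return bad
```

Running it over each of the LM, TM, tune and test files and checking for an empty result is a quick way to rule encoding out.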
I also verified that there were no mistakes involving the source and target
sides while building and using the LMs.
I am retraining all my model files and will try this again.
Regards.
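Since the thread compares perplexities reported by the KenLM and NPLM query tools, it may help to spell out the quantity being compared: perplexity is just an exponentiated average negative log-probability. A minimal sketch (assuming base-10 log scores, as ARPA-style tools usually print; if a tool reports natural logs, substitute base e):

```python
def perplexity(logprobs, base=10.0):
    """Perplexity from per-word log-probabilities:
    base ** (-(mean log-probability))."""
    assert logprobs, "need at least one word score"
    return base ** (-sum(logprobs) / len(logprobs))
```

For example, a sentence of five words each scored at log10 probability -2.0 gives perplexity 100, regardless of sentence length.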
On Tue, Jul 7, 2015 at 12:43 PM, Raj Dabre <prajdabre@gmail.com> wrote:
> Hello Rico,
> Now that you mention it, I also performed an additional test.
> I took a translation and obtained its perplexity score by querying the
> KenLM and NPLM models from the command line. In this case the difference
> between the scores was not that large.
> It might be an encoding issue.
> I will check again and let you know.
>
> However, the data I am using to train the LMs (KenLM, NPLM and BiLM) is
> the same data I am using to train the TM. I should also mention that I did
> no tokenization etc. before training the LMs and the TM.
> Thanks for your replies.
> Regards.
>
> On Tue, Jul 7, 2015 at 1:18 AM, Rico Sennrich <rico.sennrich@gmx.ch>
> wrote:
>
>> Hi Raj,
>>
>> the information you provide is pretty vague, so I'm just making some wild
>> guesses here:
>>
>> It could be a user error, for instance an inconsistency between the
>> training sets used for training BilingualNPLM and the phrase table. Check
>> that the same version of the corpus (including tokenization, truecasing
>> etc.) was used for training, and that you did not mix up source and target
>> language. Also check that the settings during training are consistent with
>> those in the moses.ini file.
>>
>> It's possible that some of the settings (vocabulary size, number of
>> training epochs, or similar) are unsuitable for your task. For example,
>> since you have a relatively small training corpus, you may need more epochs
>> of training to get good results (use a validation set to see if model
>> perplexity converges).
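The convergence check suggested above can be as simple as watching the relative improvement in validation-set perplexity between epochs; a sketch (the 0.5% threshold is an arbitrary illustrative choice, not a Moses or NPLM default):

```python
def has_converged(val_perplexities, rel_tol=0.005):
    """True once the newest epoch improves validation perplexity by less
    than rel_tol relative to the previous epoch (or makes it worse)."""
    if len(val_perplexities) < 2:
        return False
    prev, last = val_perplexities[-2], val_perplexities[-1]
    return (prev - last) / prev < rel_tol
```

Record validation perplexity after each epoch and stop (or keep the best checkpoint) once this returns True.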
>>
>> Please double-check that there were no problems with the Unicode handling
>> of Japanese/Chinese characters, and that the encoding of your vocabulary
>> files matches that of the translation model, and the decoder input. We have
>> never experienced such problems, but they could arise for some system
>> configurations.
>>
>> best wishes,
>> Rico
>>
>>
>>
>> On 06.07.2015 16:31, Raj Dabre wrote:
>>
>> Hello Rico,
>> I trained both monolingual and bilingual LMs.
>> Both seemed ineffective.
>> As I mentioned before, I am working with Chinese-Japanese and the domain
>> is paper abstracts.
>> I did check the n-best lists and I saw a significant difference between
>> the LM scores when comparing the runs for KenLM and NPLM.
>> What could have gone wrong during the training?
>> Regards.
>>
>> On Mon, Jul 6, 2015 at 10:53 PM, Rico Sennrich <rico.sennrich@gmx.ch>
>> wrote:
>>
>>> Hello Raj,
>>>
>>> can you please clarify if you tried to train a monolingual LM
>>> (NeuralLM), a bilingual LM (BilingualNPLM), or both? Our previous
>>> experiences with BilingualNPLM are mixed, and we observed improvements for
>>> some tasks and language pairs, but not for others. See for instance:
>>>
>>> Alexandra Birch, Matthias Huck, Nadir Durrani, Nikolay Bogoychev and
>>> Philipp Koehn. 2014. Edinburgh SLT and MT System Description for the IWSLT
>>> 2014 Evaluation. Proceedings of IWSLT 2014.
>>>
>>> To help debugging, you can check the scores in the n-best lists of the
>>> tuning runs. If the NPLM features give much higher costs than KenLM
>>> (trained on the same data), this can indicate that something went wrong
>>> during training.
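Concretely, that comparison can be scripted against the n-best files from the two tuning runs; a sketch assuming Moses' standard " ||| "-separated n-best format (the feature names used below are illustrative, not taken from this thread):

```python
def avg_feature_score(nbest_lines, feature):
    """Average value of a named dense feature over Moses n-best lines of
    the form 'id ||| hypothesis ||| name0= s ... ||| total'.
    Multi-valued features are summed per hypothesis."""
    totals = []
    for line in nbest_lines:
        fields = line.split(" ||| ")
        current, total = None, 0.0
        for tok in fields[2].split():
            if tok.endswith("="):        # a feature-name label like 'LM0='
                current = tok[:-1]
            elif current == feature:     # a score belonging to that feature
                total += float(tok)
        totals.append(total)
    return sum(totals) / len(totals)
```

Comparing the average LM-feature cost from the KenLM run against the NPLM run (trained on the same data) makes a large discrepancy easy to spot.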
>>>
>>> best wishes,
>>> Rico
>>>
>>> On 06.07.2015 14:29, Raj Dabre wrote:
>>>
>>> Dear all,
>>> I have checked out the latest versions of Moses and NPLM, and compiled
>>> Moses successfully with the --with-nplm option.
>>> I got a ton of warnings during compilation, but in the end it all worked
>>> out and all the desired binaries were created. Simply executing the moses
>>> binary told me that the BilingualNPLM and NeuralLM features were available.
>>>
>>> I trained an NPLM model based on the instructions here:
>>> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc33
>>> The corpus I used was about 600k lines (Chinese-Japanese; the target
>>> side is Japanese).
>>>
>>> I then integrated the resulting language model (after 10 training
>>> iterations) into the decoding process via moses.ini.
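For anyone reproducing this, the moses.ini wiring has roughly the shape below (a sketch based on the Moses LM documentation; the path, feature name, and weight are placeholders, and BilingualNPLM takes additional source-side parameters not shown here):

```ini
[feature]
NeuralLM factor=0 name=NPLM0 order=5 path=/path/to/nplm.model

[weight]
NPLM0= 0.5
```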
>>>
>>> I initiated tuning (standard parameters) and I got no errors, which
>>> means that the neural language model (NPLM) was recognized and queried
>>> appropriately.
>>> I also ran tuning without a language model.
>>>
>>> The strange thing is that the tuning and test BLEU scores for both
>>> these cases are almost the same. I checked the weights and saw that the LM
>>> was assigned a very low weight.
>>>
>>> On the other hand, when I used KenLM on the same data, I got
>>> comparatively higher BLEU scores.
>>>
>>> Am I missing something? Am I using the NeuralLM in an incorrect way?
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
--
Raj Dabre.
Doctoral Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014
------------------------------
Message: 2
Date: Tue, 7 Jul 2015 14:26:54 +0200
From: Rohit Gupta <enggrohitgupta@gmail.com>
Subject: [Moses-support] DEADLINE for submission of ABSTRACTS FOR
PRESENTATIONS AND POSTERS EXTENDED TO 13 JULY 2015: TC37, 26-27
November 2015, London
To: mt-list@eamt.org, moses-support@mit.edu
Message-ID:
<CAB-CSF-UfbMKzS9ZYGKNzVmsLNWGC9brREkZ9Ouo0Bk0zhUHbw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
[apologies for cross-posting]
*** DEADLINE for submission of ABSTRACTS FOR PRESENTATIONS AND POSTERS
EXTENDED TO 13 JULY 2015 ***
(see revised submission guidelines)
*37th Translating and the Computer Conference (TC37), 26-27 November 2015,
London*
ASLING (Association internationale pour la promotion des technologies
linguistiques / International Association for the Advancement in Language
Technology) is delighted to announce the forthcoming 37th edition of the
annual Translating and the Computer Conference (TC37), to take place on 26
and 27 November 2015 in London. The TC conference has emerged as a leading
forum for users, developers and vendors of Translation Technology tools,
being a distinctive event for discussing the latest developments. It is an
annual meeting where translators, researchers and business people, from
translation companies, international organisations, universities and
research labs, as well as freelance professionals, exchange ideas and
discuss hot topics.
The TC conference takes the form of presentations as well as posters and
also features panel discussions and workshops. If you or a colleague have
something interesting to contribute, this call invites you to consider
submitting an extended abstract of a paper or poster for the TC37
conference. If on the other hand you have a workshop to propose, please
provide an abstract (maximum 750 words) describing the topic and an outline
of the structure.
*Conference topics*
Contributions are invited on any topic related to the technology used in
translation and interpreting, including, but not limited to, CAT tools
(Translation Memory (TM) systems, integration of Machine Translation in TM
systems), Terminology Management, Machine Translation (training, quality
assessment, post-editing, adaptation), Quality Control, Interoperability,
Crowd-sourcing, Natural Language Processing and Translation Workflow and
Management. Among the other important topics are training (including
university-level translation and interpretation programmes and the rapidly
changing translation industry), resources for translators, tools and
resources for interpreters, how to facilitate collaboration between
translators and translation companies, and mobile technologies to support
translators' work.
*Submission guidelines*
Original unpublished papers and posters on all aspects of translation and
interpretation technology are invited. Topics of interest include, but are
not limited to, those listed above. Papers and posters may report (among
other things) on research, on commercial translation products or on user
experiences. The difference between papers and posters is that while
papers are expected to report on more conclusive results, posters can present
ongoing and not necessarily completed research, teaching or training
activity, practical work, software programs, projects or new developments.
*Papers*:
Authors are invited to submit an extended abstract (maximum of 750 words)
of the paper they would like to present, together with a short 200-word
abstract and a short biography. Although the extended abstract is limited to
750 words (longer papers will NOT be considered), it should provide
sufficient information to allow evaluation of the submission by the
programme committee. The abstracts of accepted papers will be used in
online programme and event advertising.
Camera-ready versions of the accepted papers will be published in the
conference e-proceedings with an assigned ISBN, subject to the
presenter having duly registered for the conference. Their length should
not exceed 5,000 words.
*Posters*:
Poster proposals are invited in the form of poster abstracts not exceeding
500 words. Authors should submit a 200-word version as well as a short
biography for dissemination.
Camera-ready versions of the accepted posters will be published in the
conference e-proceedings with an assigned ISBN, subject to the
presenter having duly registered for the conference. Their length should
not exceed 2,000 words.
*Submission:*
Both papers and posters should be submitted via the START conference
submission system. For further information, go to the conference website
(http://www.translatingandthecomputer.com/) and follow the information and
links to "upload submissions for Papers and Posters".
A direct link to the START conference management system site, to which you
need to upload your proposal for this conference, is:
https://www.softconf.com/e/tc2015/
(If you have never registered as a user in the Softconf START conference
management system, follow the link "New user? please register first by
clicking HERE." to create a user entry for yourself. If you used this
system to submit a proposal to Translating and the Computer 2012-2014, or
for any other conference that uses the START system, you can use your
previous user name to enter the system. If you lost or forgot your
password, there is a link on the site, labelled as such, to reset your
password.)
The website also provides further guidelines. Successful submissions will
require that the final full-length papers or posters make use of the format
stylesheets, which will be made available as Word (and LaTeX) stylesheets
in due time.
*Schedule*
13 July 2015 - extended deadline for abstracts of papers and posters
10 August 2015 - all authors notified of decisions
30 September 2015 - speakers' full papers and posters to be submitted for
inclusion in the e-proceedings
14 November 2015 - speakers' presentations to be submitted
26-27 November 2015 - conference takes place in London
*Conference Chairs*
João Esteves-Ferreira, Tradulex, International Association for Quality
Translation
Juliet Macan, Arancho Doc srl.
Ruslan Mitkov, University of Wolverhampton
Olaf-Michael Stefanov, United Nations (ret), JIAMCATT
*Programme Committee*
Juanjo Arevalillo, Hermes Traducciones
Wilker Aziz, University of Amsterdam
David Chambers, World Intellectual Property Organisation (ret)
Gloria Corpas Pastor, University of Málaga
Iwan Davies, Institute of Translation and Interpreting
Joanna Drugan, University of East Anglia
David Filip, CNGL / ADAPT
Paola Valli, TAUS/University of Trieste
Nelson Verástegui, International Telecommunication Union (ret)
David Verhofstadt, International Atomic Energy Agency
AsLing.org (http://asling.org/)
Association internationale pour la promotion des technologies linguistiques
International Association for the Advancement in Language Technology
----------------------
Thanks & Regards
Rohit Gupta
Marie Curie Early Stage Researcher, EXPERT Project
Research Group in Computational Linguistics
Research Institute of Information and Language Processing
University of Wolverhampton
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 105, Issue 14
**********************************************