Moses-support Digest, Vol 150, Issue 7

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: About Bilingual LM in Moses (Rico Sennrich)
2. Call for Papers for the 2nd Workshop on Human-aided
translation (HAT) (Maxim Khalilov)

----------------------------------------------------------------------

Message: 1
Date: Mon, 15 Apr 2019 13:36:04 +0100
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] About Bilingual LM in Moses
To: moses-support@mit.edu
Message-ID: <d4065b56-572a-f2d0-7895-5aadf380a51f@gmx.ch>
Content-Type: text/plain; charset="utf-8"

Hello Ergun,

we've had the 'nan' issue reported before ( see
https://moses-support.mit.narkive.com/hs8LwsnT/blingual-neural-lm-log-likelihood-nan
https://moses-support.mit.narkive.com/fklzlBiW/bilingual-lm-nan-nan-nan ).

You can follow Nick's recommendation of lowering the learning rate, or
try to enable gradient clipping (which is commented out in the code).
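
For readers unfamiliar with gradient clipping, here is a minimal sketch of one common form (clipping by L2 norm) in plain Python. It is not nplm's code: the names clip_by_norm and sgd_step and the max_norm parameter are illustrative assumptions, and the clipping that is commented out in nplm may take a different form.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm does not exceed max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if not math.isfinite(norm):
        # A nan/inf gradient cannot be repaired by rescaling; fail loudly instead.
        raise ValueError("gradient contains nan or inf")
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def sgd_step(weights, grad, learning_rate, max_norm):
    """One plain SGD update using the clipped gradient."""
    clipped = clip_by_norm(grad, max_norm)
    return [w - learning_rate * g for w, g in zip(weights, clipped)]
```

Both remedies bound the size of a single update: a lower learning rate shrinks every step, while clipping only shrinks the unusually large ones.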

I'm afraid nplm is no longer heavily used, so it's unlikely that
somebody has fresh experience.
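
To check which feature is producing 'nan' in an n-best list like the one quoted below, a small scan along these lines may help. It is only a sketch: find_nan_features is a hypothetical helper, and it assumes the "id ||| hypothesis ||| Name= score ... ||| total" layout visible in the quoted error message.

```python
import math

def find_nan_features(nbest_line):
    """Return the names of features with a nan score in one n-best line."""
    fields = [f.strip() for f in nbest_line.split("|||")]
    if len(fields) < 3:
        return []  # not an n-best line in the assumed layout
    bad = []
    label = "<unlabeled>"
    for token in fields[2].split():
        if token.endswith("="):
            label = token[:-1]  # e.g. "BLMcomb=" -> "BLMcomb"
            continue
        try:
            value = float(token)
        except ValueError:
            continue  # skip anything that is neither a label nor a number
        if math.isnan(value) and label not in bad:
            bad.append(label)
    return bad
```

Running this over every line of the n-best file shows whether the problem is confined to one feature, as the quoted message suggests for BLMcomb.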

best wishes,
Rico

On 15/04/2019 12:44, Ergun Bicici wrote:
>
> I found that training also produced 'nan' scores:
> Training NCE log-likelihood: nan.
>
> I used EMS training:
> [LM:comb]
> nplm-dir = "Programs/nplm/"
> order = 5
> source-window = 4
> bilingual-lm = yes
> bilingual-lm-settings = "--prune-source-vocab 100000
> --prune-target-vocab 100000"
>
> I am re-running train_nplm.py.
>
> Ergun
>
> On Mon, Apr 15, 2019 at 2:26 PM Ergun Bicici <bicici@gmail.com> wrote:
>
>
> Dear moses-support,
>
> I tried the nplm model on the German-English baseline dataset
> (wget
> http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) and it
> improved the scores from 0.2266 to 0.2317 BLEU.
>
> I tried the bilingual LM:
> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37
> However:
> - vocab files were not written in the end and I used
> extract_training.py to obtain them.
> - I still obtained 'nan' scores from the bilingual lm model.
> Error: "Not a label, not a score 'nan'. Failed to parse the scores
> string:
> 0 ||| ...????? ... ??????? .? ||| LexicalReordering0= -11.3723
> -15.4848 -26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538
> OpSequenceModel0= -403.825 99 22 45 5 Distortion0= -146 LM0=
> -685.828 BLMcomb= nan WordPenalty0= -76 PhrasePenalty0= 53
> TranslationModel0= -242.874 -179.189 -291.623 -342.085 ||| nan
>
> KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
> BilingualNPLM name=BLMcomb order=5 source_window=4
> path=wmt19_en-kk/lm/comb.blm.2/train.10
> source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source
> target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target
>
> Therefore, this may be due to some bug in moses C++ code and not
> the input data / configuration.
>
> The documentation also appears to be out of sync regarding the
> "average the <null> word embedding as per the instructions here
> <http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>."
> part, since averageNullEmbedding.py asks for -i, -o, and -t.
>
> I found some related note in a paper by Barry Haddow at WMT'15
> saying that the model is not used in the final submission due to
> insignificant differences.
>
> Do you have any recent results on the bilingual LM model?
>
> --
>
> Regards,
> Ergun
>
>
>
>
> --
>
> Regards,
> Ergun
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

Message: 2
Date: Mon, 15 Apr 2019 13:49:33 +0100
From: Maxim Khalilov <maxkhalilov@gmail.com>
Subject: [Moses-support] Call for Papers for the 2nd Workshop on
Human-aided translation (HAT)
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAPKeiSBnk+gzdydbNLtGqsihT_ycTLoFBoHq7NzXghSi0_-Y+A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

*Call for Papers*

2nd Workshop on Human-aided translation (HAT)
Co-located with MT-Summit, Dublin (Ireland), 19th August 2019

With the recent advances in the machine translation era and the high
quality translations obtained by neural MT systems, we observe human
translators and MT systems changing their roles. Instead of using the MT
outputs as the raw material to start the translation, human translators now
just need to perform the very last touches on the automatic translations
and send them to the end-users.

The increased trust in MT quality, however, requires a more careful
monitoring of MT systems in the production line in order to spot errors at
the end of the translation pipeline and to fix them, either automatically
or manually.

In this pipeline, Quality, Cost, and Delivery speed are the three main
factors. We ultimately want to preserve translation quality while
increasing translation speed and keeping the final cost of translation in
different scenarios under control. To this end, quality estimation and
automatic post-editing solutions play important roles. The goal of quality
estimation is to evaluate a translation system's quality without access to
the reference translations (Blatz et al., 2004; Specia et al., 2009). This
has many potential uses: informing the end user about the reliability of
translated content; deciding if a translation is ready for publishing or if
it requires human post-editing; and highlighting the words that need to be
changed. Quality estimation systems are particularly appealing for
crowd-sourced and professional translation services due to their potential
to dramatically reduce post-editing times and to save labor costs (Specia,
2011). The increasing interest in this problem from an industry angle comes
as no surprise (Federico et al., 2014; de Souza et al., 2015; Kozlova et
al., 2016; Martins et al., 2016, 2017; Wang et al., 2018). Recently, it has
also started to attract attention in the direct publishing scenario, mostly
from e-commerce companies (Ueffing, 2018; Wang et al. 2018).

Automatic post-editing, on the other hand, aims to automatically correct
the output of machine translation (Simard et al. (2007), Junczys-Dowmunt
and Grundkiewicz (2017, 2018)). Given the high quality translations
obtained by neural MT systems, the key question is whether quality
estimation and automatic post-editing are still needed.

The "Human-aided Translation" workshop builds upon the "First Workshop on
Translation Quality Estimation and Automatic Post-Editing", a successful
and well-attended workshop recently held with AMTA 2018. It will bring
together academic and industry researchers, as
well as practitioners interested in the tasks of quality estimation (word,
sentence, or document level) and automatic post-editing, both from a
research perspective and with the goal of applying these systems in
industry settings for routing, for improving translation quality, or for
making human post-editors more efficient. In this edition, we will give
special emphasis to neural-based solutions for quality estimation and
automatic post-editing tools and their integration with neural machine
translation systems.

*Submissions*

We invite the submission of extended abstracts related to the topics of the
workshop. The authors of accepted submissions will be invited to give
contributed talks at the workshop. Abstracts should be no longer than two
pages, including references. Topics of the workshop include but are not
limited to:

- Research, review, and position papers on document-level, sentence-level,
or word-level Quality Estimation of neural MTs
- Research, review, and position papers on Automatic Post-Editing for
neural MTs
- Research, review, and position papers on Interactive neural MTs
- Corpora curation technologies for developing Quality Estimation datasets
- User studies showing the impact of Quality Estimation tools in translator
productivity
- Automatic metrics for translation fluency and adequacy
- Industrial experiences of adopting Quality Estimation for neural MTs
- Industrial experiences of adopting Automatic Post-Editing for neural MTs

Submissions should be formatted according to the ACL template (
http://www.acl2019.org/medias/340-acl2019-latex.zip).

The extended abstracts should be submitted via the EasyChair system:
https://easychair.org/conferences/?conf=hat19. Abstracts will be reviewed
for relevance and quality. Accepted submissions will be posted online, and
offered oral presentations.

*Important dates*

Submission deadline: May 31
Notification date: June 28
Workshop day: August 19

*Confirmed invited speakers*

- Marco Turchi (FBK)
- Lucia Specia (University of Sheffield)
- Marcin Junczys-Dowmunt (Microsoft)
- Dimitar Shterionov (ADAPT Centre)

*Organizers*

Maxim Khalilov (Unbabel): maxim@unbabel.com
M. Amin Farajian (Unbabel): amin@unbabel.com
André Martins (Unbabel): andre.martins@unbabel.com

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 150, Issue 7
*********************************************