Moses-support Digest, Vol 150, Issue 6

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. About Bilingual LM in Moses (Ergun Bicici)
2. Re: About Bilingual LM in Moses (Ergun Bicici)

----------------------------------------------------------------------

Message: 1
Date: Mon, 15 Apr 2019 14:26:17 +0300
From: Ergun Bicici <bicici@gmail.com>
Subject: [Moses-support] About Bilingual LM in Moses
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAB59qTNduwgRS+VNdUesrqapzUinGFPjV6vhK1jP7-f1AXXjwg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear moses-support,

I tried the NPLM model on the German-English baseline dataset (wget
http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) and it improved
the score from 0.2266 to 0.2317 BLEU.

I tried the bilingual LM:
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37
However:
- The vocab files were not written at the end, so I used extract_training.py
to obtain them.
- I still obtained 'nan' scores from the bilingual LM model.
Error: "Not a label, not a score 'nan'. Failed to parse the scores string:
0 ||| ... ???? ... ??????? . ||| LexicalReordering0= -11.3723 -15.4848
-26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538 OpSequenceModel0=
-403.825 99 22 45 5 Distortion0= -146 LM0= -685.828 BLMcomb= nan
WordPenalty0= -76 PhrasePenalty0= 53 TranslationModel0= -242.874 -179.189
-291.623 -342.085 ||| nan
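
To see which feature function is responsible for a failure like the one above, the n-best list can be scanned for non-finite scores. The following is a minimal sketch, assuming the usual "id ||| hypothesis ||| feature scores ||| total" layout seen in the error message; the function name nonfinite_features is hypothetical and not part of Moses.

```python
import math


def nonfinite_features(nbest_line):
    """Return labels of features in one n-best line that have a nan/inf score.

    Assumes the fields are separated by '|||' and that the third field holds
    'Label= v1 v2 ...' groups, as in the error message quoted above.
    """
    fields = [field.strip() for field in nbest_line.split("|||")]
    if len(fields) < 3:
        return []
    bad = []
    label = None
    for token in fields[2].split():
        if token.endswith("="):
            # A token such as 'BLMcomb=' starts a new feature group.
            label = token[:-1]
            continue
        try:
            value = float(token)
        except ValueError:
            # Not a number and not a label: skip it in this sketch.
            continue
        if not math.isfinite(value) and label not in bad:
            bad.append(label)
    return bad


line = ("0 ||| a b . ||| Distortion0= -146 LM0= -685.828 "
        "BLMcomb= nan WordPenalty0= -76 ||| nan")
print(nonfinite_features(line))
```

For the shortened line in the example, only BLMcomb is reported, which matches the feature carrying 'nan' in the error message.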

KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
BilingualNPLM name=BLMcomb order=5 source_window=4 path=wmt19_en-kk/lm/comb.blm.2/train.10 source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target

This suggests that the problem may be a bug in the Moses C++ code rather than
in the input data or configuration.

The documentation also appears to be out of sync on the step "average the <null> word
embedding as per the instructions here
<http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>",
since averageNullEmbedding.py asks for -i, -o, and -t.

I found a related note in a paper by Barry Haddow at WMT'15 saying that
the model was not used in the final submission because the differences were
insignificant.

Do you have any recent results on the bilingual LM model?

--

Regards,
Ergun

------------------------------

Message: 2
Date: Mon, 15 Apr 2019 14:44:32 +0300
From: Ergun Bicici <bicici@gmail.com>
Subject: Re: [Moses-support] About Bilingual LM in Moses
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAB59qTNjEipjCFLv6B=YW1pe4o2V9B5kj4dHGWvkq-W+ecFcqA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I found that training also produced 'nan' scores:
Training NCE log-likelihood: nan.

I used EMS training:
[LM:comb]
nplm-dir = "Programs/nplm/"
order = 5
source-window = 4
bilingual-lm = yes
bilingual-lm-settings = "--prune-source-vocab 100000 --prune-target-vocab 100000"

I am re-running train_nplm.py.
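
When re-running, it may help to find out at which point the training log first reports 'nan'. The following is a minimal sketch, assuming the log contains lines of the form "Training NCE log-likelihood: <value>." as quoted above; the finite value in the example and the function name nonfinite_loglik_reports are hypothetical.

```python
import math
import re

# Matches the log line quoted above; the value may carry a trailing period.
LL_PATTERN = re.compile(r"Training NCE log-likelihood:\s*(\S+)")


def nonfinite_loglik_reports(log_lines):
    """Return 1-based indices of log-likelihood reports that are nan or inf."""
    bad = []
    report = 0
    for line in log_lines:
        match = LL_PATTERN.search(line)
        if match is None:
            continue
        report += 1
        text = match.group(1).rstrip(".")
        try:
            value = float(text)
        except ValueError:
            # Unparseable value: skip it in this sketch.
            continue
        if not math.isfinite(value):
            bad.append(report)
    return bad


log = [
    "Training NCE log-likelihood: -1.5e+07.",  # hypothetical finite value
    "some other log line",
    "Training NCE log-likelihood: nan.",
]
print(nonfinite_loglik_reports(log))
```

If the very first report is already 'nan', the training data or settings are the more likely cause; if it appears only after several reports, the training run itself probably diverged.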

Ergun

--

Regards,
Ergun

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 150, Issue 6
*********************************************
