Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Factored Model / <s> Error ??? (Marwa Refaie)
2. Re: Factored Model / <s> Error ??? (Kenneth Heafield)
3. perplexity scores (koormoosh)
4. Re: perplexity scores (Rico Sennrich)
----------------------------------------------------------------------
Message: 1
Date: Wed, 8 Oct 2014 00:32:31 +0000
From: Marwa Refaie <basmallah@hotmail.com>
Subject: [Moses-support] Factored Model / <s> Error ???
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <DUB118-W23B0359F8E1EDE984F0183BAA30@phx.gbl>
Content-Type: text/plain; charset="windows-1256"
Thanks for the help. I fixed everything mentioned; now I'm stuck on this error:

Start loading text SCFG phrase table. Moses format : [1.000] seconds
Reading /cygdrive/c/mosesdecoder-master/try/ai/sep/fsmt/work/model/phrase-table.0,1-0,1.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Either your data contains <s> in a position other than the first word or your language model is missing <s>. Did you build your ARPA using IRSTLM and forget to run add-start-end.sh?
[1]+ Done c:/mosesdecoder-master/scripts/training/train-model.perl -external-bin-dir c:/mosesdecoder-master/tools/bin -root-dir work -corpus c:/mosesdecoder-master/try/ai/sep/fsmt/data/UNpos.lo -f en -e ar -translation-factors 0,1-0,1 -lm 0:5:/cygdrive/c:/mosesdecoder-master/try/ai/sep/fsmt/model/surface.lm -lm 1:5:/cygdrive/c:/mosesdecoder-master/try/ai/sep/fsmt/model/pos.lm & >training.out
Aborted (core dumped)

As mentioned, I built a non-factored model and it worked well, but when I started using the POS and surface language models for my training data I got these errors. I used SRILM.
Marwa N. Refaie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141008/2f038bc7/attachment-0001.htm
------------------------------
Message: 2
Date: Tue, 07 Oct 2014 21:13:36 -0400
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Factored Model / <s> Error ???
To: moses-support@mit.edu
Message-ID: <54348FC0.6010103@kheafield.com>
Content-Type: text/plain; charset=ISO-8859-1
Well, does your data contain <s> in a position other than the first
word? If so, you should escape it, e.g. with the Moses tokenizer.
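As a quick check on both possibilities, something like the following would work (a sketch with toy stand-in files; the sed line only mimics the escaping step the Moses tokenizer performs, it is not the tokenizer itself):

```shell
# Toy stand-ins for the real files (hypothetical names):
printf '\\1-grams:\n-1.234 <s>\n-2.345 hello\n' > surface.lm
printf '<s> hello\n' > corpus.txt

# 1) Does the language model define <s>? A zero count here would explain the error.
grep -c '<s>' surface.lm

# 2) Escape XML-like markup in the training data, roughly what the Moses
#    tokenizer's escaping does (order matters: & must be escaped first):
sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' corpus.txt > corpus.esc
cat corpus.esc
```

If the grep count is zero and the model was built with IRSTLM, running add-start-end.sh on the corpus before building the ARPA is the fix the error message suggests.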
On 10/07/14 20:32, Marwa Refaie wrote:
>
> Thanks for the help. I fixed everything mentioned; now I'm stuck on this error:
>
> Start loading text SCFG phrase table. Moses format : [1.000] seconds
> Reading /cygdrive/c/mosesdecoder-master/try/ai/sep/fsmt/work/model/phrase-table.0,1-0,1.gz
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> Either your data contains <s> in a position other than the first word or your language model is missing <s>. Did you build your ARPA using IRSTLM and forget to run add-start-end.sh?
> [1]+ Done c:/mosesdecoder-master/scripts/training/train-model.perl -external-bin-dir c:/mosesdecoder-master/tools/bin -root-dir work -corpus c:/mosesdecoder-master/try/ai/sep/fsmt/data/UNpos.lo -f en -e ar -translation-factors 0,1-0,1 -lm 0:5:/cygdrive/c:/mosesdecoder-master/try/ai/sep/fsmt/model/surface.lm -lm 1:5:/cygdrive/c:/mosesdecoder-master/try/ai/sep/fsmt/model/pos.lm & >training.out
> Aborted (core dumped)
>
> As mentioned, I built a non-factored model and it worked well, but when I started using the POS and surface language models for my training data I got these errors. I used SRILM.
>
> /*Marwa N. Refaie*/
>
------------------------------
Message: 3
Date: Wed, 8 Oct 2014 14:16:02 +1100
From: koormoosh <koormoosh@gmail.com>
Subject: [Moses-support] perplexity scores
To: moses-support@mit.edu
Message-ID:
<CAN3_CDiO3H3BQ-1GKQ2kz0aVoyxzWwUWXDWrzGYUPBNq_6tX-A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I am doing some preliminary experiments with different LM toolkits, including
SRILM, RandLM, and KenLM. The problem I've noticed is the massive disagreement
between the perplexities that SRILM and KenLM report. I noticed that the
backoff/smoothing techniques used are different (Good-Turing with Katz backoff
vs. Kneser-Ney), but I doubt that alone should account for so large a gap
between the reported scores.
Here is what I do on both, and what I get:
On SRI:
./ngram-count -order 5 -text test.txt -write text.ngrams
./ngram-count -order 5 -read text.ngrams -lm text.arpa
and then I query via:
./ngram -lm text.arpa -ppl query.txt
On KenLM:
bin/lmplz -o 5 <text.txt >text.arpa
binarized with:
bin/build_binary text.arpa text.binary
and then I query via:
bin/query text.arpa <query.txt
The perplexity reported by KenLM is 8.98 and on the same dataset by SRI is
73.7443.
It is the same dataset, and I doubt that backoff alone would have such an effect.
K.
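For reference, perplexity is 10^(-(sum of log10 probabilities)/N) over the N scored tokens, so even modest per-word log-probability differences produce large perplexity gaps. A toy calculation (the numbers are made up, not taken from either toolkit):

```shell
# perplexity = 10^(-logsum/n): an average log10 prob of -2 per token gives ppl 100
awk 'BEGIN { logsum = -8.0; n = 4; printf "%.2f\n", 10^(-logsum/n) }'
```

Because the exponent is an average, the two toolkits must also agree on N (whether </s> and OOV tokens are counted) before their perplexities are comparable.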
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141008/71298c04/attachment-0001.htm
------------------------------
Message: 4
Date: Wed, 8 Oct 2014 08:40:45 +0000 (UTC)
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] perplexity scores
To: moses-support@mit.edu
Message-ID: <loom.20141008T103759-455@post.gmane.org>
Content-Type: text/plain; charset=utf-8
koormoosh <koormoosh@...> writes:
> and then I query via: ./ngram -lm text.arpa -ppl query.txt
Hi Koormoosh,
Not sure if that's the only problem, but ngram does not automatically use
the order of the ARPA file; it defaults to 3.
./ngram -order 5 -lm text.arpa -ppl query.txt
should get you closer to the right results.
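To avoid such a mismatch, the model order can be read off the ARPA header rather than assumed (a sketch using a toy header; a real text.arpa lists one `ngram N=...` line per order in its `\data\` section):

```shell
# Build a toy ARPA header, then extract the highest n-gram order it declares
printf '\\data\\\nngram 1=10\nngram 2=20\nngram 3=5\nngram 4=2\nngram 5=1\n' > toy.arpa
grep -o '^ngram [0-9]*' toy.arpa | awk '{ if ($2+0 > max) max = $2+0 } END { print max }'
```

The printed value is what should be passed to ngram as -order.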
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 96, Issue 10
*********************************************