Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. BLEU score (Tomasz Gawryl)
2. Re: BLEU score (Marcin Junczys-Dowmunt)
3. Re: BLEU score (Barry Haddow)
4. Incremental / combination theory question (Vincent Nguyen)
----------------------------------------------------------------------
Message: 1
Date: Mon, 7 Sep 2015 09:15:39 +0200
From: "Tomasz Gawryl" <tomasz.gawryl@skrivanek.pl>
Subject: [Moses-support] BLEU score
To: <moses-support@mit.edu>
Message-ID: <005201d0e93c$ffeaec80$ffc0c580$@gawryl@skrivanek.pl>
Content-Type: text/plain; charset="us-ascii"
Hi All!
This is my first post here, so first I want to apologize for my English,
but I would like to ask you some questions. I finished a full phrase-based
Moses training on an EN-PL (English - Polish) corpus (a few million
sentences from free sources + half a million sentences from commercial TMX).
The training pipeline always ends with a test translation and a BLEU score.
I didn't expect the first score to be around 30%, but my result of 4.5%
surprised me. Why is my result so bad? Is it a consequence of the chosen
language pair? Polish is very flexible: we can interchange words in a
sentence without losing the sense. What should I do to improve this result?
Or maybe that's all I can get ;).
Regards,
Tomek
------------------------------
Message: 2
Date: Mon, 7 Sep 2015 10:09:52 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] BLEU score
To: moses-support@mit.edu
Message-ID: <55ED4650.4050708@amu.edu.pl>
Content-Type: text/plain; charset="windows-1252"
Hi Tomek,
4.5% definitely indicates that there was an error in your pipeline (or
test data?). However, there are so many places where things could go
wrong that, based on the little information you gave us, I could not even
start guessing. Check that your line numbers match, that you use tokenized
text, etc.
Best,
Marcin
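The two checks mentioned above (matching lines, tokenized text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are plain lists of lines read from the source and reference files, and the tokenization test is a crude heuristic (punctuation glued to the end of a word), not what the Moses tokenizer actually does.

```python
def check_parallel(src_lines, ref_lines):
    """Return a list of problems found in two supposedly parallel line lists."""
    problems = []
    # Parallel files must have exactly the same number of lines.
    if len(src_lines) != len(ref_lines):
        problems.append(
            f"line count mismatch: {len(src_lines)} vs {len(ref_lines)}"
        )
    # An empty line on only one side usually means the files are misaligned.
    for i, (src, ref) in enumerate(zip(src_lines, ref_lines), start=1):
        if bool(src.strip()) != bool(ref.strip()):
            problems.append(f"line {i}: one side is empty")
    return problems


def looks_untokenized(line):
    """Heuristic (assumption): a word with trailing punctuation attached,
    such as 'Hello,', suggests the text was not tokenized."""
    return any(
        len(tok) > 1 and tok[-1] in ".,;:!?" and tok[:-1].isalpha()
        for tok in line.split()
    )
```

To use it on real files, read them with something like `open(path, encoding="utf-8").read().splitlines()` and pass both lists to `check_parallel`; an empty result means neither check fired, not that the data is guaranteed clean.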
------------------------------
Message: 3
Date: Mon, 07 Sep 2015 09:10:16 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] BLEU score
To: Tomasz Gawryl <tomasz.gawryl@skrivanek.pl>, moses-support@mit.edu
Message-ID: <55ED4668.4060107@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="windows-1252"
Hi Tomek
Yes, that's quite a low score. Have a look at the translation output: do
the sentences have lots of English words in them? Are they very long,
very short, or scrambled in some other way?
The most common problem is that something went wrong in corpus
preparation: for example, the corpora weren't correctly aligned, some
parts got swapped around accidentally, or they were not consistently
tokenised or truecased.
Did you run tuning? If so, double-check that you passed the correct
files (input and reference) to tuning.
cheers - Barry
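A quick way to look for the symptoms described above (untranslated English words, outputs that are far too long or too short) is to compute two summary numbers over the test set. The sketch below is an assumption-laden illustration, not part of Moses: `output_diagnostics` is a hypothetical helper, both inputs are lists of whitespace-tokenized sentences in the same order, and "copied" simply means an alphabetic output token that also occurs in the corresponding source sentence.

```python
def output_diagnostics(source, output):
    """Summarise length ratio and copied-token rate for parallel lists of
    tokenized source sentences and system outputs."""
    total_src = total_out = copied = 0
    for src, out in zip(source, output):
        src_toks = src.split()
        out_toks = out.split()
        src_vocab = set(src_toks)
        total_src += len(src_toks)
        total_out += len(out_toks)
        # Count alphabetic output tokens that appear verbatim in the source;
        # punctuation and numbers are ignored because they are often shared.
        copied += sum(1 for t in out_toks if t.isalpha() and t in src_vocab)
    return {
        "length_ratio": total_out / total_src if total_src else 0.0,
        "copied_rate": copied / total_out if total_out else 0.0,
    }
```

A high copied rate points to many source words passing through untranslated, and a length ratio far from 1 points to truncated or runaway output. Names and cognates are legitimately shared between languages, so these numbers are only a hint about where to look.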
------------------------------
Message: 4
Date: Mon, 7 Sep 2015 10:25:56 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: [Moses-support] Incremental / combination theory question
To: moses-support <moses-support@mit.edu>
Message-ID: <55ED4A14.1080309@neuf.fr>
Content-Type: text/plain; charset=utf-8; format=flowed
Hi experts,
I have a question about phrase table theory.
Suppose we take a corpus A to create a translation model TMA and a
language model LMA, and then consider a second corpus B.
Method 1:
We add corpus B to A => corpus AB => TM-AB and LM-AB
Method 2:
We process corpus B => TMB and LMB,
then we combine TMA + TMB and LMA + LMB (by linear interpolation,
fill-up or backoff).
Everything in phrase table, of course.
Should we expect the same results with AB as with A+B? Or at least close?
If not, why?
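As a toy illustration of the two methods for the linear-interpolation case, the sketch below compares a phrase probability estimated from pooled counts (Method 1) with an interpolation of the two separate estimates (Method 2). It rests on strong assumptions that are not in the question: phrase probabilities are plain relative frequencies, the extracted phrase pairs and their counts are identical whether A and B are aligned separately or jointly (in practice word alignment is re-estimated on AB, so this need not hold), and the counts and Polish words are invented for the example.

```python
from collections import Counter
from fractions import Fraction


def rel_freq(pair_counts, src):
    """Relative-frequency estimate p(tgt | src) from (src, tgt) pair counts."""
    total = sum(c for (f, _), c in pair_counts.items() if f == src)
    return {e: Fraction(c, total) for (f, e), c in pair_counts.items() if f == src}


def interpolate(p1, p2, lam):
    """Linear interpolation lam * p1 + (1 - lam) * p2 over the union of targets."""
    targets = set(p1) | set(p2)
    return {e: lam * p1.get(e, 0) + (1 - lam) * p2.get(e, 0) for e in targets}


# Hypothetical phrase-pair counts extracted from corpus A and corpus B.
counts_a = Counter({("house", "dom"): 8, ("house", "budynek"): 2})
counts_b = Counter({("house", "dom"): 1, ("house", "budynek"): 3})

p_a = rel_freq(counts_a, "house")
p_b = rel_freq(counts_b, "house")
p_ab = rel_freq(counts_a + counts_b, "house")  # Method 1: pooled counts

# Method 2 with a weight equal to A's share of the source-phrase count.
n_a = sum(counts_a.values())
n_b = sum(counts_b.values())
p_count_weighted = interpolate(p_a, p_b, Fraction(n_a, n_a + n_b))

# Method 2 with one fixed global weight, here 0.5.
p_fixed = interpolate(p_a, p_b, Fraction(1, 2))
```

Under these assumptions the pooled estimate equals the interpolation only when the weight is the source phrase's own count share (here 10/14), which differs from phrase to phrase. A single global weight gives different numbers, so in this simplified setting the two methods should not be expected to give identical phrase tables.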
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 107, Issue 19
**********************************************