Moses-support Digest, Vol 107, Issue 41

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Oldest version of boost to work with --with-mm
(Marcin Junczys-Dowmunt)
2. Re: analysis.perl / mteval-v13a.pl / BLEU-annotation
(Philipp Koehn)
3. Help on pipeline .... (Vincent Nguyen)


----------------------------------------------------------------------

Message: 1
Date: Wed, 16 Sep 2015 15:11:28 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: [Moses-support] Oldest version of boost to work with
--with-mm
To: moses-support <moses-support@mit.edu>
Message-ID: <55F97890.2020605@amu.edu.pl>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,
what is currently the oldest version of Boost that works with Moses when
built with the --with-mm option? It seems Boost 1.54 is no longer
supported, although it is still the standard version on the current
Ubuntu LTS. The build works with a by-hand installation of 1.59; I
haven't tried any of the versions in between.
Best,
Marcin


------------------------------

Message: 2
Date: Wed, 16 Sep 2015 10:36:37 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] analysis.perl / mteval-v13a.pl /
BLEU-annotation
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDBuHHA+-hMc5+wGMqkJJPJp-QZDbg9VtMFPJ9KRvihTew@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

a difference between the BLEU score reported in the analysis
and the NIST BLEU score is that the former uses the tokenization
used in the Moses pipeline, whereas the NIST tool applies its own
tokenization to the detokenized output. This leads to different
scores, even if the differences are mostly minor.
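
To illustrate the tokenization effect described above, here is a minimal Python sketch. It is not the code used by analysis.perl or mteval-v13a.pl: both tokenizer functions are hypothetical stand-ins, one keeping the whitespace-separated tokens as they come out of the pipeline, the other splitting off every punctuation mark from the text. The sentences are made up for the illustration. The same hypothesis/reference pair yields different clipped n-gram precisions, which are the building blocks of BLEU.

```python
import re
from collections import Counter


def pipeline_tokens(text):
    """Stand-in for the pipeline's tokenization: keep whitespace-separated tokens."""
    return text.split()


def rescored_tokens(text):
    """Stand-in for a scorer that re-tokenizes: split off every punctuation mark."""
    return re.findall(r"\w+|[^\w\s]", text)


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against a single reference."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Made-up example pair; only the hyphen differs.
hypothesis = "banks in the euro-zone are subject to rules ."
reference = "banks in the euro zone are subject to rules ."

for name, tokenize in (("pipeline", pipeline_tokens), ("re-tokenized", rescored_tokens)):
    hyp, ref = tokenize(hypothesis), tokenize(reference)
    p1 = ngram_precision(hyp, ref, 1)
    p2 = ngram_precision(hyp, ref, 2)
    print(f"{name}: unigram precision {p1:.3f}, bigram precision {p2:.3f}")
```

With whitespace tokens, "euro-zone" is a single unmatched token; after re-splitting, "euro" and "zone" match and only the hyphen does not, so the precisions shift even though the translation itself is unchanged.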

About the line numbering - yes, this may be annoying, but it was
designed by a computer scientist who famously starts counting
with 0.

-phi

On Mon, Sep 14, 2015 at 6:13 AM, Vincent Nguyen <vnguyen@neuf.fr> wrote:

> Guys,
>
> While running EMS with a big test file I realized that analysis.perl
> ran very quickly, while the actual NIST BLEU scoring took much longer.
>
> Also, the file "BLEU-Annotation" generated during
> analysis does not contain the right line numbering.
> It takes 0 as the first line; thus, all line numbers are offset by 1.
>
> Last, when you "average" the BLEU scores from all these lines, the result
> is not the actual NIST BLEU score reported; it is slightly different.
>
> Is it computed differently?
>
> Thanks,
>
> Vincent
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

------------------------------

Message: 3
Date: Wed, 16 Sep 2015 17:30:48 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: [Moses-support] Help on pipeline ....
To: moses-support <moses-support@mit.edu>
Message-ID: <55F98B28.8020605@neuf.fr>
Content-Type: text/plain; charset="utf-8"


I am struggling with a pipeline .....

Here is the text1.txt file I would like to translate from FR to EN:
<g id="1">Les banques de la zone euro sont soumises :</g>
<g id="1">au ratio de capital lié à la détention d'actifs risqués (nous
nous intéressons ici au crédit) ;</g>
<g id="1">au ratio de levier, qui détermine le capital réglementaire à
partir de la taille du bilan de la banque ;</g>
<g id="1">au ratio de liquidité, qui impose aux banques de détenir en
particulier des portefeuilles importants de titres publics.</g>

I am running the following commands, which complete without errors:

/home/moses/mosesdecoder/scripts/tokenizer/normalize-punctuation.perl fr \
  < text1.txt > text2.txt
/home/moses/matecat/matecat_util/code/tokenizer/deescape-special-chars.perl \
  < text2.txt > text3.txt
/home/moses/matecat/matecat_util/code/tokenizer/tokenizer.perl -X -a -l fr \
  < text3.txt > text4.txt
/home/moses/mosesdecoder/scripts/recaser/truecase.perl \
  --model /home/moses/working/truecaser/truecase-model.1.fr \
  < text4.txt > text5.txt
/home/moses/mosesdecoder/bin/moses \
  -f /home/moses/working/tuning/moses.tuned.ini.1 \
  < text5.txt > text6.txt

Then, in text6.txt, I get:

<g id="1"> banks in the euro zone are subject :</g>
<g id="1"> ratio of capital linked to the detention of risky assets ( we
are here to credit ;</g> )
<g id="1"> the leverage ratio , which determines the regulatory capital
from the size of the balance sheet of the bank ;</g>
<g id="1"> ratio of liquidity , which requires banks to hold especially
important portfolios of securities .</g> public

But then neither the detokenizer nor the detruecaser gives me the
correct output: "banks" does not get its uppercase B.


I also tried to look at this
https://github.com/christianbuck/matecat_util/tree/master/python_server
or this
https://github.com/christianbuck/matecat_util/tree/master/code/tags4moses

but no luck.
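
For the capitalization problem described above, one possible workaround is sketched below in Python. It rests on an assumption that is not confirmed in this thread: that the sentence-initial word stays lowercase because the leading <g ...> tag, rather than "banks", sits in first position when the detruecaser runs. The function name capitalize_after_tags is hypothetical and is not part of Moses or matecat_util. It would be applied to each output line as a post-processing step.

```python
import re

# Zero or more leading markup tags (with optional whitespace around them),
# followed by the first letter of the actual text. [^\W\d_] matches any
# Unicode letter, so accented initials are handled as well.
_LEADING_TAGS = re.compile(r"^((?:\s*<[^>]+>)*\s*)([^\W\d_])")


def capitalize_after_tags(line):
    """Uppercase the first letter that follows any leading inline tags.

    Lines whose first non-tag character is not a letter are returned unchanged.
    """
    return _LEADING_TAGS.sub(
        lambda m: m.group(1) + m.group(2).upper(), line, count=1
    )


if __name__ == "__main__":
    sample = '<g id="1"> banks in the euro zone are subject :</g>'
    print(capitalize_after_tags(sample))
```

This only addresses the missing capital letter. It does not repair tokens that ended up outside the closing tag, such as the stray ")" and "public" in the output shown earlier.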



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 107, Issue 41
**********************************************
