Moses-support Digest, Vol 107, Issue 57

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: is there a way to remove a bad entry in the phrase table
? (Matthias Huck)
2. Re: is there a way to remove a bad entry in the phrase table
? (Vincent Nguyen)


----------------------------------------------------------------------

Message: 1
Date: Thu, 24 Sep 2015 14:21:35 +0100
From: Matthias Huck <mhuck@inf.ed.ac.uk>
Subject: Re: [Moses-support] is there a way to remove a bad entry in
the phrase table ?
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <1443100895.2298.13.camel@inf.ed.ac.uk>
Content-Type: text/plain; charset="UTF-8"

Hi Vincent,

Pruning the phrase table will discard many bad entries.

The decoder is typically configured to load at most a fixed number of
translation options per distinct source side. Use table-limit=20 as a
parameter to your translation model feature to limit the number of
candidates to the top 20.
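
As a rough illustration of what a per-source limit does, here is a minimal
Python sketch. It is not the decoder's implementation: the decoder ranks
options by its own weighted model score, whereas this sketch simply sorts by
the third score of each entry, which is an assumption made for illustration.
The function names parse_entry and apply_table_limit are hypothetical; the
sample lines are entries from the phrase table quoted in this thread.

```python
from collections import defaultdict


def parse_entry(line):
    """Split one phrase-table line into (source, target, scores)."""
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[2].split()]
    return source, target, scores


def apply_table_limit(lines, limit=20, score_index=2):
    """Keep the `limit` best targets per source, ranked by one score column.

    Assumption: ranking by a single column stands in for the decoder's
    weighted combination of all feature scores.
    """
    by_source = defaultdict(list)
    for line in lines:
        source, target, scores = parse_entry(line)
        by_source[source].append((scores[score_index], target))
    return {
        source: [t for _, t in sorted(opts, key=lambda o: o[0], reverse=True)[:limit]]
        for source, opts in by_source.items()
    }


sample = [
    "&quot; 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 ||| 1-0 ||| 11839 26 12 ||| |||",
    "&quot; 1 ||| &quot; 1 ||| 0.503492 0.361595 0.11619 0.187815 ||| 0-0 1-1 ||| 6 26 4 ||| |||",
    "&quot; 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||",
]

kept = apply_table_limit(sample, limit=2)
print(kept)
```

With a limit of 2 on these three sample lines, the lowest-ranked option for
the source is dropped before it can be considered.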

Alternatively you can pre-prune the phrase table. The following page
provides instructions:
http://www.statmt.org/moses/?n=Advanced.RuleTables

In case you want to remove just a handful of individual entries, I
recommend grep -v on the Linux command line.
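
For a handful of entries, the effect of grep -v can also be sketched in
Python. This version compares the exact source and target fields instead of
matching a substring, so it does not drop longer entries that merely contain
the pattern. The names remove_entries and blocklist are hypothetical, and the
sample lines are entries quoted in this thread.

```python
def remove_entries(lines, blocklist):
    """Yield phrase-table lines whose (source, target) pair is not blocklisted."""
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        # Lines without both a source and a target field are passed through
        # unchanged rather than guessed at.
        if len(fields) < 2 or (fields[0], fields[1]) not in blocklist:
            yield line


# Hypothetical blocklist holding the one pair discussed in this thread.
blocklist = {("&quot; 1", "One Million Roofs")}

table = [
    "&quot; 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 ||| 1-0 ||| 11839 26 12 ||| |||",
    "&quot; 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||",
]

cleaned = list(remove_entries(table, blocklist))
print(len(cleaned))
```

Because the function is a generator, a large table can be streamed line by
line from an open file instead of being loaded into memory.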

Cheers,
Matthias


On Thu, 2015-09-24 at 11:05 +0100, Hieu Hoang wrote:
> I've just added a new feature function that lets you give a list
> of rules that you don't want to be used:
> &quot; 1 ||| One Million Roofs
>
> oui ||| no
>
> To use this list, add the following to your moses.ini file
>
> [feature]
> DeleteRules path=/path/to/list
>
> Not tested.
>
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
>
> On 24 September 2015 at 10:11, Vincent Nguyen <vnguyen@neuf.fr> wrote:
>
> Well, at times it does. The sequence
> &quot; 1 &quot;
> became
> One Million Roofs
> which is completely off.
>
>
> &quot; 1 &quot; . ||| one . ||| 4.77044e-05 2.56689e-08
> 0.103519 0.0135382 ||| 1-0 3-1 ||| 2170 1 1 ||| |||
> &quot; 1 &quot; une ||| &quot; 1 &quot; meaning ||| 0.0517593
> 0.00140486 0.103519 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2
> 1 1 ||| |||
> &quot; 1 &quot; ||| &quot; 1 &quot; meaning ||| 0.0517593
> 0.121628 0.0517593 5.98457e-06 ||| 0-0 1-1 0-2 2-2 2-3 ||| 2 2
> 1 ||| |||
> &quot; 1 &quot; ||| one ||| 1.34779e-06 2.65512e-08 0.0517593
> 0.0141179 ||| 1-0 ||| 76806 2 1 ||| |||
> &quot; 1 + ||| &apos; one @-@ on ||| 0.0517593 8.76241e-09
> 0.0345062 2.43009e-07 ||| 0-0 2-0 1-1 ||| 2 3 1 ||| |||
> &quot; 1 + ||| &apos; one @-@ ||| 0.0129398 8.76241e-09
> 0.0345062 1.65217e-05 ||| 0-0 2-0 1-1 ||| 8 3 1 ||| |||
> &quot; 1 + ||| &apos; one ||| 0.000685554 8.76241e-09
> 0.0345062 0.00189493 ||| 0-0 2-0 1-1 ||| 151 3 1 ||| |||
> &quot; 1 . ||| &apos;1 . ||| 0.103519 0.241693 0.0345062
> 5.37965e-05 ||| 0-0 1-0 2-1 ||| 1 3 1 ||| |||
> &quot; 1 . ||| &quot; 1 . ||| 0.508332 0.34958 0.338888
> 0.180103 ||| 0-0 1-1 2-2 ||| 2 3 2 ||| |||
> &quot; 1 billion de dollars ||| $ 1 trillion of ||| 0.0207037
> 2.46862e-05 0.103519 0.0679424 ||| 4-0 1-1 2-2 3-3 ||| 5 1 1
> ||| |||
> &quot; 1 billion de ||| 1 trillion of ||| 0.0345062
> 5.93019e-05 0.103519 0.161697 ||| 1-0 2-1 3-2 ||| 3 1 1 |||
> |||
> &quot; 1 billion ||| 1 trillion ||| 0.00108967 0.000131965
> 0.103519 0.536768 ||| 1-0 2-1 ||| 95 1 1 ||| |||
> &quot; 1 milliard $ , ||| $ 1 billion ||| 0.00199074
> 2.23776e-06 0.103519 0.420148 ||| 3-0 1-1 2-2 ||| 52 1 1 |||
> |||
> &quot; 1 milliard $ ||| $ 1 billion ||| 0.00199074 3.32223e-05
> 0.103519 0.420148 ||| 3-0 1-1 2-2 ||| 52 1 1 ||| |||
> &quot; 1 milliard d&apos; euros ||| EUR 1 billion |||
> 0.00026749 3.23583e-05 0.103519 0.179568 ||| 4-0 1-1 2-2 3-2
> ||| 387 1 1 ||| |||
> &quot; 1 milliard d&apos; ||| 1 billion ||| 0.000137475
> 6.11551e-05 0.103519 0.25129 ||| 1-0 2-1 3-1 ||| 753 1 1 |||
> |||
> &quot; 1 milliard de dollars ||| $ 1 billion ||| 0.0195512
> 2.47433e-05 0.508332 0.105231 ||| 0-0 4-0 1-1 2-2 ||| 52 2 2
> ||| |||
> &quot; 1 milliard de personnes ||| one billion people |||
> 0.00252484 9.77577e-09 0.103519 0.00258395 ||| 2-0 1-1 2-1 4-2
> ||| 41 1 1 ||| |||
> &quot; 1 milliard de ||| 1 billion of ||| 0.00941078
> 0.000159942 0.0517593 0.15086 ||| 1-0 2-1 3-2 ||| 11 2 1 |||
> |||
> &quot; 1 milliard de ||| one billion ||| 0.000509944
> 4.32371e-08 0.0517593 0.00492989 ||| 2-0 1-1 2-1 ||| 203 2 1
> ||| |||
> &quot; 1 milliard ||| 1 billion ||| 0.0026678 0.000355919
> 0.502213 0.500792 ||| 1-0 2-1 ||| 753 4 3 ||| |||
> &quot; 1 milliard ||| one billion ||| 0.000509944 3.43309e-07
> 0.0258796 0.00492989 ||| 2-0 1-1 2-1 ||| 203 4 1 ||| |||
> &quot; 1 million $ ||| $ 1 million ||| 0.0172531 1.31973e-05
> 0.103519 0.221619 ||| 0-0 3-0 1-1 2-2 ||| 6 1 1 ||| |||
> &quot; 1 million de toits ||| one million solar roofs |||
> 0.0517593 5.86831e-10 0.103519 1.43348e-10 ||| 2-0 1-1 4-3 |||
> 2 1 1 ||| |||
> &quot; 1 million de ||| one million solar ||| 0.0258796
> 9.85876e-10 0.0517593 3.44036e-10 ||| 2-0 1-1 ||| 4 2 1 |||
> |||
> &quot; 1 million de ||| one million ||| 0.00021344 9.85876e-10
> 0.0517593 0.000202374 ||| 2-0 1-1 ||| 485 2 1 ||| |||
> &quot; 1 million ||| one million solar ||| 0.0258796
> 7.82802e-09 0.0517593 3.44036e-10 ||| 2-0 1-1 ||| 4 2 1 |||
> |||
> &quot; 1 million ||| one million ||| 0.00021344 7.82802e-09
> 0.0517593 0.000202374 ||| 2-0 1-1 ||| 485 2 1 ||| |||
> &quot; 1 ou 2 % ||| one or two percent ||| 0.0258796
> 6.85867e-09 0.103519 1.36871e-06 ||| 1-0 2-1 3-2 4-3 ||| 4 1 1
> ||| |||
> &quot; 1 ou 2 ||| one or two ||| 0.000164315 2.30435e-08
> 0.103519 0.00032742 ||| 1-0 2-1 3-2 ||| 630 1 1 ||| |||
> &quot; 1 ou ||| one or ||| 8.83264e-05 3.76903e-06 0.103519
> 0.0112293 ||| 1-0 2-1 ||| 1172 1 1 ||| |||
> &quot; 1 seul coup , ||| &apos; 1 shot , ||| 0.103519
> 1.88862e-06 0.103519 0.00165224 ||| 0-0 1-1 3-2 4-3 ||| 1 1 1
> ||| |||
> &quot; 1 seul coup ||| &apos; 1 shot ||| 0.103519 2.45247e-06
> 0.103519 0.00222575 ||| 0-0 1-1 3-2 ||| 1 1 1 ||| |||
> &quot; 1 seul ||| &apos; 1 ||| 0.0129398 2.78897e-05 0.103519
> 0.214656 ||| 0-0 1-1 ||| 8 1 1 ||| |||
> &quot; 1 ||| &apos; 1 ||| 0.127083 0.278063 0.0391025 0.214656
> ||| 0-0 1-1 ||| 8 26 2 ||| |||
> &quot; 1 ||| &apos;1 ||| 0.103519 0.25 0.00398148 5.61e-05 |||
> 0-0 1-0 ||| 1 26 1 ||| |||
> &quot; 1 ||| &quot; 1 ||| 0.503492 0.361595 0.11619 0.187815
> ||| 0-0 1-1 ||| 6 26 4 ||| |||
> &quot; 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 |||
> 1-0 ||| 11839 26 12 ||| |||
> &quot; 1 ||| One Million Roofs ||| 0.103519 0.00213892
> 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||
> &quot; 1 ||| hardly 1 ||| 0.0258796 0.00278649 0.00398148
> 1.73108e-05 ||| 1-1 ||| 4 26 1 ||| |||
> &quot; 1 ||| million solar ||| 0.0345062 3.55949e-06
> 0.00398148 3.29783e-09 ||| 1-0 ||| 3 26 1 ||| |||
> &quot; 1 ||| million ||| 5.83433e-06 3.55949e-06 0.00398148
> 0.0019399 ||| 1-0 ||| 17743 26 1 ||| |||
> &quot; 1 ||| of 1 ||| 0.000263406 0.00278649 0.00398148
> 0.0270917 ||| 1-1 ||| 393 26 1 ||| |||
> &quot; 1 ||| one ||| 1.32368e-05 5.22671e-06 0.0391025
> 0.0141179 ||| 1-0 ||| 76806 26 2 ||| |||
> &quot; 1,1 % ||| 1.1 % ||| 0.0022504 0.00241746 0.103519
> 0.875731 ||| 1-0 2-1 ||| 46 1 1 ||| |||
> &quot; 1,1 milliard d&apos; euros ||| EUR 1.1 billion |||
> 0.00544835 6.98053e-05 0.0517593 0.110019 ||| 3-0 4-0 1-1 2-1
> 2-2 ||| 19 2 1 ||| |||
> &quot; 1,1 milliard d&apos; euros ||| by EUR 1.1 billion |||
> 0.0345062 6.98053e-05 0.0517593 0.000791519 ||| 3-1 4-1 1-2
> 2-2 2-3 ||| 3 2 1 ||| |||
>
>
>
> On 24/09/2015 09:54, Felipe Sánchez Martínez wrote:
>
> > Hi,
> >
> > This is quite common. If you look at the scores, they are
> > pretty low when they do not make sense, so, even though they
> > are in the phrase table, most probably they will never be
> > used for translation. I would not bother.
> >
> > Cheers
> > --
> > Felipe
> >
> > > On 23/09/15 at 16:50, Vincent Nguyen wrote:
> > > > I agree and would like to.
> > > > But this is tricky; look at the first 30 lines of my
> > > > phrase table below.
> > > >
> > > > This happens a lot in the first lines of the table, where
> > > > there are &apos; entities or other odd codes: the EN/FR
> > > > pairs do not match.
> > >
> > >
> > >
> > >
> > > ! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185 0.103413
> > > 0.401758 ||| 0-0 1-1
> > > 2-2 3-3 ||| 1 1 1 ||| |||
> > > ! ! ! ) ||| ! ! ! ) ||| 0.339323 0.167884 0.508985 0.4246
> > > ||| 0-0 1-0
> > > 2-0 2-1 2-2 3-3 ||| 3 2 2 ||| |||
> > > ! ! ! ||| ! ! ! ||| 0.501834 0.219223 0.716905 0.50463 |||
> > > 0-0 1-1 2-2
> > > ||| 10 7 6 ||| |||
> > > ! ! ! ||| budget ! ! ! ||| 0.0517067 0.219223 0.0147733
> > > 4.50635e-05 |||
> > > 0-1 1-2 2-3 ||| 2 7 1 ||| |||
> > > ! ! ) , ||| ! ! ) - , ||| 0.103413 0.111989 0.103413
> > > 0.00192967 ||| 0-0
> > > 1-1 2-2 3-3 3-4 ||| 1 1 1 ||| |||
> > > ! ! ) ||| ! ! ) ||| 0.103413 0.278429 0.103413 0.533321
> > > ||| 0-0 1-1 2-2
> > > ||| 1 1 1 ||| |||
> > > ! ! ||| ! ! ||| 0.625 0.363573 0.769231 0.633844 ||| 0-0
> > > 1-1 ||| 16 13
> > > 10 ||| |||
> > > ! ! ||| . ||| 4.65922e-08 6.71089e-07 0.00795487 0.140779
> > > ||| 0-0 1-0
> > > ||| 2.21954e+06 13 1 ||| |||
> > > ! ! ||| budget ! ! ||| 0.0517067 0.363573 0.00795487
> > > 5.66022e-05 ||| 0-1
> > > 1-2 ||| 2 13 1 ||| |||
> > > > ! ! ||| nécessaire ! ! ||| 0.103413 0.363573 0.00795487
> > > 0.000130572 |||
> > > 0-1 1-2 ||| 1 13 1 ||| |||
> > > ! &#91; never again ! ||| ! ||| 6.51628e-06 5.42074e-13
> > > 0.103413
> > > 0.796143 ||| 0-0 4-0 ||| 15870 1 1 ||| |||
> > > ! &#93; this is ||| tel est ||| 7.38667e-05 9.16191e-11
> > > 0.103413
> > > 0.00147917 ||| 2-0 3-1 ||| 1400 1 1 ||| |||
> > > ! &#93; this ||| tel ||| 1.09594e-05 1.44188e-10 0.103413
> > > 0.0035893 |||
> > > 2-0 ||| 9436 1 1 ||| |||
> > > ! &#93; ||| ! &#93; ||| 0.103413 0.352335 0.103413
> > > 0.472387 ||| 0-0 1-1
> > > ||| 1 1 1 ||| |||
> > > ! &amp; quot ; ||| ! &quot; . et ||| 0.0517067 2.36396e-12
> > > 0.0517067
> > > 1.88268e-05 ||| 0-0 1-1 2-1 3-3 ||| 2 2 1 ||| |||
> > > ! &amp; quot ; ||| ! &quot; ||| 0.000222394 1.44515e-11
> > > 0.0517067
> > > 0.518419 ||| 0-0 2-1 ||| 465 2 1 ||| |||
> > > ! &amp; quot ||| ! &quot; . ||| 0.000662906 8.30626e-09
> > > 0.0344711
> > > 0.00232791 ||| 0-0 1-1 2-1 ||| 156 3 1 ||| |||
> > > ! &amp; quot ||| ! &quot; ||| 0.00218918 8.30626e-09
> > > 0.339323 0.518419
> > > ||| 0-0 2-1 ||| 465 3 2 ||| |||
> > > ! &amp; ||| ! ||| 6.51628e-06 7.21755e-05 0.103413
> > > 0.796143 ||| 0-0 |||
> > > 15870 1 1 ||| |||
> > > > ! &apos; &#93; , addressed ||| ! &quot; adressé |||
> > > 0.103413 3.70838e-07
> > > 0.103413 0.00596848 ||| 0-0 1-1 2-1 4-2 ||| 1 1 1 ||| |||
> > > ! &apos; &#93; , ||| ! &quot; ||| 0.000222394 2.49698e-06
> > > 0.103413
> > > 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
> > > ! &apos; &#93; ||| ! &quot; ||| 0.000222394 3.57128e-05
> > > 0.103413
> > > 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
> > > ! &apos; &apos; Alstom shares ||| l&apos; on constate un
> > > dysfonctionnement ||| 0.0344711 5.62605e-16 0.103413
> > > 1.03361e-14 ||| 1-0
> > > 2-0 1-1 3-4 4-4 ||| 3 1 1 ||| |||
> > > ! &apos; &apos; ||| l&apos; on constate un ||| 0.0147733
> > > 1.56906e-11
> > > 0.0129267 2.2766e-12 ||| 1-0 2-0 1-1 ||| 7 8 1 ||| |||
> > > ! &apos; &apos; ||| l&apos; on constate ||| 0.000984889
> > > 1.56906e-11
> > > 0.0129267 2.36929e-10 ||| 1-0 2-0 1-1 ||| 105 8 1 ||| |||
> > > ! &apos; &apos; ||| l&apos; on ||| 6.76656e-06 1.56906e-11
> > > 0.0129267
> > > 6.18613e-06 ||| 1-0 2-0 1-1 ||| 15283 8 1 ||| |||
> > > ! &apos; &apos; ||| ou que l&apos; on constate |||
> > > 0.0344711 1.56906e-11
> > > 0.0129267 4.69534e-15 ||| 1-2 2-2 1-3 ||| 3 8 1 ||| |||
> > > ! &apos; &apos; ||| ou que l&apos; on ||| 0.00304157
> > > 1.56906e-11
> > > 0.0129267 1.22594e-10 ||| 1-2 2-2 1-3 ||| 34 8 1 ||| |||
> > > ! &apos; &apos; ||| que l&apos; on constate un |||
> > > 0.0344711 1.56906e-11
> > > 0.0129267 4.56092e-14 ||| 1-1 2-1 1-2 ||| 3 8 1 ||| |||
> > > ! &apos; &apos; ||| que l&apos; on constate ||| 0.00323167
> > > 1.56906e-11
> > > 0.0129267 4.74661e-12 ||| 1-1 2-1 1-2 ||| 32 8 1 ||| |||
> > >
> > >
> > >
> > > > On 23/09/2015 15:12, Tom Hoar wrote:
> > > > Vincent,
> > > >
> > > > If you suspect bad entries, isn't it better to address
> > > > the root of the
> > > > problem and prepare your training corpus better?
> > > >
> > > >
> > > > On 9/23/2015 6:46 PM, moses-support-request@mit.edu
> > > > wrote:
> > > > > Date: Tue, 22 Sep 2015 20:24:02 +0200
> > > > > From: Philipp Koehn<phi@jhu.edu>
> > > > > Subject: Re: [Moses-support] is there a way to remove
> > > > > a bad entry in
> > > > > the phrase table ?
> > > > > To: Vincent Nguyen<vnguyen@neuf.fr>
> > > > > Cc: moses-support<moses-support@mit.edu>
> > > > >
> > > > > Hi,
> > > > >
> > > > > you can remove it manually (just edit the text file),
> > > > > there will be no
> > > > > negative consequences.
> > > > >
> > > > > However, it is not a realistic strategy to try to
> > > > > remove by hand every
> > > > > offending phrase table entry.
> > > > >
> > > > > -phi
> > > > >
> > > > > On Tue, Sep 22, 2015 at 4:05 PM, Vincent
> > > > > Nguyen<vnguyen@neuf.fr> wrote:
> > > > >
> > > > > > >Hi,
> > > > > > >
> > > > > > > >I was wondering: if, after analysing the
> > > > > > > >BLEU-Annotation file, we realize that there must be a
> > > > > > > >bad entry in the phrase table, could we remove it
> > > > > > > >manually or in some other way?
> > > > > > > >
> > > > > > > >Thanks.
> > > > > > >V.
> > > > > > >_______________________________________________
> > > > > > >Moses-support mailing list
> > > > > > >Moses-support@mit.edu
> > > > > > >http://mailman.mit.edu/mailman/listinfo/moses-support
> > > > > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > > Tom Hoar
> > > > Chief Executive Officer
> > > > /*Precision Translation Tools Pte Ltd*/
> > > > Singapore/Thailand
> > > > Web: www.precisiontranslationtools.com
> > > > <http://www.precisiontranslationtools.com>
> > > > Thailand Mobile: +66 87 345-1875
> > > > Skype: tahoar
> > > >
> > > >



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



------------------------------

Message: 2
Date: Thu, 24 Sep 2015 16:08:57 +0200
From: Vincent Nguyen <vnguyen@neuf.fr>
Subject: Re: [Moses-support] is there a way to remove a bad entry in
the phrase table ?
To: Matthias Huck <mhuck@inf.ed.ac.uk>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <560403F9.6060304@neuf.fr>
Content-Type: text/plain; charset=utf-8; format=flowed

Matthias,

Pruning:
I set the cube pruning pop limit to 400 instead of the default values
(1000 or 5000).
I use MinScore 0.001.
I tried sigtest filtering once; it never worked.
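
To see how a score threshold interacts with the entry discussed in this
thread, here is a small Python sketch. It assumes the threshold is compared
against a single score column selected by score_index (0-based, over the four
scores of an entry). Which column the MinScore setting above actually refers
to is not stated here, so the column choice and the helper name
passes_min_score are both assumptions.

```python
def passes_min_score(line, threshold=0.001, score_index=2):
    """Return True if the chosen score column is at or above the threshold."""
    fields = line.split("|||")
    scores = [float(s) for s in fields[2].split()]
    return scores[score_index] >= threshold


# The entry under discussion, copied from the quoted phrase table.
bad_entry = (
    "&quot; 1 ||| One Million Roofs ||| 0.103519 0.00213892 "
    "0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||"
)

survives_on_third = passes_min_score(bad_entry, score_index=2)
survives_on_fourth = passes_min_score(bad_entry, score_index=3)
print(survives_on_third, survives_on_fourth)
```

Under these assumptions, a 0.001 threshold on the third score keeps the
entry (its value there is 0.00398148), while the same threshold on the fourth
score would drop it.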

table-limit=20:
I have the feeling this only applies to CreateOnDiskPt. Am I wrong?
Does it work with ProcessPhrasetableMin?






------------------------------



End of Moses-support Digest, Vol 107, Issue 57
**********************************************
