Moses-support Digest, Vol 103, Issue 33

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: -decoding-graph-backoff (Hieu Hoang)
2. numerical precision in srilm, etc (koormoosh)
3. TUNING crashes when using placeables (Carla Parra)


----------------------------------------------------------------------

Message: 1
Date: Tue, 12 May 2015 10:27:49 +0400
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] -decoding-graph-backoff
To: Jeremy Gwinnup <jeremy@gwinnup.org>, moses-support
<moses-support@mit.edu>, "moses-developers@mit.edu"
<moses-developers@mit.edu>
Message-ID:
<CAEKMkbi-QR8ZOg6uVNJkci2ETYhH=NOTUxzqO6wcamH2kHOyjw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I don't think it works properly even in phrase-based decoding. It only works
when the entry in the main table covers 1 word. See the attached simple model,
run the following, and look in the n-best file:
./moses -f moses.ini -i in -n-best-list nbest 1000

I'm gonna change it so that it is guaranteed to use the main table if an entry
exists. However, in some cases this will lead to catastrophic results, e.g. if
the input is
a b c d
and the main table has
a b c
b c d
No translation is possible.

What do people think?
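
The dead end in the example above can be checked with a small Python sketch. It is not Moses code: it assumes one possible reading of the proposed behaviour, namely that a backoff entry is only allowed over input words that no main-table entry covers. The names `covered_positions` and `can_segment` and the toy tables are hypothetical.

```python
def covered_positions(words, table):
    """Return the input positions that lie inside some phrase found in `table`."""
    covered = set()
    n = len(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if tuple(words[start:end]) in table:
                covered.update(range(start, end))
    return covered


def can_segment(words, main, backoff):
    """True if the input can be split into non-overlapping phrases.

    Assumption (not taken from Moses): a backoff phrase may only be used
    over positions that no main-table phrase covers.
    """
    n = len(words)
    blocked = covered_positions(words, main)
    reachable = [False] * (n + 1)  # reachable[i]: words[:i] can be segmented
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            if not reachable[start]:
                continue
            span = tuple(words[start:end])
            touches_main = bool(blocked & set(range(start, end)))
            if span in main or (span in backoff and not touches_main):
                reachable[end] = True
                break
    return reachable[n]


words = "a b c d".split()
main = {("a", "b", "c"), ("b", "c", "d")}
backoff = {("a",), ("b",), ("c",), ("d",)}
print(can_segment(words, main, backoff))  # False
```

Under this reading the two overlapping main-table entries block every word from the backoff table, yet cannot be combined with each other. Dropping `b c d` from the main table lets the backoff entry for `d` complete the sentence.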

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 12 May 2015 at 01:48, Jeremy Gwinnup <jeremy@gwinnup.org> wrote:

> Not sure if I want entries from the main rule table competing with the
> backoff rule table. For my use case say I have a rule table for Russian to
> English. I have a second rule table of Lemmatized Russian to English. I
> wish to use the lemmatized rule table as a backoff option if a translation
> isn't found in the main rule table.
>
> I can do this with a phrase based system using:
>
> [decoding-graph-backoff]
> 0
> 1
>
> [decode-mapping]
> 0 T 0
> 1 T 1
>
> I get a (small) benefit from using this approach and want to apply it to
> chart decoding.
>
>
>
> On May 11, 2015, at 3:32 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
> I might be missing something but I've been looking at the definition here:
> http://www.statmt.org/moses/?n=Advanced.Models#ntoc5
> It seems to be kinda strange.
>
> As an example, let's say the argument is
> -decoding-graph-backoff 0 1
> and input is
> abcd
> If the phrase
> bc
> is found in the primary phrase-table then, ideally, rules for the
> following should NOT be used
> 1. b
> 2. c
> 3. abc
> 4. bcd
> 5. abcd
> However, from the definition, it seems like (1) and (2) will be used, so
> rules from the primary table will still have to compete. Is this intended?
>
> Also, for a syntax model, you may want a rule
> a X d
> so stating a maximum length doesn't really make sense.
>
> Should the -decoding-graph-backoff argument be changed to booleans to
> indicate which phrase-table is the primary, and which aren't?
>
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: backoff.zip
Type: application/zip
Size: 4670 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20150512/0f1fdab7/attachment-0001.zip

------------------------------

Message: 2
Date: Tue, 12 May 2015 18:23:54 +1000
From: koormoosh <koormoosh@gmail.com>
Subject: [Moses-support] numerical precision in srilm, etc
To: moses-support@mit.edu, Kenneth Heafield <moses@kheafield.com>
Message-ID:
<CAN3_CDgBzg-3DEevuv-zrJhM7q69yx3RS0Mx=MNqLv=X1Fb_ZA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I wonder why the numerical precision used in srilm is off, and how come
this has never turned into a real problem when it is used inside moses (for
translation)?

It can be shown on a toy-sized dataset that for long test sentences (say, 40
words) srilm may report a perplexity that is 1 point below what a precise
computation would give. Since perplexity is a measure people use for
comparing language models, relying on what srilm produces for benchmarking
purposes is unfair.

I suppose part of this precision problem is due to the fact that these
precomputed counts are loaded from the arpa files into some hacky
suffix tree/array data structures, which demand a lossy encoding of
floating-point numbers to keep the size of the data structure
reasonable. Is this the reason, or am I absolutely wrong?
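
To get a feel for how much lossy storage of log probabilities could move a perplexity figure, here is a minimal Python sketch. It does not reproduce srilm's internals or its full perplexity definition (it ignores end-of-sentence and OOV handling, for instance). It only assumes that perplexity is 10 raised to the negative mean of the per-token log10 probabilities. The scores and the `quantize` helper are made up for illustration.

```python
def perplexity(log10_probs):
    """Perplexity from per-token base-10 log probabilities."""
    mean_log = sum(log10_probs) / len(log10_probs)
    return 10 ** (-mean_log)


def quantize(log10_probs, decimals):
    """Simulate lossy storage by rounding each score to `decimals` places."""
    return [round(lp, decimals) for lp in log10_probs]


# Hypothetical per-token scores for a 40-word sentence (not real srilm output).
scores = [-1.2345678, -0.7654321, -2.3456789, -3.1415926] * 10

exact = perplexity(scores)
lossy = perplexity(quantize(scores, 2))
print(exact, lossy, lossy - exact)
```

In this toy model a per-token rounding error of at most delta changes the perplexity by a factor of at most 10^delta, independent of sentence length, so comparing the two printed values gives a rough bound on what rounding alone could explain.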

regards,
-K

------------------------------

Message: 3
Date: Tue, 12 May 2015 13:36:14 +0200
From: Carla Parra <carla.parra@hermestrans.com>
Subject: [Moses-support] TUNING crashes when using placeables
To: Moses Support <moses-support@mit.edu>
Message-ID: <fd6bc25822cb86a1813fe41a0f941f96@hermestrans.com>
Content-Type: text/plain; charset="utf-8"

Dear all,

I am running an experiment using placeables. We have considered several
types of placeables besides numbers, and therefore we are using our own
script to generate the encoding before tokenization.

One of the types of placeables we have are in fact tags, as it is a kind
of localisation project.

I added a special list of patterns (i.e. <ne translation="@tag@"
entity=".*">@tag@</ne>) to be ignored at tokenisation and also escaped
special characters to avoid converting the placeables' syntax to
something else.


First I encountered problems at word alignment, because
"prepare-fast-align.perl" also removes markup and thus the experiment
was crashing.

"# remove markup
foreach my $line (\$source,\$target) {
$$line =~ s/\<[^\>]+\>/ /g;
$$line =~ s/\s+/ /g;
$$line =~ s/^ //;
$$line =~ s/ $//;
}"

After commenting out this loop, alignment worked. However, now I
encounter problems at tuning. I fear it is due to the tags I encoded
within placeables, but I am unsure. Could anyone confirm this and maybe
tell me what strategy would be best to deal with this kind of
experiment?
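
For reference, the effect of the quoted loop on a placeable can be reproduced with a short Python sketch. `strip_markup` mirrors the four Perl substitutions. `strip_markup_keep_ne` is a hypothetical alternative (not part of prepare-fast-align.perl) that leaves `<ne ...>` and `</ne>` tags in place instead of disabling the whole loop. Whether the later alignment and tuning steps accept such tokens is not something this sketch addresses.

```python
import re


def strip_markup(line):
    """Python equivalent of the quoted Perl loop body."""
    line = re.sub(r"<[^>]+>", " ", line)  # replace every tag with a space
    line = re.sub(r"\s+", " ", line)      # collapse runs of whitespace
    return line.strip(" ")                # drop leading/trailing space


def strip_markup_keep_ne(line):
    """Hypothetical variant: remove tags except <ne ...> and </ne>."""
    line = re.sub(r"<(?!/?ne[\s>])[^>]+>", " ", line)
    line = re.sub(r"\s+", " ", line)
    return line.strip(" ")


sample = 'Click <ne translation="@tag@" entity="b">@tag@</ne> here <b>now</b>'
print(strip_markup(sample))
print(strip_markup_keep_ne(sample))
```

The first call discards the placeable markup along with the ordinary tags, which matches the behaviour described above. The second keeps the `<ne>` wrapper while still removing other tags.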

I attach my config.file and the TUNING_tune.1.STDERR file. I hope they
illustrate my problem.

Thank you very much,

Carla


--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.placeables
Type: text/x-c
Size: 20008 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20150512/580d7d30/attachment.bin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TUNING_tune.1.STDERR
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150512/580d7d30/attachment.bat

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 103, Issue 33
**********************************************