Moses-support Digest, Vol 100, Issue 97

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Target-syntax (Rico Sennrich)
2. Re: kbmira segfault (Matt Post)


----------------------------------------------------------------------

Message: 1
Date: Fri, 27 Feb 2015 17:17:58 +0000 (UTC)
From: Rico Sennrich <rico.sennrich@gmx.ch>
Subject: Re: [Moses-support] Target-syntax
To: moses-support@mit.edu
Message-ID: <loom.20150227T180822-580@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Massinissa Ahmim <massinissa.ahmim@...> writes:

>
> Dear all,
> I'm trying to train a syntactic model, English to German. I did the
> annotation on the target side using bitpar and ran:
>
> nohup /mosesdecoder/scripts/training/train-model.perl --glue-grammar
> --max-phrase-length 10 --extract-options="--MaxSpan 15"
> --score-options="--GoodTuring" -root-dir /home/Massi/KIID_ENDE/syntax/
> -corpus /home/Massi/KIID_DEEN/kiid.10 -f en -e de -lm
> 0:5:/home/Massi/KIID_ENDE/LM/atelier/lm.kiid5.blm.de.mm -hierarchical
> -target-syntax /home/Massi/KIID_DEEN/tagged.kiid.10.de -external-bin-dir
> /root/external-bin-dir/ -mgiza -mgiza-cpus 30 >& training.out &
>
> The training went very well but outputs an empty rule table. I
> double-checked my paths and everything seems to be okay. Any ideas?
>
> Many thanks,
> Massinissa

Hi Massinissa,

I cannot tell from here if your files are in the right format. As to your
training parameters, what springs to mind is that your extract options are
only suited for hierarchical models, not for syntactic ones. You can use the
option '-ghkm' to use the GHKM extractor, which has more sensible defaults
for string-to-tree systems. Alternatively, you should consider changing the
following extract-options:

--NonTermConsecSource (to allow consecutive nonterminal symbols on the
source side of a rule)
--MinHoleSource 1 (to allow nonterminals that only span 1 word)
--MinWords 0 (to allow non-lexical rules)
--MaxNonTerm SIZE (to allow SIZE nonterminals per rule; the default is 2)

The full list is here:
http://www.statmt.org/moses/?n=Moses.SyntaxTutorial#ntoc14

best wishes,
Rico



------------------------------

Message: 2
Date: Fri, 27 Feb 2015 12:19:57 -0500
From: Matt Post <post@cs.jhu.edu>
Subject: Re: [Moses-support] kbmira segfault
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID: <5A094267-7FA6-4E10-BA2C-37BD3F9FC7F4@cs.jhu.edu>
Content-Type: text/plain; charset="windows-1252"

Hi Barry, thanks for the response. I don't think that's it, because I use the exact same approach for lots of other tuning runs. Isn't it the header line of the features file that lists dense features? I've been using this format, where dense features are listed in each header line, and then sparse features in the individual lines:

FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 WordPenalty PhrasePenalty Distortion
-82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8
-82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 OOVPenalty=-100

This works in lots of places. It also raises a separate question, though: does kbmira actually distinguish between sparse and dense features? I seem to remember Colin once saying that there is a single group weight between the two groups, but I've never been able to find this in the code.
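
To make the layout described above concrete, here is a minimal C++ sketch that splits one data line of such a features file into dense values and sparse name=value pairs. It follows only the format shown in this message. It is not kbmira's own parser, and the names FeatureLine and parseFeatureLine are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one data line of a features file.
struct FeatureLine {
    std::vector<double> dense;             // positional values, named by the header line
    std::map<std::string, double> sparse;  // name=value pairs found on this line
};

// Split a whitespace-separated line: tokens containing '=' are treated as
// sparse features, all other tokens as dense values (illustrative only).
FeatureLine parseFeatureLine(const std::string& line) {
    FeatureLine out;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        const std::size_t eq = token.find('=');
        if (eq == std::string::npos) {
            out.dense.push_back(std::stod(token));
        } else {
            out.sparse[token.substr(0, eq)] = std::stod(token.substr(eq + 1));
        }
    }
    return out;
}
```

Run on the second data line above, this would yield nine dense values and one sparse entry, OOVPenalty. Comparing the dense count against the count in the header line (9 here) is one cheap sanity check on a large file.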

matt


> On Feb 26, 2015, at 5:35 PM, Barry Haddow <bhaddow@staffmail.ed.ac.uk> wrote:
>
> Hi Matt
>
> When mert-moses.pl runs kbmira, it always supplies a list of the dense features (and their initial values) using the --dense-init parameter. I think this is your problem. I've attached a typical file used for this feature list.
>
> Of course, kbmira should print a sensible error message rather than segfault. This is probably my doing.
>
> cheers - Barry
>
> On 26/02/15 22:18, Matt Post wrote:
>> kbmira segfaults on the following command:
>>
>>
>> kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o mert.out
>>
>> Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be downloaded here:
>>
>>
>> https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
>>
>> https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
>>
>> I tracked it down to this line of mert/FeatureStats.cpp.
>>
>> std::string SparseVector::decode(std::size_t id)
>> {
>>     return m_id_to_name[id];
>> }
>>
>> Any obvious ideas before I go down this rabbit hole? I verified there are no blank lines or anything else funny with the formatting, at least as far as I can tell (all dense features, plus one sparse feature, OOVPenalty=-100, showing up occasionally).
>>
>> matt
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> <run1.dense>
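
Barry's point that kbmira should report an error rather than segfault can be illustrated with a bounds-checked lookup. The sketch below is a standalone stand-in, not the Moses code: the class FeatureNameTable and its members are hypothetical, and it assumes the id-to-name table is a vector of strings. It also assumes the crash in the quoted decode line comes from an id outside the table, which the thread does not confirm.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a feature name/id table; not the Moses class.
class FeatureNameTable {
public:
    // Return the id for a name, registering the name on first sight.
    std::size_t encode(const std::string& name) {
        auto it = name_to_id_.find(name);
        if (it != name_to_id_.end()) return it->second;
        const std::size_t id = id_to_name_.size();
        name_to_id_[name] = id;
        id_to_name_.push_back(name);
        return id;
    }

    // Return the name for an id, throwing a descriptive exception
    // instead of indexing past the end of the table.
    const std::string& decode(std::size_t id) const {
        if (id >= id_to_name_.size()) {
            std::ostringstream msg;
            msg << "unknown feature id " << id << " (table holds "
                << id_to_name_.size() << " names)";
            throw std::out_of_range(msg.str());
        }
        return id_to_name_[id];
    }

private:
    std::map<std::string, std::size_t> name_to_id_;
    std::vector<std::string> id_to_name_;
};
```

With a check like this, a caller can catch std::out_of_range and print which id was missing, which is usually enough to tell whether a feature name was never registered.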

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 97
**********************************************
