Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Moses vocabulary code (Lane Schwartz)
2. Re: Moses vocabulary code (Lane Schwartz)
3. Segmentation Fault during Tuning (Alex Martinez)
----------------------------------------------------------------------
Message: 1
Date: Sat, 10 Oct 2015 11:35:33 -0500
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] Moses vocabulary code
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZmyVQBxmYWxMzNKFnL-sd460GTnu5VEqT5n8UdZphY0EA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Wouldn't factor->GetId() be the unique integer ID of the string?
On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
> const Factor* is the vocab id. It's guaranteed to be unique for each
> unique string. You can map directly to the string using
> factor->GetString()
>
>
>
> On 09/10/2015 22:55, Lane Schwartz wrote:
>
> Thanks, Marcin.
>
> So when the various components of Moses pass words back and forth, what do
> they send each other? std::string? StringPiece?
>
> On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
>> For instance in my phrase table that would be
>>
>> mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>>
>> StringVector<unsigned char, unsigned, std::allocator> m_sourceSymbols;
>> StringVector<unsigned char, unsigned, std::allocator> m_targetSymbols;
>>
>> That's a memory-mapped vector of strings.
>>
>> On 09.10.2015 at 23:22, Lane Schwartz wrote:
>>
>> Seriously? That sounds inefficient.
>>
>> I've found code in KenLM that maps from strings to integers, but not the
>> other way around.
>>
>> Marcin, do you know, for example, where any Moses code is for doing the
>> mapping for any data structure?
>>
>>
>> On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>>
>>> Hi,
>>> This would only be a simple thing if there were a common framework for
>>> that, but there isn't. Each data structure implements its own vocabularies
>>> and lookup tables. There is no common set of integers.
>>> Best,
>>> Marcin
>>>
>>> On 09.10.2015 at 23:11, Lane Schwartz wrote:
>>>
>>> Hey,
>>>
>>> I know this should be a simple thing to find, but what code in Moses is
>>> responsible for mapping back and forth between strings and integers?
>>>
>>> Thanks,
>>> Lane
>>>
>>
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away. It is time to go elsewhere. The best thing about space travel
>> is that it made it possible to go elsewhere.
>> -- R.A. Heinlein, "Time Enough For Love"
>>
>>
>>
>
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
>
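The thread describes one object per distinct string, whose pointer serves as the vocabulary ID, with GetString() leading back to the text and GetId() giving an integer. The C++17 sketch below illustrates that interning idea under stated assumptions: Vocab and Word are hypothetical classes written for illustration, not the Moses FactorCollection or Factor API, and the index is a plain std::unordered_map.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical interned word (not the Moses Factor class): exactly one object
// exists per distinct string, so pointer equality implies string equality.
class Word {
 public:
  Word(std::string text, std::size_t id) : text_(std::move(text)), id_(id) {}
  const std::string& GetString() const { return text_; }
  std::size_t GetId() const { return id_; }  // consecutive: 0, 1, 2, ...

 private:
  std::string text_;
  std::size_t id_;
};

// Hypothetical vocabulary that owns every Word and hands out stable pointers.
class Vocab {
 public:
  // String -> Word: returns the existing Word or creates it on first sight.
  const Word* Add(std::string_view text) {
    std::string key(text);
    auto it = index_.find(key);
    if (it != index_.end()) return it->second;
    // The ID argument is evaluated before insertion, so it equals the index.
    words_.emplace_back(key, words_.size());
    const Word* w = &words_.back();  // deque::emplace_back keeps references valid
    index_.emplace(std::move(key), w);
    return w;
  }

  // Integer ID -> Word (and from there to the string); throws if unknown.
  const Word* FromId(std::size_t id) const { return &words_.at(id); }
  std::size_t Size() const { return words_.size(); }

 private:
  std::deque<Word> words_;
  std::unordered_map<std::string, const Word*> index_;
};
```

Because each string maps to exactly one Word, two words compare with a single pointer comparison, and the consecutive ID can index plain arrays.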
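Marcin's StringVector is described as a memory-mapped vector of strings. As a rough, hypothetical illustration of why such a layout makes ID-to-string lookup cheap, the C++17 sketch below packs all symbols into one byte buffer plus an offset array. It is not the CompactPT StringVector (which is templated and memory-mapped); PackedStrings is a name made up for this example and keeps its data in ordinary std::vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical packed string table, a simplified stand-in for the CompactPT
// StringVector: symbols are stored back to back in one byte buffer, and
// symbol i occupies data_[offsets_[i], offsets_[i + 1]). Two flat arrays like
// these can be written to disk and memory-mapped without pointer fix-ups.
class PackedStrings {
 public:
  PackedStrings() : offsets_{0} {}

  // Appends a symbol and returns its integer ID (0, 1, 2, ...).
  std::size_t Push(std::string_view s) {
    data_.insert(data_.end(), s.begin(), s.end());
    offsets_.push_back(data_.size());
    return offsets_.size() - 2;
  }

  // ID -> string without copying: a view into the shared buffer.
  // Views are invalidated by a later Push if the buffer reallocates.
  std::string_view Get(std::size_t id) const {
    std::size_t begin = offsets_.at(id);
    std::size_t end = offsets_.at(id + 1);
    return std::string_view(data_.data() + begin, end - begin);
  }

  std::size_t Size() const { return offsets_.size() - 1; }

 private:
  std::vector<char> data_;
  std::vector<std::size_t> offsets_;
};
```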
------------------------------
Message: 2
Date: Sat, 10 Oct 2015 11:37:53 -0500
From: Lane Schwartz <dowobeha@gmail.com>
Subject: Re: [Moses-support] Moses vocabulary code
To: Kenneth Heafield <moses@kheafield.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CABv3vZkZgVzJP6iCVJtViK-RvyaX6BK5xc61XfNF7g=GHvnr8Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
BTW, what's the rationale for using StringPiece instead of std::string? I
thought the main reason for using StringPiece was implicit conversion
from char *.
On Fri, Oct 9, 2015 at 5:15 PM, Kenneth Heafield <moses@kheafield.com>
wrote:
> The Moses common vocabulary is moses/FactorCollection.h. Common
> practice in core Moses code is to pass around a const Factor * (which
> can be resolved to a StringPiece or a consecutive ID).
>
> If a feature/phrase table has its own ids because e.g. it's baked into
> the binary file, then there's a std::vector to map from Moses ID to
> feature function ID. See moses/LM/Ken.h:99 for an example.
>
> std::string (or even StringPiece) conversion at decode time is a bug. A
> sadly common one.
>
> On 10/09/2015 10:22 PM, Lane Schwartz wrote:
> > Seriously? That sounds inefficient.
> >
> > I've found code in KenLM that maps from strings to integers, but not the
> > other way around.
> >
> > Marcin, do you know, for example, where any Moses code is for doing the
> > mapping for any data structure?
> >
> >
> > On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt
> > <junczys@amu.edu.pl> wrote:
> >
> > Hi,
> > This would only be a simple thing if there were a common framework
> > for that, but there isn't. Each data structure implements its own
> > vocabularies and lookup tables. There is no common set of integers.
> > Best,
> > Marcin
> >
> > On 09.10.2015 at 23:11, Lane Schwartz wrote:
> >> Hey,
> >>
> >> I know this should be a simple thing to find, but what code in
> >> Moses is responsible for mapping back and forth between strings
> >> and integers?
> >>
> >> Thanks,
> >> Lane
> >>
>
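Kenneth describes keeping a std::vector that maps Moses IDs to a feature function's own IDs when those IDs are baked into a binary file, citing moses/LM/Ken.h as the real example. The C++17 sketch below shows only the general pattern: LocalModel and IdMapper are hypothetical names, and using 0 as the unknown-word ID is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model with its own vocabulary baked into its binary file.
// kUnknown plays the role of that model's <unk> ID (an assumption).
struct LocalModel {
  static constexpr std::uint32_t kUnknown = 0;
  std::unordered_map<std::string, std::uint32_t> vocab;  // string -> local ID

  std::uint32_t Lookup(const std::string& s) const {
    auto it = vocab.find(s);
    return it == vocab.end() ? kUnknown : it->second;
  }
};

// Translates decoder-wide consecutive IDs into the model's local IDs.
// Strings are touched only while the table is filled, never at decode time.
class IdMapper {
 public:
  explicit IdMapper(const LocalModel& model) : model_(model) {}

  // Called once per new global word, e.g. when the vocabulary grows.
  void Register(std::size_t global_id, const std::string& text) {
    if (global_id >= to_local_.size()) to_local_.resize(global_id + 1, LocalModel::kUnknown);
    to_local_[global_id] = model_.Lookup(text);
  }

  // Decode-time lookup: a single vector index, no hashing or string copies.
  std::uint32_t ToLocal(std::size_t global_id) const {
    return global_id < to_local_.size() ? to_local_[global_id] : LocalModel::kUnknown;
  }

 private:
  const LocalModel& model_;
  std::vector<std::uint32_t> to_local_;
};
```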
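On Lane's StringPiece question: a StringPiece-style type is a non-owning (pointer, length) view, so it can be built from a std::string, a string literal, or a char buffer without allocating or copying. The C++17 sketch below uses std::string_view, the standard-library type built on the same idea, rather than any Moses or KenLM class; SplitViews and CountChars are illustrative helpers, not existing functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits on single spaces without copying characters: each token is a
// (pointer, length) view into `line`, which must outlive the result.
std::vector<std::string_view> SplitViews(std::string_view line) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (start <= line.size()) {
    std::size_t end = line.find(' ', start);
    if (end == std::string_view::npos) end = line.size();
    if (end > start) tokens.push_back(line.substr(start, end - start));
    start = end + 1;
  }
  return tokens;
}

// Accepts std::string, string literals and char buffers alike without
// allocating, which is the convenience a StringPiece-style parameter buys.
std::size_t CountChars(std::string_view s) { return s.size(); }
```

The trade-off is lifetime: a view does not keep its characters alive. And, per Kenneth's remark, even view construction is work best kept out of the decoding inner loop, where passing IDs suffices.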
------------------------------
Message: 3
Date: Sat, 10 Oct 2015 16:52:13 +0000 (GMT)
From: Alex Martinez <cmxela@me.com>
Subject: [Moses-support] Segmentation Fault during Tuning
To: moses-support@mit.edu
Message-ID: <d95af1ae-71cf-4ee7-8ad8-fc7cd193b535@me.com>
Content-Type: text/plain; charset="utf-8"
Hello,
I'm trying to build a factored system using EMS based on this example from the tutorial:
---------------------------------------------------------------------
% train-model.perl \
    --corpus factored-corpus/proj-syndicate.1000 \
    --root-dir morphgen-backoff \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0 \
    --lm 2:3:factored-corpus/pos.lm:0 \
    --translation-factors 1-1+3-2+0-0,2 \
    --generation-factors 1-2+1,2-0 \
    --decoding-steps t0,g0,t1,g1:t2 \
    --external-bin-dir .../tools
----------------------------------------------------------------------
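To make the factor arguments in the tutorial command easier to read: in the train-model.perl notation, '+' separates tables, '-' separates input factors from output factors, and ',' lists several factors on one side. The C++ sketch below parses such a spec; FactorMapping, Split, and ParseFactorSpec are hypothetical helpers written for this illustration, not part of the Moses scripts.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One table of a factor spec: input factor indices -> output factor indices.
struct FactorMapping {
  std::vector<int> input;
  std::vector<int> output;
};

// Splits `s` on `sep`; empty fields are kept so malformed specs fail loudly.
std::vector<std::string> Split(const std::string& s, char sep) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t end = s.find(sep, start);
    if (end == std::string::npos) {
      parts.push_back(s.substr(start));
      return parts;
    }
    parts.push_back(s.substr(start, end - start));
    start = end + 1;
  }
}

// Parses a spec such as "1-1+3-2+0-0,2" (translation) or "1-2+1,2-0"
// (generation): '+' separates tables, '-' separates input from output,
// ',' lists several factors on one side.
std::vector<FactorMapping> ParseFactorSpec(const std::string& spec) {
  std::vector<FactorMapping> tables;
  for (const std::string& table : Split(spec, '+')) {
    std::vector<std::string> sides = Split(table, '-');
    if (sides.size() != 2) throw std::invalid_argument("expected in-out: " + table);
    FactorMapping m;
    // std::stoi throws std::invalid_argument on an empty or non-numeric field.
    for (const std::string& f : Split(sides[0], ',')) m.input.push_back(std::stoi(f));
    for (const std::string& f : Split(sides[1], ',')) m.output.push_back(std::stoi(f));
    tables.push_back(m);
  }
  return tables;
}
```

For example, the last translation table "0-0,2" maps input factor 0 to output factors 0 and 2.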
I'm getting a segmentation fault during tuning, and I suspect the problem is related to the line defining the decoding steps.
This is what I have in my EMS config file to get a similar model:
--------------------------------------------------------------------
### factored training: specify here which factors used
# if none specified, single factor training is assumed
# (one translation step, surface to surface)
#
input-factors = word lemma pos
output-factors = word lemma pos
alignment-factors = "word+lemma -> word+lemma"
translation-factors = "lemma -> lemma, pos -> pos, word -> word + pos"
reordering-factors = "word -> word"
generation-factors = "lemma -> pos, lemma+pos -> word"
decoding-steps = "t0,g0,t1,g1:t2"
generation-type = single
prune-generation = "$moses-bin-dir/pruneGeneration 100"
-------------------------------------------------------------------------
The training fails at the tuning step, and this is what I get in TUNING_tune.1.STDERR:
Executing: /opt/moses/bin/moses -threads all -v 0 -config /mnt/a62/devel/en_es/processfin/model/moses.bin.ini.1 -weight-overwrite 'WordPenalty0= -0.128205 TranslationModel0= 0.025641 0.025641 0.025641 0.025641 LM2= 0.064103 LM0= 0.064103 GenerationModel1= 0.038462 0.000000 TranslationModel2= 0.025641 0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 0.025641 Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 0.025641 LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 LM1= 0.064103' -n-best-list run1.best100.out 100 distinct -input-file /mnt/a62/devel/en_es/data/corpora.tuning.en > run1.out
Segmentation fault (core dumped)
Exit code: 139
The decoder died. CONFIG WAS -weight-overwrite 'WordPenalty0= -0.128205 TranslationModel0= 0.025641 0.025641 0.025641 0.025641 LM2= 0.064103 LM0= 0.064103 GenerationModel1= 0.038462 0.000000 TranslationModel2= 0.025641 0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 0.025641 Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 0.025641 LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 LM1= 0.064103'
cp: cannot stat ‘/mnt/a62/devel/en_es/processfin/tuning/tmp.1/moses.ini’: No such file or directory
-------------------------------------------
If I change this line in the config file from
decoding-steps = "t0,g0,t1,g1:t2"
to
decoding-steps = "t0,g0,t1,g1"
then the training finishes without errors.
I'd appreciate any suggestions on how to solve this.
Alex
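For reference, in a decoding-steps string such as "t0,g0,t1,g1:t2", ':' separates alternative decoding paths and ',' separates the steps within one path, with tN and gN naming translation and generation tables by index. The C++ sketch below parses the string, rejects steps that point at tables that were not configured, and prints the paths in a "<path> <T|G> <index>" layout, which is assumed here to match the [mapping] section of moses.ini. Step, Split, ParseDecodingSteps, and ToMappingSection are hypothetical names, and the sketch is not a diagnosis of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One decoding step: kind 'T' (translation table) or 'G' (generation table).
struct Step {
  char kind;
  int index;
};

// Splits `s` on `sep`, keeping empty fields.
std::vector<std::string> Split(const std::string& s, char sep) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t end = s.find(sep, start);
    if (end == std::string::npos) {
      parts.push_back(s.substr(start));
      return parts;
    }
    parts.push_back(s.substr(start, end - start));
    start = end + 1;
  }
}

// Parses e.g. "t0,g0,t1,g1:t2": ':' separates alternative decoding paths,
// ',' separates steps within a path. Throws std::out_of_range if a step
// names a table index beyond the configured number of tables.
std::vector<std::vector<Step>> ParseDecodingSteps(const std::string& spec,
                                                  int num_translation,
                                                  int num_generation) {
  std::vector<std::vector<Step>> paths;
  for (const std::string& path_spec : Split(spec, ':')) {
    std::vector<Step> path;
    for (const std::string& token : Split(path_spec, ',')) {
      if (token.size() < 2 || (token[0] != 't' && token[0] != 'g'))
        throw std::invalid_argument("bad step: " + token);
      Step step{token[0] == 't' ? 'T' : 'G', std::stoi(token.substr(1))};
      int limit = step.kind == 'T' ? num_translation : num_generation;
      if (step.index < 0 || step.index >= limit)
        throw std::out_of_range("no such table: " + token);
      path.push_back(step);
    }
    paths.push_back(path);
  }
  return paths;
}

// Renders the paths as "<path> <T|G> <index>" lines under a [mapping] header
// (assumed here to match the moses.ini layout).
std::string ToMappingSection(const std::vector<std::vector<Step>>& paths) {
  std::string out = "[mapping]\n";
  for (std::size_t p = 0; p < paths.size(); ++p)
    for (const Step& step : paths[p])
      out += std::to_string(p) + " " + step.kind + " " + std::to_string(step.index) + "\n";
  return out;
}
```

Called with 3 translation and 2 generation tables (the counts in the EMS config in this message), "t0,g0,t1,g1:t2" parses into two paths: four steps in the first and the single step t2 in the second.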
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 108, Issue 36
**********************************************