Moses-support Digest, Vol 94, Issue 6

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Fwd: Question on Phrase Extraction implementation (liling tan)
2. mgiza++ force alignment: segmentation fault when reloading a
big N table (Eleftherios Avramidis)
3. Re: Fwd: Question on Phrase Extraction implementation (Hieu Hoang)

----------------------------------------------------------------------

Message: 1
Date: Sun, 3 Aug 2014 23:02:56 +0200
From: liling tan <alvations@gmail.com>
Subject: [Moses-support] Fwd: Question on Phrase Extraction
implementation
To: moses-support@mit.edu
Message-ID:
<CAKzPaJJX4T8b+YKOe0neGM5jKBzc0wSN136emBp3cGb8h9F48Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Moses community,

I have reimplemented the phrase extraction algorithm as presented on
page 133 of Philipp Koehn's SMT book for NLTK in
https://github.com/alvations/nltk/blob/develop/nltk/align/phrase_based.py

However, there is a bug I cannot track down: my implementation does not
produce the desired output shown in the alignment table. See
http://stackoverflow.com/questions/25109001/phrase-extraction-algorithm-for-statistical-machine-translation
for more detail.
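For reference, the textbook pseudocode on page 133 can be sketched in Python roughly as follows. This is a minimal sketch written from the book's algorithm, not the NLTK code linked above. The function name `phrase_extraction`, the 0-based `(e_index, f_index)` alignment tuples, and the inclusive span output are assumptions made for illustration. One detail that is easy to get wrong when porting the 1-based pseudocode is the pair of "repeat ... until aligned" loops that extend the foreign span over unaligned words: in Python they also have to stop at the sentence boundaries.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end), 0-based (assumed convention)


def phrase_extraction(e_words: List[str], f_words: List[str],
                      alignment: Iterable[Tuple[int, int]]) -> Set[Tuple[Span, Span]]:
    """Extract all phrase pairs consistent with the word alignment.

    alignment holds (e_index, f_index) pairs; the result holds
    ((e_start, e_end), (f_start, f_end)) spans, both inclusive.
    """
    links = set(alignment)
    f_aligned = {f for _, f in links}  # foreign positions with at least one link
    pairs: Set[Tuple[Span, Span]] = set()

    for e_start in range(len(e_words)):
        for e_end in range(e_start, len(e_words)):
            # Minimal foreign span covering every link of e_start..e_end.
            f_points = [f for e, f in links if e_start <= e <= e_end]
            if not f_points:
                continue  # English span is entirely unaligned: nothing to extract
            f_start, f_end = min(f_points), max(f_points)

            # Consistency: no foreign word inside the span may link
            # to an English word outside e_start..e_end.
            if any((f_start <= f <= f_end) and not (e_start <= e <= e_end)
                   for e, f in links):
                continue

            # Grow the foreign span over unaligned words on both sides,
            # stopping at an aligned word or at the sentence boundary.
            fs = f_start
            while True:
                fe = f_end
                while True:
                    pairs.add(((e_start, e_end), (fs, fe)))
                    fe += 1
                    if fe >= len(f_words) or fe in f_aligned:
                        break
                fs -= 1
                if fs < 0 or fs in f_aligned:
                    break
    return pairs


if __name__ == "__main__":
    e = ["a", "b"]
    f = ["x", "y", "z"]  # "y" has no alignment link
    for (es, ee), (fs, fe) in sorted(phrase_extraction(e, f, {(0, 0), (1, 2)})):
        print(" ".join(e[es:ee + 1]), "|||", " ".join(f[fs:fe + 1]))
```

In this toy example the unaligned "y" is attached both to "a" (giving "a ||| x y") and to "b" (giving "b ||| y z"), which is the behaviour the book describes for unaligned foreign words at phrase edges.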

*Can anyone spot what went wrong with my implementation?*

*Are there other Python-based implementations of the same algorithm?*

*Where in the Moses toolkit can the phrase extraction function be
found? What is the input of that function?*

Regards,
Liling
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20140803/83736c37/attachment-0001.htm

------------------------------

Message: 2
Date: Mon, 04 Aug 2014 00:34:43 +0200
From: Eleftherios Avramidis <Eleftherios.Avramidis@dfki.de>
Subject: [Moses-support] mgiza++ force alignment: segmentation fault
when reloading a big N table
To: moses-support@mit.edu
Message-ID: <53DEB903.9070602@dfki.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am trying to produce word alignments for individual sentences. For this
purpose I am using the "force align" functionality of mgiza++.
Unfortunately, when I load a big N (fertility) table, mgiza crashes
with a segmentation fault.

In particular, I initially ran mgiza on the full parallel training
corpus, using the default settings of the Moses training script:

/project/qtleap/software/moses-2.1.1/bin/training-tools/mgiza -CoocurrenceFile /local/tmp/elav01/selection-mechanism/systems/de-en/training/giza.1/en-de.cooc -c /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/en-de-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 24 -nodumps 0 -nsmooth 4 -o /local/tmp/elav01/selection-mechanism/systems/de-en/training/giza.1/en-de -onlyaldumps 0 -p0 0.999 -s /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/de.vcb -t /local/tmp/elav01/selection-mechanism/systems/de-en/training/prepared.1/en.vcb

Afterwards, I ran the mgiza force-align script, which executes the
following command:

/project/qtleap/software/moses-2.1.1/mgizapp-code/mgizapp//bin/mgiza giza.en-de/en-de.gizacfg -c /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./en-de.snt -o /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/giza./en-de -s /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./de.vcb -t /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/prepared./en.vcb -m1 0 -m2 0 -mh 0 -coocurrence /local/tmp/elav01/selection-mechanism/systems/de-en/falign/qtmp_SOVBrE/giza./en-de.cooc -restart 11 -previoust giza.en-de/en-de.t3.final -previousa giza.en-de/en-de.a3.final -previousd giza.en-de/en-de.d3.final -previousn giza.en-de/en-de.n3.final -previousd4 giza.en-de/en-de.d4.final -previousd42 giza.en-de/en-de.D4.final -m3 0 -m4 1

This runs fine until it fails with the following error:

We are going to load previous N model from giza.en-de/en-de.n3.final

Reading fertility table from giza.en-de/en-de.n3.final

Segmentation fault (core dumped)

The n-table that fails to load has about 300k entries, so I wondered
whether its size was the problem. I truncated the table to 60k entries,
and then it works! But the resulting alignments are not good.

I am struggling to fix this, so any help would be appreciated. I am
running a freshly installed mgiza on Ubuntu 12.04.

cheers,
Lefteris

--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30 238 95-1806
Fax. +49-30 238 95-1810


------------------------------

Message: 3
Date: Mon, 4 Aug 2014 10:38:33 +0100
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Fwd: Question on Phrase Extraction
implementation
To: liling tan <alvations@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbibAX+ECPrx0B7DR2XcPEMTgJ4m_NjOU3Z+dYJsKGW8mQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Liling,

On 3 August 2014 22:02, liling tan <alvations@gmail.com> wrote:

> *Does anyone find what went wrong with my implementation?*
>
> *Are there other python based implementation of the same algorithm?*
>
I don't know of a Python implementation. There is a Java implementation by
Jonathan Weese called Thrax:
http://cs.jhu.edu/~jonny/thrax/

>
> *Where in the Moses toolkit is can the phrasal extraction function be
> found? What is the input of that function?*
>
It is in phrase-extract/extract-main.cpp, in the function
void ExtractTask::extract(SentenceAlignment &sentence), lines 350-447.
You may want to base your code on my cleaned-up implementation:

https://github.com/hieuhoang/mosesdecoder/tree/hieu/contrib/other-builds/extract-mixed-syntax
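On the input side, `ExtractTask::extract` receives one sentence pair together with its word alignment (the `SentenceAlignment` argument). In a Moses training run, the word alignment is stored as text with one line per sentence pair, each point written as a 0-based `source-target` index pair such as `0-0 1-2`. The helper below, `parse_moses_alignment`, is hypothetical and not part of Moses: it is a minimal sketch, assuming that format, which turns one such line into a set of `(source, target)` tuples and rejects points that fall outside either sentence.

```python
from typing import Set, Tuple


def parse_moses_alignment(line: str, src_len: int, tgt_len: int) -> Set[Tuple[int, int]]:
    """Parse an alignment line like "0-0 1-2" into (source, target) index pairs.

    Hypothetical helper for illustration; raises ValueError on malformed
    tokens or on points outside the given sentence lengths.
    """
    points: Set[Tuple[int, int]] = set()
    for token in line.split():
        src_str, tgt_str = token.split("-")  # malformed tokens raise ValueError
        src, tgt = int(src_str), int(tgt_str)
        if not (0 <= src < src_len and 0 <= tgt < tgt_len):
            raise ValueError(f"alignment point {token!r} out of range")
        points.add((src, tgt))
    return points


if __name__ == "__main__":
    src = "das haus ist klein".split()
    tgt = "the house is small".split()
    links = parse_moses_alignment("0-0 1-1 2-2 3-3", len(src), len(tgt))
    # Swap to (target, source) if an extractor expects English-first tuples.
    print(sorted((t, s) for s, t in links))
```

Validating the indices against the sentence lengths up front makes a mismatch between the corpus files and the alignment file fail loudly instead of producing silently wrong phrase pairs.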


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 94, Issue 6
********************************************