Moses-support Digest, Vol 97, Issue 93

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Format of binarized phrase tables (Marcin Junczys-Dowmunt)
2. Fwd: Unknown single words that are part of phrases
(Vera Aleksic, Linguatec GmbH)
3. Re: Format of binarized phrase tables (Raj Dabre)


----------------------------------------------------------------------

Message: 1
Date: Thu, 27 Nov 2014 14:05:13 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Format of binarized phrase tables
To: Raj Dabre <prajdabre@gmail.com>
Cc: moses-support@mit.edu
Message-ID: <3d6fb89c32993a35bf5f74cf2f6e6345@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi Raj,

Unfortunately there's no interface to the reordering model. If you need
help with the C++ part let me know. My experience with JNI however is
confined to what you are seeing. And what's worse, I did that two years
ago and by today I hardly understand what is going on there. I must have
been way smarter back then than I am now (I even managed to convince
bjam to run ant and create java classes!).

On 2014-11-27 13:43, Raj Dabre wrote:

> Marcin,
>
> I just finished installing everything and the code works like a charm. I did have to modify LD_LIBRARY_PATH, since libcmph.so.0 was not linked to libJniQueryPt.so in the lib folder, but that's a small thing. I will study the code in detail and try to make it work for querying the reordering models as well (unless that is already taken care of?). If there is anything new you wanted to add to this, please let me know.
> Some experience with JNI would help a Java programmer like me play with the internals of Moses.
>
> Thanks again.
>
> On Thu, Nov 27, 2014 at 8:03 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> The code in the wipo branch is not segfaulting; it is just based on an old Moses version. I can put my recent attempts to make it run with current master into a separate branch (that one is segfaulting). It's best if you start there when you try to fix it. I will let you know once I have pushed it, probably this evening.
>
> On 2014-11-27 11:57, Raj Dabre wrote:
>
> Hello Marcin,
>
> You just saved me a lot of time since I was planning to write this code from scratch. Many thanks for that. I will try to fix the reasons for the segfaults.
>
> Many thanks again!
>
> Regards.
>
> On Thu, Nov 27, 2014 at 7:43 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> Hi,
>
> I tried to port it to the newest Moses version, but I still get segfaults. This used to work with Moses 1.* before the feature function format was changed. However, since the binary format of the compact phrase table has not changed since then, just use the old interface.
>
> Check out the "wipo" branch (not wipoNew!) and compile with
>
> ./bjam --with-cmph=/usr/include -j8 --with-java=/usr/lib/jvm/java-7-openjdk-amd64
>
> As you can see, you need to specify the location of your Java includes, specifically the directory in which jni.h resides. This should build everything in misc/jni/*, including the Java classes. You need ant for that. Then you can run the query tool like this:
>
> echo "test" | ./bin/JniQueryPt_example.sh /some_path/phrase-table.0-0.minphr 4
>
> You need to specify the path and the number of scores in the phrase table. Look inside JniQueryPt_example.sh to see how to call the jar, and inside misc/jni/java/example.java to see how to call the code directly from Java. I apologize for the code; I am not a Java programmer, so it may be crude.
>
> Best,
>
> Marcin
>
> On 2014-11-26 12:00, Raj Dabre wrote:
>
> Hello Marcin,
>
> Yes please. It would save me lots of time. Thanks.
>
> Regards.
>
> On Wed, Nov 26, 2014 at 6:50 PM, Marcin Junczys-Dowmunt <junczys@amu.edu.pl> wrote:
>
> Hi,
>
> I have a JNI interface to my compact phrase table somewhere, I guess I can put that in contrib within a day or two if there is interest.
>
> best,
>
> Marcin
>
> On 2014-11-26 10:45, Barry Haddow wrote:
>
> Hi Raj
>
> The format of these tables is not described anywhere. You'd have to read
> the code in moses/TranslationModel/PhraseDictionaryTree.cpp, and then
> try to convert it to Java.
>
> A better plan would be to use JNI to call the C++ code -- a similar
> approach has been followed in the python interface in contrib/python.
> This would insulate you from the low-level details, and from changes in
> the format.
>
> cheers - Barry
>
> On 26/11/14 03:22, Raj Dabre wrote:
> Hello All,
>
> I know that Moses allows for binarization of a phrase table which can be read on demand at decoding time. We get 5 files named: phrase-table.binphr.*
>
> I want to write my own routine in Java to read phrase pairs from these on demand. Can anyone guide me?
>
> PS: If an explanation of the same for binary reordering tables can be done then it would be great too.
>
> Thanks in advance.
>
> --
> Raj Dabre.
> Research Student, Graduate School of Informatics, Kyoto University.
> CSE MTech, IITB., 2011-2014


------------------------------

Message: 2
Date: Thu, 27 Nov 2014 14:45:29 +0100
From: "Vera Aleksic, Linguatec GmbH" <v.aleksic@linguatec.de>
Subject: [Moses-support] Fwd: Unknown single words that are part of phrases
To: "moses-support (moses-support@mit.edu)" <moses-support@mit.edu>
Message-ID:
<0813A2298B079D43B7932F9C0D5D483460E00A9350@server-mail.muc.linguatec>
Content-Type: text/plain; charset="utf-8"

Hi,

I have one more question:
In the lex.e2f file there is a translation Gitarre->guitar:

Gitarre guitar 0.4000000
Gitarre using 0.0000284
Gitarre ; 0.0000017

Why has it not become part of the phrase table?

Thanks again!
Vera

-----Original Message-----
From: Vera Aleksic, Linguatec GmbH
Sent: Thursday, 27 November 2014 09:42
To: 'Matthias Huck'; Raj Dabre
Subject: Re: [Moses-support] Unknown single words that are part of phrases

Hi,
Thank you for your answers.
@Raj, single-word translations do not exist; I have searched for them. If the grow-diag method is the likely cause of this, are there any better alternatives?
@Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I do not really understand why. Why is "guitar" in the example below aligned to "Musikinstrument Gitarre", and not to "Gitarre" only? I assume decomposing "Musik + Instrument" would help? How else could I improve the word alignment quality?
Thanks!
Best,
Vera

für ein Musikinstrument wie eine elektrische Gitarre ,
NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
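
To make the effect of this alignment concrete, here is a minimal Python sketch that parses the alignment line above and applies the usual phrase-extraction consistency check (no word inside the candidate pair may be linked to a word outside it). The function names parse_links and is_consistent are made up for this illustration and are not part of Moses or GIZA++. Note also that this is a single-direction alignment; the symmetrized alignment that phrase extraction actually uses may differ.

```python
import re

# Sentence pair and alignment copied from the message above.
SOURCE = "für ein Musikinstrument wie eine elektrische Gitarre ,"
ALIGNED = ("NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) "
           ", ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) "
           "guitar ({ 3 7 }) ; ({ 8 })")


def parse_links(aligned_line):
    """Return (english_words, links); links are (german_pos, english_pos), 1-based."""
    entries = re.findall(r"(\S+) \(\{([\d ]*)\}\)", aligned_line)
    entries = entries[1:]  # assumption: the first entry is the NULL word
    words = [word for word, _ in entries]
    links = set()
    for e_pos, (_, positions) in enumerate(entries, start=1):
        for g_pos in positions.split():
            links.add((int(g_pos), e_pos))
    return words, links


def is_consistent(links, g_span, e_span):
    """True if no link crosses the span boundary and at least one link lies inside."""
    has_inside_link = False
    for g_pos, e_pos in links:
        g_in = g_span[0] <= g_pos <= g_span[1]
        e_in = e_span[0] <= e_pos <= e_span[1]
        if g_in != e_in:
            return False  # a word inside the pair is linked to a word outside it
        has_inside_link = has_inside_link or g_in
    return has_inside_link


english, links = parse_links(ALIGNED)
g = SOURCE.split().index("Gitarre") + 1   # position 7
e = english.index("guitar") + 1           # position 10

print(is_consistent(links, (g, g), (e, e)))    # False: "guitar" is also linked to position 3
print(is_consistent(links, (3, 7), (7, 10)))   # True: spans covering both links of "guitar"
```

Under this alignment alone, "Gitarre ||| guitar" cannot be extracted, because "guitar" is also linked to "Musikinstrument" (position 3), which lies outside the one-word source span.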

-----Original Message-----
From: Matthias Huck [mailto:mhuck@inf.ed.ac.uk]
Sent: Wednesday, 26 November 2014 17:54
To: Raj Dabre
Cc: Vera Aleksic, Linguatec GmbH; moses-support
Subject: Re: [Moses-support] Unknown single words that are part of phrases

Hi,

Presumably your phrase table does not contain an entry "Gitarre ||| guitar" because this word pair is always unaligned in your training data. You could try to improve your word alignment quality.
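
One quick way to confirm which entries exist for a given source phrase is to scan the text-format phrase table. The following Python sketch assumes the "|||"-separated layout shown in the entries quoted in this thread (source, target, scores, then further fields); parse_entry and find_entries are hypothetical helper names, and the sample lines are copied from the thread.

```python
def parse_entry(line):
    """Split one text phrase-table line into source, target, and scores."""
    fields = [field.strip() for field in line.split("|||")]
    return {
        "source": fields[0],
        "target": fields[1],
        "scores": [float(x) for x in fields[2].split()],
    }


def find_entries(lines, source_phrase):
    """Return all parsed entries whose source side equals source_phrase exactly."""
    entries = (parse_entry(line) for line in lines if line.strip())
    return [entry for entry in entries if entry["source"] == source_phrase]


# Two entries quoted in this thread, used as sample input.
SAMPLE = [
    "Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1",
    "elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1",
]

print(find_entries(SAMPLE, "Gitarre"))     # [] : no single-word entry
print(find_entries(SAMPLE, "Gitarre ,"))   # one entry with target "guitar ;"
```

For a real table, the same functions could be fed the lines of the file, for example from gzip.open(path, "rt", encoding="utf-8") if the table is gzip-compressed.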

Alternatively, you could implement a procedure in the manner of the "forced single word heuristic" as described in:
D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin of Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, April 2011.
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
(see Fig. 1c).

But the latter would rather be a workaround.

Cheers,
Matthias


On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> Hello,
>
>
> If I am not wrong, this is most likely due to the grow(-diag) method applied to the word-aligned data (both directions) before phrase extraction.
>
> Furthermore, one-word translations should exist (though not always); search for them.
>
>
>
> Regards.
>
>
> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH <v.aleksic@linguatec.de> wrote:
> Hi,
>
> I have observed many times that some words do not exist as single word translations in the phrase table, although they exist in the training corpus and in multiword phrases.
> An example:
> German-English translation for "Gitarre" is unknown, i.e. there is no single word entry for "Gitarre" in the phrase table, although some other phrases containing this word exist (see below).
> How is it possible?
> Thanks and best regards,
> Vera
>
>
> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
>







------------------------------

Message: 3
Date: Thu, 27 Nov 2014 22:49:20 +0900
From: Raj Dabre <prajdabre@gmail.com>
Subject: Re: [Moses-support] Format of binarized phrase tables
To: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAB3gfjB5hfa8f6X7sbJuuT4_sGz2cP8JOObD00_P0bBdVb1ZZA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Well... in that case... let me deal with the reordering part.
I will update you if I manage that.
Thanks once again.




--
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014


End of Moses-support Digest, Vol 97, Issue 93
*********************************************
