Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Repeating Non-terminals and Alignment (Jamie Macbeth)
2. Re: Repeating Non-terminals and Alignment (Hieu Hoang)
3. Re: Repeating Non-terminals and Alignment (Jamie Macbeth)
4. Re: lmplz crashed on joint_order (Dingyuan Wang)
----------------------------------------------------------------------
Message: 1
Date: Wed, 29 Mar 2017 13:27:48 -0400
From: Jamie Macbeth <jmacbeth@mit.edu>
Subject: [Moses-support] Repeating Non-terminals and Alignment
To: moses-support@mit.edu
Message-ID:
<CAKgAoadmWz+108D80ViXh02dDPitu6pEzZy0c+PbxrVsxPtQBw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello,
I am planning to use Moses for a complex language decomposition task. I would
like to be able to translate a sentence like "Bob bought a loaf of bread at
the store for $1," to one like "Bob gave 1$ to the store, and the store
gave a loaf of bread to Bob" using a tree-based model. This requires that
an NP like "Bob" or "the store" appears once in the source text, but
appears twice in the target text.
I tried doing this using a string-to-tree rule table where I have a single
non-terminal in the source aligned with two non-terminals in the target.
Here's a simple example where I would try to translate "bob" to "bob bob":
bob [X] ||| bob [X] ||| 1.0 ||| |||
<s> [X][X] </s> [X] ||| <s> [X][X] [X][X] </s> [TOP] ||| 1.0 ||| 1-1 1-2 |||
You can see that I tried aligning 1-1 and 1-2. However, when I try to
translate "bob" to "bob bob" using this rule table I get a segfault.
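The one-to-many alignment described above can be detected before a rule table reaches the decoder. The following is a minimal sketch, not part of Moses: it assumes the field layout shown in the two example rules (source ||| target ||| scores ||| alignment |||), that the last token on each side is the left-hand-side label, and that alignment indices are 0-based token positions. The helper names are hypothetical.

```python
from collections import defaultdict


def is_nonterminal(token):
    # Assumption: non-terminals are written in square brackets, e.g. [X][X].
    return token.startswith("[") and token.endswith("]")


def parse_rule(line):
    """Split one rule-table line into source tokens, target tokens, alignment pairs."""
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 2:
        raise ValueError("expected at least source and target fields")
    # Drop the final token on each side: it is the left-hand-side label.
    source = fields[0].split()[:-1]
    target = fields[1].split()[:-1]
    align_field = fields[3] if len(fields) > 3 else ""
    alignment = [tuple(int(i) for i in pair.split("-"))
                 for pair in align_field.split()]
    return source, target, alignment


def repeated_source_nonterminals(line):
    """Map each source non-terminal position to its target positions,
    keeping only those aligned to more than one target non-terminal."""
    source, target, alignment = parse_rule(line)
    targets_by_source = defaultdict(list)
    for s, t in alignment:
        if s < len(source) and t < len(target):
            if is_nonterminal(source[s]) and is_nonterminal(target[t]):
                targets_by_source[s].append(t)
    return {s: ts for s, ts in targets_by_source.items() if len(ts) > 1}


if __name__ == "__main__":
    rule = "<s> [X][X] </s> [X] ||| <s> [X][X] [X][X] </s> [TOP] ||| 1.0 ||| 1-1 1-2 |||"
    print(repeated_source_nonterminals(rule))
```

A non-empty result flags a rule in which one source non-terminal is aligned to several target non-terminals, which is the pattern that triggers the segfault reported here.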
Would it be possible to support a more flexible non-terminal alignment like
this in Moses? If I wanted to implement this, would it be extremely
difficult, and where would I start?
Sincerely,
Jamie Macbeth
------------------------------
Message: 2
Date: Wed, 29 Mar 2017 18:54:30 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Repeating Non-terminals and Alignment
To: Jamie Macbeth <jmacbeth@mit.edu>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbgi7JtTLeidu_0+kPNYzRxLMo_35XtPmwvN8sOHu1GYyw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Interesting. So you're not after a general insertion or deletion grammar,
but a synchronous grammar with occasional replicating subphrases?
I've forgotten how to do it in the main Moses, but I've just rewritten the
decoder 'moses2' that also supports SCFG. It may be better for you to
implement it in there because the codebase is smaller, newer and easier to
read.
I'm not sure exactly how you'll do it, but the starting point would be the
Decode() function in moses2/SCFG/Manager.cpp; drill down from there.
* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/
------------------------------
Message: 3
Date: Wed, 29 Mar 2017 17:02:12 -0400
From: Jamie Macbeth <jmacbeth@mit.edu>
Subject: Re: [Moses-support] Repeating Non-terminals and Alignment
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAKgAoacWyfiCxGQ+QOs8m6h3HxHDRomD9vALkXB9_e+pr3EF1w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi, Hieu,
Yes, I think that's a good description of what I'm hoping for. Thanks for
the pointers! Will let you know how it goes.
Jamie
------------------------------
Message: 4
Date: Thu, 30 Mar 2017 17:45:36 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: Re: [Moses-support] lmplz crashed on joint_order
To: Kenneth Heafield <moses@kheafield.com>, moses-support@mit.edu
Message-ID: <316c1949-2245-7fe8-4dd9-e24207544d81@gmail.com>
Content-Type: text/plain; charset=utf-8
Hi,
I think it should work. Does this have something to do with available
disk space?
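One quick way to rule disk space in or out is to check how much room is free on the filesystem used for temporary files. This is a minimal sketch using only the Python standard library. It assumes, without confirmation from the thread, that the temporary files land in the system temp directory; substitute the actual directory if it differs.

```python
import shutil
import tempfile


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Assumption: temporary files are written to the system temp directory.
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_space_gib(tmp):.1f} GiB free")
```

The chain sizes in the quoted log run to several gigabytes, so comparing them against the reported free space gives a rough indication of whether a full disk is plausible.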
2017-03-29 23:40, Kenneth Heafield:
> How embarrassing. Can you try on head from github.com/kpu/kenlm?
> If that fails, I can take this off list.
>
> Kenneth
>
> On March 29, 2017 3:39:20 PM GMT+01:00, Dingyuan Wang
> <abcdoyle888@gmail.com> wrote:
>
> Dear list,
>
> lmplz crashed on my machine recently. Command is
>
> lmplz -o 4 -S 70% --text zhc-simp.txt --arpa zhc.lm --prune 0 1 1 2
>
> === 1/5 Counting and sorting n-grams ===
> Reading /home/gumble/docs/E/corpus/zhs/zhc-simp.txt
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> tcmalloc: large alloc 2340552704 bytes == 0x55e7ed4f4000 @
> tcmalloc: large alloc 9362194432 bytes == 0x55e878d14000 @
> ****************************************************************************************************
> Unigram tokens 886453003 types 66249
> === 2/5 Calculating and sorting adjusted counts ===
> Chain sizes: 1:794988 2:1961835648 3:3678441728 4:5885507072
> tcmalloc: large alloc 5885509632 bytes == 0x55e7ed4f4000 @
> tcmalloc: large alloc 1961836544 bytes == 0x55e94c29c000 @
> tcmalloc: large alloc 3678445568 bytes == 0x55e9c1190000 @
> Statistics:
> 1 66249 D1=0.549028 D2=1.18255 D3+=0.99644
> 2 14266408/22790840 D1=0.615082 D2=1.06095 D3+=1.47555
> 3 87810872/205978808 D1=0.742285 D2=1.17282 D3+=1.49899
> 4 62909089/415283792 D1=0.698985 D2=1.20588 D3+=1.54463
> Memory estimate for binary LM:
> type MB
> probing 3417 assuming -p 1.5
> probing 4002 assuming -r models -p 1.5
> trie 1653 without quantization
> trie 908 assuming -q 8 -b 8 quantization
> trie 1418 assuming -a 22 array pointer compression
> trie 674 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
> === 3/5 Calculating and sorting initial probabilities ===
> tcmalloc: large alloc 4119576576 bytes == 0x55e94c1d8000 @
> tcmalloc: large alloc 9966813184 bytes == 0x55eaaf630000 @
> Chain sizes: 1:794988 2:228262528 3:1756217440 4:1509818136
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ##**********###############################################################-----##**********++#############################################################-----##************#############################################################-----##************####################################################################************####################################################################************+###################################################################*************###################################################################*************#####################################################################################
> === 4/5 Calculating and writing order-interpolated probabilities ===
> Chain sizes: 1:794988 2:228262528 3:1756217440 4:1509818136
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ------------------------------------------------------------------------
> terminate called after throwing an instance of 'lm::FormatLoadException'
> what(): ./lm/common/joint_order.hh:61 in void lm::JointOrder(const util::stream::ChainPositions&, Callback&) [with Callback = lm::builder::{anonymous}::Callback<lm::builder::{anonymous}::OutputProbBackoff>; Compare = lm::SuffixOrder] threw FormatLoadException because `order != current + 1'.
> Detected n-gram without matching suffix
>
--
Dingyuan Wang
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 125, Issue 54
**********************************************