Moses-support Digest, Vol 141, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. incomplete phrase table (Janek Amann)
2. Re: incomplete phrase table (Hieu Hoang)
3. Re: incomplete phrase table (Philipp Koehn)

----------------------------------------------------------------------

Message: 1
Date: Thu, 26 Jul 2018 18:06:19 +0200
From: "Janek Amann" <J.Amann@gmx.net>
Subject: [Moses-support] incomplete phrase table
To: moses-support@mit.edu
Message-ID:
<trinity-5f75c9da-57f0-46f7-8fc6-5594950169ab-1532621179080@3c-app-gmx-bs18>

Content-Type: text/plain; charset="us-ascii"

An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20180726/b3e24fda/attachment-0001.html

------------------------------

Message: 2
Date: Fri, 27 Jul 2018 08:04:23 +1000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] incomplete phrase table
To: Janek Amann <J.Amann@gmx.net>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbj8Zaphg=VfR5ZjHRmr528ScBu-cUaiCb5qCmA26fr3JQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I guess you wanted it to create the following rules:
C -> X
D -> Y
E -> Z
There's no guarantee that it will figure that out. One likely cause is
that there isn't enough training data.
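
To make the "not enough training data" point concrete, here is a minimal sketch that simply counts how often each source word shares a sentence pair with each target word in the posted training set. It is plain co-occurrence counting, not what GIZA++/mgiza actually computes, and the function name is made up for illustration.

```python
from collections import Counter

# Training corpus exactly as posted in the thread.
src = ["A C", "B C", "A D", "B E"]
tgt = ["X", "X", "Y", "Z"]


def cooccurrence_counts(src_sents, tgt_sents):
    """Count how often each (source word, target word) pair shares a sentence pair."""
    counts = Counter()
    for s, t in zip(src_sents, tgt_sents):
        for sw in s.split():
            for tw in t.split():
                counts[(sw, tw)] += 1
    return counts


counts = cooccurrence_counts(src, tgt)
for (sw, tw), n in sorted(counts.items()):
    print(f"{sw} - {tw}: {n}")
```

C is seen with X twice, but D and A are each seen with Y exactly once, and E and B are each seen with Z exactly once. The counts alone therefore give little reason to prefer D -> Y over A -> Y, or E -> Z over B -> Z.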

Hieu Hoang
http://statmt.org/hieu

On 27 July 2018 at 02:06, Janek Amann <J.Amann@gmx.net> wrote:

> Hi all,
>
> I'm pretty new to Moses and I don't think I'm able to figure this out on
> my own. I'm trying to train Moses with this very small data set.
>
> Src:
>
> A C
> B C
> A D
> B E
>
> Tgt:
>
> X
> X
> Y
> Z
>
> And this is my test set:
>
> Src:
>
> A C
> B C
> A D
> B D
> A E
> B E
>
> Tgt:
>
> X
> X
> Y
> Y
> Z
> Z
>
>
> This is the phrase table I'm getting:
>
> A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
> A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
> B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
> B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
>
> For some reason Moses didn't extract any single tokens, which of course
> messes up the translation model.
> These are the commands I used:
>
> for the language model:
>
> /home/janek/mosesdecoder/bin/lmplz \
> -o 3 </home/janek/Desktop/Moses/data/moses_train_4.tgt \
> >/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt \
> --discount_fallback
>
> and the translation model:
>
> /home/janek/mosesdecoder/scripts/training/train-model.perl \
> -root-dir /home/janek/Desktop/Moses/working \
> -corpus /home/janek/Desktop/Moses/data/moses_train_4 \
> -f src \
> -e tgt \
> -alignment grow-diag-final-and \
> -reordering msd-bidirectional-fe \
> -lm 0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8 \
> -external-bin-dir /home/janek/mosesdecoder/mgiza/mgizapp \
> -mgiza
>
> Since my dataset is very small I skipped tokenizing and truecasing. I
> also didn't do any tuning.
> I've already tried out all possible options for the alignment, but it
> didn't change a thing.
> I'd be really grateful if someone could point me to a solution or at least
> the right direction for solving this.
> This is my first time posting something in a support forum so I don't know
> if you need any more information.
> Just let me know if you do.
>
> Thanks for your help.
>
> Best,
> Janek
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

Message: 3
Date: Thu, 26 Jul 2018 21:52:50 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] incomplete phrase table
To: Hieu Hoang <hieuhoang@gmail.com>
Cc: Moses Support <moses-support@mit.edu>
Message-ID:
<CAAFADDDPB1rNG4sHvbfy1t=J4GnW_Z+EeFui8dgGH8eRR7usXw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

if you have data like this, then you should also manually create word
alignments for it.

This would guarantee that you get certain phrase pairs.
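
As a rough illustration of why the alignment decides which phrase pairs come out, here is a minimal sketch of the textbook consistency check used in phrase extraction: a span pair is kept only if it contains at least one alignment link and no link crosses its border. This is a brute-force sketch, not Moses's own extract tool, and the "1-0" manual alignment is a hypothetical example, not something posted in the thread.

```python
def parse_alignment(text):
    """Turn an alignment string such as "0-0 1-0" into (src, tgt) index pairs."""
    return {tuple(int(i) for i in point.split("-")) for point in text.split()}


def extract_phrase_pairs(src_words, tgt_words, alignment):
    """Keep every span pair that is consistent with the alignment:
    at least one link inside, and no link crossing the border."""
    pairs = set()
    for s1 in range(len(src_words)):
        for s2 in range(s1, len(src_words)):
            for t1 in range(len(tgt_words)):
                for t2 in range(t1, len(tgt_words)):
                    inside = False
                    crossing = False
                    for s, t in alignment:
                        in_src = s1 <= s <= s2
                        in_tgt = t1 <= t <= t2
                        if in_src and in_tgt:
                            inside = True
                        elif in_src or in_tgt:
                            crossing = True
                    if inside and not crossing:
                        pairs.add((" ".join(src_words[s1:s2 + 1]),
                                   " ".join(tgt_words[t1:t2 + 1])))
    return pairs


src_words, tgt_words = "A C".split(), "X".split()

# Alignment reported in the posted phrase table: both source words linked to X.
automatic = extract_phrase_pairs(src_words, tgt_words, parse_alignment("0-0 1-0"))

# Hypothetical manual alignment: only C is linked to X, A is left unaligned.
manual = extract_phrase_pairs(src_words, tgt_words, parse_alignment("1-0"))

print(sorted(automatic))
print(sorted(manual))
```

With the "0-0 1-0" alignment only the pair ("A C", "X") survives, because any single source word would leave a link to X crossing the border. With the manual "1-0" alignment the single-word pair ("C", "X") is extracted as well.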

You can take a look at the word alignment it generated to see why it fails
sometimes.
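
One quick way to look at the alignment without opening any training files is to read the fourth "|||"-separated field of the phrase-table lines posted in the thread, which holds source-target index pairs. The sketch below turns those pairs back into word links; the helper name is an assumption made for illustration.

```python
# Phrase-table lines copied from the original post.
PHRASE_TABLE = """\
A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
"""


def word_links(line):
    """Return the (source word, target word) links stored in one phrase-table line."""
    fields = [f.strip() for f in line.split("|||")]
    src, tgt, points = fields[0].split(), fields[1].split(), fields[3].split()
    links = []
    for point in points:
        s, t = (int(i) for i in point.split("-"))
        links.append((src[s], tgt[t]))
    return links


all_links = [word_links(line) for line in PHRASE_TABLE.splitlines()]
for links in all_links:
    print(links)
```

Every entry links both source words to the single target word, which is consistent with no single-token phrase pairs having been extracted.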

-phi


------------------------------

End of Moses-support Digest, Vol 141, Issue 11
**********************************************
