Moses-support Digest, Vol 149, Issue 11

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses + BPE ? (Anoop)


----------------------------------------------------------------------

Message: 1
Date: Sun, 17 Mar 2019 08:15:08 +0530
From: Anoop <anoop.kunchukuttan@gmail.com>
Subject: Re: [Moses-support] Moses + BPE ?
To: Noe Casas <noe.casas@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CADXxMYfhap=GjEfUxG+UYe+USfZSSDqAYH4cyDfgKDcba-kZug@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Noe,

We have done translation between related languages using BPE with Moses,
without using EMS, and I did not run into any particular problems. A few
things that we did for our scenario:

- Sentence length can increase after BPE segmentation, which increases
decoding time. Since we were working on related languages, we switched off
reordering.
- To speed up decoding, we used cube pruning with a small pop-limit (
https://www.cse.iitb.ac.in/~anoopk/publications/vardial2016_faster_subword.pdf
)
- We also used a small BPE vocabulary (~3000 entries), since we were working
with similar languages, and a higher-order LM (10-gram).
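
The decoder settings in the list above can be sketched as a small Python
helper that assembles the command lines. This is only an illustration, not
the exact setup from the paper: the pop-limit value, the file paths, and the
use of KenLM's lmplz for the 10-gram LM are assumptions. Note that KenLM
builds usually cap the n-gram order at compile time, so a 10-gram model may
need a build with a higher maximum order.

```python
import shlex


def build_moses_command(ini_path, pop_limit=10):
    """Moses decoder call: monotone decoding with cube pruning.

    pop_limit=10 is a placeholder value, not a figure taken from the email.
    """
    return [
        "moses",
        "-f", ini_path,                 # model configuration file
        "-distortion-limit", "0",       # 0 = monotone, i.e. reordering switched off
        "-search-algorithm", "1",       # 1 selects cube pruning
        "-cube-pruning-pop-limit", str(pop_limit),
    ]


def build_lm_command(bpe_text_path, arpa_path, order=10):
    """KenLM lmplz call for a high-order LM over BPE-segmented text.

    Using lmplz here is an assumption; the email only states the LM order.
    """
    return [
        "lmplz",
        "-o", str(order),               # n-gram order
        "--text", bpe_text_path,        # BPE-segmented target-side corpus
        "--arpa", arpa_path,            # output ARPA file
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(build_moses_command("model/moses.ini")))
    print(shlex.join(build_lm_command("corpus.bpe.tgt", "lm.10gram.arpa")))
```

A small pop-limit trades some search accuracy for speed, so it is worth
checking translation quality on a development set when lowering it.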

You can find more details here:
https://www.cse.iitb.ac.in/~anoopk/publications/sclem2017_bpe_related.pdf

Regards,
Anoop.

On Sat, Mar 16, 2019 at 5:16 PM Noe Casas <noe.casas@gmail.com> wrote:

> Dear Moses Community,
>
> I want to train Moses with byte-pair encoding tokenization (BPE,
> https://github.com/rsennrich/subword-nmt). I plan to do it "by hand"
> without the EMS.
>
> Is there any problem with the idea?
>
> Would it be OK just to apply BPE after tokenization, truecasing, etc., and
> then go on with the rest of the typical steps?
>
> Is there any gotcha I should take into account?
>
> The only potential pitfall I have identified is that I have to clean the
> corpus with clean-corpus-n.perl after applying BPE, so as not to reach
> the maximum fertility of 9 for mgiza.
>
> Any success/failure experiences doing similar stuff are also very welcome.
>
> Thanks,
> Noe.
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
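
The cleaning step Noe mentions (filtering the parallel corpus after BPE has
been applied) can be illustrated with a minimal Python sketch. It mimics the
kind of length and ratio checks that clean-corpus-n.perl performs, but it is
not that script: the function names and the max_len value are hypothetical,
and the ratio limit of 9 follows the figure quoted in the message.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Return True if a BPE-segmented sentence pair passes the length filters.

    Lengths are counted in subword tokens, so the check must run AFTER BPE:
    a sentence that was short in words can become long in subwords.
    """
    s, t = len(src.split()), len(tgt.split())
    # Drop empty, too-short, or too-long sentences on either side.
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False
    # Drop pairs whose length ratio exceeds the limit in either direction.
    return s <= max_ratio * t and t <= max_ratio * s


def clean_corpus(pairs, **limits):
    """Keep only the (source, target) pairs that pass keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **limits)]


if __name__ == "__main__":
    corpus = [
        ("un@@ believ@@ able", "incre@@ íble"),   # kept: 3 vs 2 tokens
        ("", "vacío"),                            # dropped: empty source
        (" ".join(["a"] * 10), "x"),              # dropped: ratio 10 > 9
    ]
    for src, tgt in clean_corpus(corpus):
        print(src, "|||", tgt)
```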


--
I claim to be a simple individual liable to err like any other fellow
mortal. I own, however, that I have humility enough to confess my errors
and to retrace my steps.

http://flightsofthought.blogspot.com

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 149, Issue 11
**********************************************
