Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Language model question (Dingyuan Wang)
----------------------------------------------------------------------
Message: 1
Date: Fri, 27 Nov 2015 00:05:51 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: Re: [Moses-support] Language model question
To: Vincent Nguyen <vnguyen@neuf.fr>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAFt8H74H6ta+ijKC_CHao2dvUchqnONvk64Q+JDN99JK5b-zEg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
I tend to fix this in the tokenization script or, if there are obvious
patterns in the noise, in a separate pre-processing script.
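A minimal sketch of such a pre-processing filter follows. It assumes the noise is mostly French and can be spotted by elided words such as j', l', d' or qu' directly followed by a letter. The regular expression, the `min_hits` threshold and the script name are illustrative assumptions, not part of Moses. Matching lines are dropped before tokenization. Raising `min_hits` reduces false positives on English text that contains names like L'Oréal.

```python
#!/usr/bin/env python3
"""Drop lines that look French from an English corpus (heuristic sketch)."""
import re
import sys

# Assumed noise pattern: a French elided word (j', l', d', qu', n', c', s',
# m', t') at a word boundary, an ASCII or typographic apostrophe, then a letter.
FRENCH_ELISION = re.compile(
    r"\b(?:j|l|d|qu|n|c|s|m|t)['\u2019][^\W\d_]", re.IGNORECASE
)


def looks_french(line: str, min_hits: int = 1) -> bool:
    """Return True if the line contains at least `min_hits` French elisions."""
    return len(FRENCH_ELISION.findall(line)) >= min_hits


def filter_lines(lines, min_hits: int = 1):
    """Yield only the lines that do not look French."""
    for line in lines:
        if not looks_french(line, min_hits):
            yield line


def main(path: str) -> None:
    """Stream a UTF-8 corpus file and print the lines that are kept."""
    with open(path, encoding="utf-8") as src:
        for kept in filter_lines(src):
            sys.stdout.write(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python filter_french.py news.2014.shuffle.en > news.filtered.en
    main(sys.argv[1])
```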
--
Dingyuan
On 26 Nov 2015 21:09, "Vincent Nguyen" <vnguyen@neuf.fr> wrote:
> Hi all,
>
> I have a question regarding LMs.
>
> Let's take the example of news.2014.shuffle.en
>
> When we process it through punctuation normalization and tokenization
> for English, a space is inserted before an apostrophe, for instance:
> "it isn't" => "it isn 't"
>
> BUT it contains some noise; for instance, there are some French sentences
> in the corpus, for which this apostrophe processing is not suited:
> "j'aime" => "j 'aime", which creates the token 'aime
>
> My point is the following:
>
> At the LM-building stage, how can we prune to eliminate tokens such as
> "'aime", so that they do not create wrong unigrams, bigrams, etc.?
>
> The ngram -minprune option only accepts 2 as a minimum, so wrong unigrams
> will still be kept in the LM.
>
>
> Hope I'm clear enough...
>
> Vincent
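If the corpus has already been tokenized, a similar filter can run at the LM-building stage, before the LM is estimated. Since -minprune cannot prune unigrams, removing the offending sentences from the training text is one way to keep tokens such as 'aime out of the LM altogether. This sketch assumes one tokenized sentence per line and flags an apostrophe-initial token only when the previous token is a French elided word (j, l, d, qu, ...), so English splits like "isn 't" or "O 'Brien" are kept. The word list, function names and file names are hypothetical. It also maps `&apos;` back to an apostrophe, because the Moses tokenizer escapes special characters unless it is run with `-no-escape`.

```python
#!/usr/bin/env python3
"""Drop tokenized sentences containing split French elisions like "j 'aime"."""
import sys

# Hypothetical list of French elided words; a following token that starts
# with an apostrophe (e.g. 'aime) is treated as noise.
FRENCH_ELIDED = {"j", "l", "d", "qu", "n", "c", "s", "m", "t"}


def unescape(token: str) -> str:
    """Map the escaped apostrophe &apos; back to a plain apostrophe."""
    return token.replace("&apos;", "'")


def has_split_elision(tokens: list[str]) -> bool:
    """True if a token like 'aime directly follows a French elided word."""
    for prev, tok in zip(tokens, tokens[1:]):
        tok = unescape(tok)
        if (
            prev.lower() in FRENCH_ELIDED
            and len(tok) > 1
            and tok[0] == "'"
            and tok[1:].isalpha()
        ):
            return True
    return False


def keep_sentence(line: str) -> bool:
    """Keep a sentence only if it contains no split French elision."""
    return not has_split_elision(line.split())


def main(path: str) -> None:
    """Print the sentences of a tokenized UTF-8 corpus that pass the filter."""
    with open(path, encoding="utf-8") as src:
        for line in src:
            if keep_sentence(line):
                sys.stdout.write(line)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python filter_tokens.py news.2014.tok.en > news.2014.clean.en
    main(sys.argv[1])
```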
------------------------------
End of Moses-support Digest, Vol 109, Issue 70
**********************************************