Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: Corpus tokenisation problem (Philipp Koehn)
2. Re: continue partial translation (Philipp Koehn)
3. Re: normalize punctuation (Hieu Hoang)
4. Re: continue partial translation (He He)
5. Re: System requiremnts for Moses (Hegde, Sujay)
----------------------------------------------------------------------
Message: 1
Date: Wed, 2 Dec 2015 17:01:07 -0500
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] Corpus tokenisation problem
To: Anysta Nysta <anystanysta@ymail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDBMR_K+AWhf5_NEywYQ58Chgjh4xP6PTcengD6o70AMqA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi,
please use the path to the tokenizer where you actually installed it.
-phi
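
A quick way to find out where the tokenizer was actually installed is to search for it. The following is only a minimal Python sketch, not part of Moses; the helper name find_script and the directory to search are assumptions made for illustration.

```python
from pathlib import Path


def find_script(root, name="tokenizer.perl"):
    """Return the sorted paths of every regular file called `name` under `root`."""
    root_path = Path(root).expanduser()
    # rglob walks the tree recursively; keep regular files only.
    return sorted(p for p in root_path.rglob(name) if p.is_file())
```

Calling find_script("~") (assuming the checkout is somewhere under the home directory) lists candidate paths; whichever path it reports is the one to use in place of ~/mosesdecoder/scripts/tokenizer/tokenizer.perl.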
On Sun, Nov 29, 2015 at 11:31 PM, Anysta Nysta <anystanysta@ymail.com> wrote:
> Hello, I have already downloaded and untarred the sample training corpus
> provided in the Moses Baseline tutorial. When I tokenise the corpus as shown
> below, I get a "No such file or directory" error. How can I solve this?
>
> Thank you.
>
>
> $ ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en \
>     < ~/corpus/training/news-commentary-v8.fr-en.en \
>     > ~/corpus/news-commentary-v8.fr-en.tok.en
> -bash: /home/Anystaliunrenang/mosesdecoder/scripts/tokenizer/tokenizer.perl:
> No such file or directory
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
------------------------------
Message: 2
Date: Wed, 2 Dec 2015 18:14:34 -0500
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] continue partial translation
To: He He <hhe.xiy@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDkVSWO+Nw_N18G037EoOidLzSxjC0VKToUod93mC+icg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi,
it's not clear to me what you are exactly specifying to the decoder,
but what you intend to do should work.
Did you use the switch "-xml-input exclusive"?
What exactly do you specify as input?
-phi
On Tue, Nov 24, 2015 at 10:19 PM, He He <hhe.xiy@gmail.com> wrote:
> Hi there,
>
> I'm trying to do translation conditioned on some already translated prefix
> (essentially what -continue-partial-translation was supposed to do). I'm
> using -xml-input exclusive to pass in the prefix source and translation.
>
> However, when the prefix becomes long, this doesn't work, e.g.
> <p translation="Britain 's trade house E D & F Man said on the amount of
> money in eastern europe , sugar beet output both Ukraine and Russia in"> ??
> ? ED & F ?? ? ? ?? ? , 96 / 97 ?? ? ?? ? ??? ?? ? , ????? ???? ? ?? ??
> ???</p> ?? ? ?? ? ?? ? ? , ??? ?? ? ??> 0 ||| ?? ? ED & F?? ? ED & F ?? ? ?
> ?? ? / 96 , 97 ?? ? ?? ? ??? ?? ? ????? , ? ??? ? ?? ?? ??? substantial
> decline was expeted to be tough ||| LexicalReordering0= -4.48185 -7.01678
> -1.48808 -4.3759 -6.89465-0.942918 Distortion0= -12 LM0= -227.918
> WordPenalty0= -40 PhrasePenalty0= 36 TransltionModel0= -7.25771 -34.5474
> -2.80336 -22.3651 ||| -3322.64"
>
> It just copies the source prefix. I suspect this is because many words now
> become UNK, since phrase-table entries that overlap the prefix are
> ignored.
>
> Is there a way around this? Thanks a lot in advance!
>
> Best,
> He
>
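
The input line described above wraps the source prefix in a tag whose translation attribute carries the prefix translation. The thread does not establish why long prefixes fail, but one thing worth ruling out is markup ambiguity, since the example translation contains "&" and "'". Below is a minimal Python sketch that builds such a line with standard XML escaping. The helper name make_xml_input is hypothetical, and the assumption that plain XML escaping is what the decoder expects is untested; Moses ships its own escaping script whose conventions may differ.

```python
from xml.sax.saxutils import escape, quoteattr


def make_xml_input(prefix_source, prefix_translation, remainder, tag="p"):
    """Build one decoder input line: a tagged, pre-translated prefix plus the rest."""
    # quoteattr escapes &, <, > and picks a quote style that is safe for the value.
    attr = quoteattr(prefix_translation)
    return f"<{tag} translation={attr}>{escape(prefix_source)}</{tag}> {escape(remainder)}"
```

For example, make_xml_input("a b", "E D & F Man 's", "c d") yields a line in which the ampersand inside the attribute appears as &amp;, so the markup stays well-formed regardless of the prefix length.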
------------------------------
Message: 3
Date: Thu, 3 Dec 2015 00:21:18 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] normalize punctuation
To: Vincent Nguyen <vnguyen@neuf.fr>, moses-support
<moses-support@mit.edu>
Message-ID: <565F8AFE.5020400@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Looking at the script, English is the default language if '-l' is not
specified.
There isn't much language-specific processing in that script.
On 01/12/15 08:49, Vincent Nguyen wrote:
> Hieu,
>
> here :
> http://www.statmt.org/moses/RELEASE-3.0/models/fr-en/config.pb.recase
>
> I read :
>
> input-tokenizer = "$moses-script-dir/tokenizer/normalize-punctuation.perl $input-extension | $moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension"
> output-tokenizer = "$moses-script-dir/tokenizer/normalize-punctuation.perl $output-extension | $moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension"
>
>
> but shouldn't the language be prefixed by "-l" for the
> normalize-punctuation.perl script?
>
>
> thanks
> V
--
Hieu Hoang
http://www.hoang.co.uk/hieu
------------------------------
Message: 4
Date: Wed, 2 Dec 2015 21:30:35 -0500
From: He He <hhe.xiy@gmail.com>
Subject: Re: [Moses-support] continue partial translation
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAMdMQUMcPMAo87=8_3bk_Vs53HZDfkNY7vUM=b7+-jC4HvCYsQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
Yes. The options passed to the decoder are "-v 0 -threads 4 -n-best-list - 10
--print-alignment-info-in-n-best -xml-input exclusive".
If I break the long translation into parts it works though.
He
On Wed, Dec 2, 2015 at 6:14 PM, Philipp Koehn <phi@jhu.edu> wrote:
> Hi,
>
> it's not clear to me what you are exactly specifying to the decoder,
> but what you intend to do should work.
>
> Did you use the switch "-xml-input exclusive"?
> What exactly do you specify as input?
>
> -phi
>
------------------------------
Message: 5
Date: Thu, 3 Dec 2015 05:32:53 +0000
From: "Hegde, Sujay" <Sujay.Hegde@xerox.com>
Subject: Re: [Moses-support] System requiremnts for Moses
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>,
"MudaliarMudaliar, Preeti J" <preeti.mudaliarmudaliar@xerox.com>
Message-ID:
<586EA7C483504E48870F5BF54319B6EC4ED06B@USA7109MB006.na.xerox.net>
Content-Type: text/plain; charset="utf-8"
Hi Philipp,
Thanks a lot.
Actually it's a VIRTUAL machine.
Also, we have compressed the models into .minphr and .minlexr, but we couldn't prune them: while pruning, we got an error saying that some of the sentences in the corpus are too long to be pruned.
We tried pruning with SALM and got the following error:
/mnt/hd1/git/salm/Bin/Linux/Index/IndexSA.O64 opensub.train.it
Initialize vocabulary file: opensub.train.it.id_voc
Loading existing vocabulary file: opensub.train.it.id_voc
Total 100 word types loaded
Max VocID=100
Sentence 4152148 has more than 256 words. Can not handle such long sentence. Please cut it short first!
Is there anything we could do about the above?
Thanks and Regards,
Sujay,
Xerox Business Services, Bangalore, India
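
The SALM message above asks for over-long sentences to be shortened or removed before indexing. One option is to drop every sentence pair in which either side is too long, so that the two sides of the corpus stay aligned. Below is a minimal Python sketch of that idea; the helper name filter_long_pairs is hypothetical, and the default limit of 256 is taken from the error message (whether a sentence of exactly 256 words is accepted is an assumption, so a lower limit may be safer).

```python
def filter_long_pairs(src_lines, tgt_lines, max_words=256):
    """Yield (src, tgt) pairs in which neither side has more than max_words tokens."""
    for src, tgt in zip(src_lines, tgt_lines):
        # Tokens are counted by whitespace splitting, as in a tokenized corpus.
        if len(src.split()) <= max_words and len(tgt.split()) <= max_words:
            yield src, tgt
```

The two arguments can be open file handles for the source and target sides of the training corpus; the surviving pairs are then written back out to two new files before indexing. Moses also ships scripts/training/clean-corpus-n.perl for length-based corpus filtering.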
From: phkoehn@gmail.com [mailto:phkoehn@gmail.com] On Behalf Of Philipp Koehn
Sent: 03 December 2015 03:13
To: Hegde, Sujay
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] System requiremnts for Moses
Hi,
the machine you have is certainly sufficient even for large models.
If you are running two language pairs in parallel and run into RAM problems, you may want to look into ways to compress the model files (phrase table, reordering table, language model) using either more efficient data structures (e.g., various KENLM options), or pruning the models.
-phi
On Tue, Dec 1, 2015 at 5:08 AM, Hegde, Sujay <Sujay.Hegde@xerox.com> wrote:
Dear Moses Admin,
We are using the Moses decoder in a commercial environment.
We have a quad-core virtual machine with 132 GB RAM, a 1 TB disk, and CentOS.
We have 2 language pairs installed, and when running both models together, translation hangs (it takes a LONG time).
It is fine when we run only one language model.
Are there any specific system requirements for Moses?
Please let me know.
Thanks and Regards,
Sujay,
Xerox Business Services, Bangalore, India
------------------------------
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
End of Moses-support Digest, Vol 110, Issue 5
*********************************************