Moses-support Digest, Vol 88, Issue 44

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: why compile moses with irstlm? (Hieu Hoang)
2. Re: Training gives bad results! (Hieu Hoang)
3. Re: about segmentation flag of decoder (Hieu Hoang)
4. tokenizer script , special characters
(cyrine.nasri@univ-lorraine.fr)
5. (no subject) (cyrine.nasri@univ-lorraine.fr)

----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Feb 2014 23:19:03 +0000
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] why compile moses with irstlm?
To: moses-support@mit.edu
Message-ID: <53068D67.8050101@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I can think of a few cases where IRSTLM is preferable to KenLM:
1. You've been given an LM that has already been binarized with IRSTLM,
so you must use it.
2. You need a language model that supports a large n-gram order. By
default, KenLM's maximum n-gram order is 7.

On 19/02/2014 13:44, Viktor Pless wrote:
> Hi, what is the significance of the option of installing moses with
> IRSTLM? I did not give the option "--with-irstlm=..." but it still
> seems to be working.
> with Thanks,
> V


------------------------------

Message: 2
Date: Fri, 21 Feb 2014 10:32:57 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] Training gives bad results!
To: Sehrob Ibrohimov <isehrob@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbjPLhEYFVm5KDwuObhvd8hV8RqZL81uw+s+Cu-qHYcnmQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

That warning doesn't say anything about why your results are low.

The first thing I would do is check the translation output to see whether
lots of words are unknown to the translation model and are simply passed
through the system.

If there are lots of them, it may indicate that
1. your data files weren't all converted to UTF-8 before processing,
2. the files weren't consistently tokenized, or
3. the genre of the training data is not the same as the genre of the
test data.
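
One rough way to run that first check before decoding is to count how many
test tokens never occur in the training data, and to confirm that the files
decode as UTF-8. The Python sketch below is only an illustration: the function
names and the sample sentences are made up, and it assumes whitespace-tokenized
text.

```python
from collections import Counter


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def oov_rate(train_lines, test_lines):
    """Return (fraction of test tokens unseen in training, Counter of them)."""
    vocab = {tok for line in train_lines for tok in line.split()}
    test_tokens = [tok for line in test_lines for tok in line.split()]
    if not test_tokens:
        return 0.0, Counter()
    unknown = Counter(tok for tok in test_tokens if tok not in vocab)
    return sum(unknown.values()) / len(test_tokens), unknown


# Hypothetical data: the test side has an untokenized-looking "." mismatch
# and a casing mismatch ("House"), both of which show up as unknown words.
train = ["the house is small", "the office reported growth"]
test = ["the house reported growth .", "the House is small"]

rate, unknown = oov_rate(train, test)
print(f"OOV rate: {rate:.2f}")
print(unknown.most_common(5))
```

A high rate on real data would point towards one of the three causes listed
above; looking at the most frequent unknown tokens usually shows which one.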

------------------------------

Message: 3
Date: Fri, 21 Feb 2014 10:39:21 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] about segmentation flag of decoder
To: nadeem khan <nad_star06@yahoo.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhDmODwTH70GbfLo9t+_psBYLhmg0wOXfV616Jv4MXH_Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On 20 February 2014 13:06, nadeem khan <nad_star06@yahoo.com> wrote:

> Hi all,
>
> I would like to know how to use the different Moses flags when decoding.
> I get the translation output shown below when I run the decoder, but
> there is no segmentation or alignment information for the words. I am
> using the command given below:
>
If you run moses without any flags, it will tell you what flags to use:
./moses
Or look in the file moses/Parameter.cpp

To print word alignment info:
-print-alignment-info
or
-alignment-output-file
To print segmentation info:
-report-segmentation
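
As far as I know, with -report-segmentation each target phrase in the output
is followed by the span of source words it was translated from, written as
|start-end|. A minimal Python sketch for reading that format follows. The
helper name parse_segmentation and the example line are made up for
illustration and are not part of Moses.

```python
import re

# Matches a target phrase followed by its source span, e.g. "a small |2-3|".
SEGMENT = re.compile(r"(.*?)\s*\|(\d+)-(\d+)\|")


def parse_segmentation(line):
    """Return a list of (target_phrase, source_start, source_end) tuples."""
    return [
        (m.group(1).strip(), int(m.group(2)), int(m.group(3)))
        for m in SEGMENT.finditer(line)
    ]


# Hypothetical decoder output line in the assumed format.
example = "this is |0-1| a small |2-3| house |4-4|"
for phrase, start, end in parse_segmentation(example):
    print(f"source words {start}-{end} -> {phrase!r}")
```

Comparing the source spans in output order shows where the decoder reordered
phrases: spans that do not increase from left to right were moved.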

>
> Output:
> Collecting options took 0.000 seconds
> Search took 0.100 seconds
> BEST TRANSLATION: in early years care and education ??|UNK|UNK|UNK
> ???|UNK|UNK|UNK level 3 or the [111111111111] [total=-209.805] <<-19.000,
> -12.000, -200.000, -1.191, 0.000, -1.044, -2.693, 0.000, -2.999, -51.832,
> -15.323, -20.042, -2.518, -5.393, 6.999>>
> Translation took 0.100 seconds
> Finished translating
>
> decoder command is :
>
> moses -config work/eng-urd/f4/model/moses.ini -input-file
> work/eng-urd/f4/eval/urd-eng-test.lw.en 1> work/eng-urd/f4/urd.out
> 2> work/eng-urd/f4/eval/tuned.decode.out &
> I want to add segmentation, reordering and similar information to the
> output. Please help me with that.
>
>
> Also, can someone please explain how words are actually aligned and
> reordered from source to target? We know that probabilities are involved
> in SMT, but I want to understand these reorderings and alignments in more
> depth. For example, if a source word abc is translated to xyz, how is the
> alignment of this word done when it occurs in a sentence, and how does it
> end up in the proper order?
> It would help if someone could explain all this with an example.
>
There may be some example in Philipp's slides for his book
http://statmt.org/book/

>
> THANKS

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

Message: 4
Date: Fri, 21 Feb 2014 14:20:02 +0100
From: "cyrine.nasri@univ-lorraine.fr" <cyrine.nasri@gmail.com>
Subject: [Moses-support] tokenizer script , special characters
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAPg_V0ig4H0vdnjbyL-9KqYA+TC0ov94QeW4PcomdYwEdofTXQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello all,

I have a problem with the tokenizer.pl script. As a result I get a text with
some special punctuation, like this for example:

EU &apos;s Luxembourg-based statistical office reported

The input file is a .txt file

Is there any solution for this problem?

Thank you in advance

Best
--
*Cyrine*

------------------------------

Message: 5
Date: Fri, 21 Feb 2014 14:49:18 +0100
From: "cyrine.nasri@univ-lorraine.fr" <cyrine.nasri@gmail.com>
Subject: [Moses-support] (no subject)
To: Thomas Meyer <ithurtstom@gmail.com>, "moses-support@mit.edu"
<moses-support@MIT.EDU>
Message-ID:
<CAPg_V0iAB2VxKssP-ADFEzQ-AYriCt_3iRMsVCABfy9EQm5PoQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Thank you Thomas,

So if I keep the text with these special characters, it will not cause
problems? I ask because the training corpus does not contain these
characters; only the development and test corpora do.

Thank you :)

Best

2014-02-21 14:40 GMT+01:00 Thomas Meyer <ithurtstom@gmail.com>:

>
>
> Hi,
>
> That is not a 'problem' but XML entity mark-up for special characters
> (http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references).
> You don't have to worry about this, as the tokenizer script does it for
> all characters in a consistent way.
>
> Best,
> Thomas
>
>
> On 21 February 2014 14:20, cyrine.nasri@univ-lorraine.fr <
> cyrine.nasri@gmail.com> wrote:
>
>>
>> Hello all,
>>
>> I have a problem with the tokenizer.pl script. As a result I get a text
>> with some special punctuation, like this for example:
>>
>> EU &apos;s Luxembourg-based statistical office reported
>>
>> The input file is a .txt file
>>
>> Is there any solution for this problem?
>>
>> Thank you in advance
>>
>>
>> Best
>> --
>> *Cyrine*
>
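
To make the entity mark-up concrete, here is a small Python sketch of the
escaping step. The replacement table mirrors what I understand the Moses
tokenizer to apply to characters that have a special meaning inside Moses;
the function names are made up for illustration, and the real tokenizer also
splits off punctuation and clitics, which this sketch does not do.

```python
# Characters that are special inside Moses and the entities that replace
# them. "&" comes first so that the ampersands introduced by the other
# replacements are not escaped a second time.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(text):
    """Replace each special character with its entity."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def deescape_special_chars(text):
    """Undo the escaping; "&amp;" is restored last, mirroring the order above."""
    for char, entity in reversed(ESCAPES):
        text = text.replace(entity, char)
    return text


escaped = escape_special_chars("EU 's Luxembourg-based statistical office reported")
print(escaped)
print(deescape_special_chars(escaped))
```

The point for training is consistency: if the training, development and test
corpora all pass through the same tokenizer, the same characters end up as the
same entities everywhere. A corpus that was tokenized differently would keep
the raw characters, and those tokens would then not match.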

--

*Cyrine NASRI*
*Ph.D. Student in Computer Science*

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 88, Issue 44
*********************************************
