Moses-support Digest, Vol 87, Issue 51

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: word alignment-words' indexes and sentences' length
(amir haghighi)
2. Re: word alignment-words' indexes and sentences' length
(Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Wed, 22 Jan 2014 13:39:04 +0330
From: amir haghighi <amir.haghighi.64@gmail.com>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: moses-support@mit.edu
Message-ID:
<CA+UVbEiv4E_o3_ENt8ZeqA3z55xkaxB3eOUvgE=eYVWHBwQt+w@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Thank you Hieu,

The corpus is UTF-8, but there is a double space in this line. Are double
spaces regarded as a word?
Should I remove double spaces from the lines manually to get the correct
sentence length?



On Tue, Jan 21, 2014 at 4:12 AM, Hieu Hoang <hieuhoang@gmail.com> wrote:

>
> On 20/01/2014 13:45, amir haghighi wrote:
>
> Hello
>
> I've some questions about the giza word alignment.
>
> 1- Where is the final alignment file? Is it the aligned.1.grow.... in the
> model folder?
>
> yes.
>
>
> 2- Do the word indexes of both the target and source sentences start from
> 0?
>
> yes
>
>
> 3- How does giza calculate the length of a sentence?
>
> the number of words
>
> I have a sentence with 11 space-separated tokens, but in
> the alignment file its length is 13.
>
> Strange. Are you sure your corpus file is encoded as UTF-8? Are there
> double spaces in the line?
>
>
> Regards
> Amir
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>

------------------------------

Message: 2
Date: Wed, 22 Jan 2014 10:40:31 +0000
From: Hieu Hoang <Hieu.Hoang@ed.ac.uk>
Subject: Re: [Moses-support] word alignment-words' indexes and
sentences' length
To: amir haghighi <amir.haghighi.64@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhoOk618tvJF15AvhD8XoP0KDomZS2WFm2PCRjHvP+E-Q@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Yes, remove the double space. Sometimes the double space is ignored, and
sometimes it is counted as a 'word' with no characters, depending on exactly
how the program tokenizes the line.
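
To illustrate the point about tokenization, here is a minimal Python sketch.
It is not GIZA++ or Moses code; the sample sentence and the helper name
normalize_spaces are made up for illustration. Splitting on a single space
character turns a double space into an empty 'word', while splitting on runs
of whitespace does not. Collapsing whitespace first makes both counts agree.

```python
import re


def normalize_spaces(line: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends.

    Hypothetical helper for illustration; not part of Moses or GIZA++.
    """
    return re.sub(r"\s+", " ", line).strip()


# Made-up example line; note the double space after "cat".
line = "the cat  sat on the mat"

# Splitting on every single space yields an empty token for the double space.
naive_tokens = line.split(" ")

# Splitting on runs of whitespace ignores the double space.
robust_tokens = line.split()

print(len(naive_tokens))   # 7, one token is the empty string
print(len(robust_tokens))  # 6

# After normalization both ways of splitting give the same tokens.
clean = normalize_spaces(line)
print(clean.split(" ") == clean.split())  # True
```

Applying something like this to every line of both sides of the corpus before
training should keep the sentence lengths consistent, whichever way a given
tool splits the line.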




On 22 January 2014 10:09, amir haghighi <amir.haghighi.64@gmail.com> wrote:

> Thank you Hieu,
>
> The corpus is UTF-8, but there is a double space in this line. Are double
> spaces regarded as a word?
> Should I remove double spaces from the lines manually to get the correct
> sentence length?


--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 87, Issue 51
*********************************************
