Moses-support Digest, Vol 108, Issue 37

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses vocabulary code (Hieu Hoang)
2. Re: Moses vocabulary code (Kenneth Heafield)
3. BLEU score difference about 0.13 for one dataset is normal?
(Davood Mohammadifar)


----------------------------------------------------------------------

Message: 1
Date: Sat, 10 Oct 2015 18:22:17 +0100
From: Hieu Hoang <hieuhoang@gmail.com>
Subject: Re: [Moses-support] Moses vocabulary code
To: Lane Schwartz <dowobeha@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAEKMkbhSWiNEq-fx6PYA4pdETX8YQQYw80UsDHvQUKSsQUVj_A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Yep. The const Factor * is the original unique vocab ID, and it's more useful
IMO because you can get the string back without referring back to the vocab
factory. But use what you like.

StringPiece is apparently faster for some operations.
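To make the const Factor * idea concrete, here is a minimal C++ sketch of string interning, assuming a simple hash-map-backed table. The class names Factor and FactorTable and their layout are hypothetical, not Moses's actual classes; only the accessor names GetString() and GetId() are taken from this thread. Because each distinct string gets exactly one object, pointer equality doubles as string equality, and the string can be read straight from the pointer without consulting the table again.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a vocabulary entry: one immutable object per
// distinct string, so its address can serve as the vocabulary ID.
class Factor {
 public:
  Factor(std::string str, std::size_t id) : str_(std::move(str)), id_(id) {}
  const std::string &GetString() const { return str_; }
  std::size_t GetId() const { return id_; }

 private:
  std::string str_;
  std::size_t id_;
};

// Hypothetical interning table (not the Moses factor collection).
class FactorTable {
 public:
  // Returns the unique Factor for `str`, creating it the first time it is seen.
  const Factor *Add(const std::string &str) {
    auto it = factors_.find(str);
    if (it != factors_.end()) return it->second.get();
    // The new Factor's integer ID is the number of strings interned so far.
    auto factor = std::make_unique<Factor>(str, factors_.size());
    const Factor *result = factor.get();
    bool inserted = factors_.emplace(str, std::move(factor)).second;
    assert(inserted);  // `str` was checked to be absent above
    (void)inserted;
    return result;
  }

 private:
  // unique_ptr gives each Factor a fixed heap address, so returned
  // pointers stay valid as the table grows.
  std::unordered_map<std::string, std::unique_ptr<Factor>> factors_;
};
```

For example, calling Add("house") twice returns the same pointer, while Add("home") returns a different pointer carrying the next integer ID.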
On 10 Oct 2015 5:35 pm, "Lane Schwartz" <dowobeha@gmail.com> wrote:

> Wouldn't factor->GetId() be the unique integer ID of the string?
>
> On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang <hieuhoang@gmail.com> wrote:
>
>> const Factor* is the vocab id. It's guaranteed to be unique for each
>> unique string. You can map directly to the string using
>> factor->GetString()
>>
>>
>>
>> On 09/10/2015 22:55, Lane Schwartz wrote:
>>
>> Thanks, Marcin.
>>
>> So when the various components of Moses pass words back and forth, what
>> do they send each other? std::string? StringPiece?
>>
>> On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt <
>> junczys@amu.edu.pl> wrote:
>>
>>> For instance in my phrase table that would be
>>>
>>> mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>>>
>>> StringVector<unsigned char, unsigned, std::allocator> m_sourceSymbols;
>>> StringVector<unsigned char, unsigned, std::allocator> m_targetSymbols;
>>>
>>> That's a memory-mapped vector of strings.
>>>
>>> On 09.10.2015 at 23:22, Lane Schwartz wrote:
>>>
>>> Seriously? That sounds inefficient.
>>>
>>> I've found code in KenLM that maps from strings to integers, but not the
>>> other way around.
>>>
>>> Marcin, do you know, for example, where the Moses code that does this
>>> mapping lives, for any data structure?
>>>
>>>
>>> On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt <
>>> junczys@amu.edu.pl> wrote:
>>>
>>>> Hi,
>>>> This would only be a simple thing if there were a common framework for
>>>> that, but there isn't. Each data structure implements its own vocabularies
>>>> and look-up tables. There is no common set of integers.
>>>> Best,
>>>> Marcin
>>>>
>>>> On 09.10.2015 at 23:11, Lane Schwartz wrote:
>>>>
>>>> Hey,
>>>>
>>>> I know this should be a simple thing to find, but what code in Moses is
>>>> responsible for mapping back and forth between strings and integers?
>>>>
>>>> Thanks,
>>>> Lane
>>>>
>>>
>>>
>>> --
>>> When a place gets crowded enough to require ID's, social collapse is not
>>> far away. It is time to go elsewhere. The best thing about space travel
>>> is that it made it possible to go elsewhere.
>>> -- R.A. Heinlein, "Time Enough For Love"
>>>
>>>
>>>
>>
>>
>> --
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151010/c295bece/attachment-0001.html
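Marcin's point in the thread is that each data structure keeps its own vocabulary, and his StringVector example stores strings so they can be addressed by integer index. The sketch below illustrates that idea in memory only (no memory mapping), assuming one concatenated character buffer plus an offset array for ID-to-string lookup and a hash map for string-to-ID lookup. FlatVocab is a hypothetical name; this is not the CompactPT StringVector interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical bidirectional vocabulary (not a Moses class). Strings are
// stored back to back in one buffer: id -> string is an offset lookup,
// string -> id is a hash lookup.
class FlatVocab {
 public:
  FlatVocab() : offsets_{0} {}

  // Returns the ID of `word`, assigning the next free ID on first sight.
  unsigned Add(const std::string &word) {
    auto it = ids_.find(word);
    if (it != ids_.end()) return it->second;
    unsigned id = static_cast<unsigned>(offsets_.size() - 1);
    chars_ += word;
    offsets_.push_back(chars_.size());
    ids_.emplace(word, id);
    return id;
  }

  // ID -> string: a view into the shared buffer, valid until the next Add.
  std::string_view Lookup(unsigned id) const {
    assert(id + 1 < offsets_.size());  // id must have been returned by Add
    return std::string_view(chars_).substr(offsets_[id],
                                           offsets_[id + 1] - offsets_[id]);
  }

  std::size_t Size() const { return offsets_.size() - 1; }

 private:
  std::string chars_;                 // all strings concatenated
  std::vector<std::size_t> offsets_;  // string i spans [offsets_[i], offsets_[i + 1])
  std::unordered_map<std::string, unsigned> ids_;
};
```

Storing the strings contiguously keeps the ID-to-string direction to two array reads, which is the part Lane could not find in KenLM.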

------------------------------

Message: 2
Date: Sat, 10 Oct 2015 22:17:54 +0100
From: Kenneth Heafield <moses@kheafield.com>
Subject: Re: [Moses-support] Moses vocabulary code
To: moses-support@mit.edu
Message-ID: <56198082.9060803@kheafield.com>
Content-Type: text/plain; charset=utf-8

Agreed about the cuteness of const Factor *.

Let's say you're reading space-delimited file input.

std::string line("Foo Bar Baz Quux .");

One can make a StringPiece(line.data(), 3) that looks and for most
purposes acts like std::string("Foo") but requires zero memory
allocation. It's not null-terminated: it's just a const char * and a
length, without owning the underlying memory. This makes it very fast
to parse and split text. util/tokenize_piece.hh provides an iterator
for string splitting.
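The zero-copy behaviour described above can be approximated with C++17's std::string_view, which, like StringPiece, is a pointer plus a length that does not own its bytes. This is a stand-in sketch rather than Moses code: Slice and SplitOnSpace are hypothetical helpers, not the functions in util/tokenize_piece.hh.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Builds a view of n bytes starting at pos, the string_view analogue of
// StringPiece(line.data() + pos, n): no allocation, no null terminator,
// and the caller must keep the underlying string alive.
std::string_view Slice(std::string_view s, std::size_t pos, std::size_t n) {
  assert(pos + n <= s.size());  // the pointer/length constructor does not check bounds
  return std::string_view(s.data() + pos, n);
}

// Splits on single spaces, skipping empty tokens. Every returned view
// points into `text`; only the vector itself allocates.
std::vector<std::string_view> SplitOnSpace(std::string_view text) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = text.find(' ', start);
    if (end == std::string_view::npos) end = text.size();
    if (end > start) tokens.push_back(Slice(text, start, end - start));
    start = end + 1;
  }
  return tokens;
}
```

Splitting "Foo Bar Baz Quux ." this way yields five views, each pointing into the original std::string, so that string must outlive the tokens.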

Taking it a step further, util::FilePiece does a rolling mmap of a text
file and gives you StringPiece. Zero-copy file reading.
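A rough idea of zero-copy file reading can be sketched with POSIX mmap, assuming a Unix-like system. Unlike util::FilePiece, which maps a rolling window, this hypothetical MappedLines class maps the whole file at once and returns each line as a std::string_view into the mapping; it illustrates the technique and is not FilePiece's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical reader: maps a file read-only and hands out lines as views
// into the mapping, so no line bytes are copied.
class MappedLines {
 public:
  explicit MappedLines(const std::string &path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("cannot stat " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {  // mmap rejects zero-length mappings
      void *p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("cannot mmap " + path);
      }
      data_ = static_cast<const char *>(p);
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }

  ~MappedLines() {
    if (data_ != nullptr) {
      int rc = ::munmap(const_cast<char *>(data_), size_);
      assert(rc == 0);
      (void)rc;
    }
  }

  MappedLines(const MappedLines &) = delete;
  MappedLines &operator=(const MappedLines &) = delete;

  // Stores the next line (without '\n') in `line`; returns false at end of file.
  bool NextLine(std::string_view &line) {
    if (pos_ >= size_) return false;
    std::string_view rest(data_ + pos_, size_ - pos_);
    std::size_t nl = rest.find('\n');
    if (nl == std::string_view::npos) {
      line = rest;
      pos_ = size_;
    } else {
      line = rest.substr(0, nl);
      pos_ += nl + 1;
    }
    return true;
  }

 private:
  const char *data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t pos_ = 0;
};
```

The returned views are only valid while the MappedLines object is alive, which is the same ownership trade-off StringPiece makes.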

In Moses, the preferred order of types for function parameters is:
const Factor *, then StringPiece, then std::string or char *.



------------------------------

Message: 3
Date: Sun, 11 Oct 2015 04:23:56 +0000
From: Davood Mohammadifar <davood_mf@hotmail.com>
Subject: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
To: Moses Support <moses-support@mit.edu>
Message-ID: <SNT150-W77C3B81B7AD94C31D18B598C320@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

Hello everyone,

I noticed different BLEU scores for the same dataset, although the difference is small: about 0.13.

I trained on my dataset and tuned on the development set for Persian-English translation. After testing, the score was 21.95. The second time, I ran the same process and obtained 21.82. (My tools were mgiza, mert, ...)

Is this difference normal?

My system:
CPU: Core i7-4790K
RAM: 16GB
OS: Ubuntu 12.04

Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20151011/f0cc50b7/attachment.html

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 108, Issue 37
**********************************************
