Moses-support Digest, Vol 161, Issue 12

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: europarl v9 - released or not? Can be used? (Philipp Koehn)
2. Re: europarl v9 - released or not? Can be used? (Artem Shevchenko)
3. Re: europarl v9 - released or not? Can be used? (Artem Shevchenko)


----------------------------------------------------------------------

Message: 1
Date: Mon, 30 Mar 2020 15:50:40 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] europarl v9 - released or not? Can be
used?
To: Artem Shevchenko <shevart@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDBdqF-AqmKd7jYQHWdqntbfdrhLT0QzghkiROJnRWUs7A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

You are free to use this data. v9 has only been generated for some
language pairs, since the amount of translated text has not increased
significantly for a few years now.

-phi

On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko <shevart@gmail.com> wrote:

> Hello,
>
> I found this:
> http://www.statmt.org/europarl/v9/ (dated 2019-02).
> Does it contain parallel corpus v9?
>
> However, v9 is not mentioned anywhere else.
> Is it released?
> Can it be used?
>
> Thank you!
> Artem Shevchenko
>

------------------------------

Message: 2
Date: Tue, 31 Mar 2020 02:21:11 +0200
From: Artem Shevchenko <shevart@gmail.com>
Subject: Re: [Moses-support] europarl v9 - released or not? Can be
used?
To: Philipp Koehn <phi@jhu.edu>, moses-support@mit.edu
Message-ID:
<CACmqYH18rEALExM7hcNwippo+aP4B0GWfwRR9vG2MmfUdPmycA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

Thank you very much for your reply.
My goal is to rebuild the translation memory for the de-en pair while keeping
the German side of the phrase table truecased.
In the de-en models released with 4.0, everything is lowercased, which makes it
impossible to distinguish between, e.g., a noun (das Wissen) and a verb (zu
wissen), or between sie (she) and Sie (you).
I see that the file extension is .tsv, unlike v7: it is a tab-separated
de-en text file,
so I need to split it into two files.
What would be the best way? Is there a Python script for it?

Is v9 better than v8 and v7?

Thanks!
Artem Shevchenko




------------------------------

Message: 3
Date: Tue, 31 Mar 2020 03:00:39 +0200
From: Artem Shevchenko <shevart@gmail.com>
Subject: Re: [Moses-support] europarl v9 - released or not? Can be
used?
To: Philipp Koehn <phi@jhu.edu>, moses-support@mit.edu
Message-ID:
<CACmqYH0hzUm7RWBT0ZwQnPtnF2BYN2fFT-RgxdcQUY=SxmssXg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I found how to split the fields of the tab-separated de-en sentences.
In case someone needs it, use cut -f 1 and cut -f 2:
cut -f 1 file_de-en.tsv > file.de
cut -f 2 file_de-en.tsv > file.en

So the only remaining question is whether europarl v9 is better than v8 or v7.
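
For anyone who would rather have the Python script asked about earlier in this thread, here is a minimal sketch. It is not an official Moses or Europarl tool. The function name split_tsv is hypothetical, and the code assumes a UTF-8 file with German in the first column and English in the second. Skipping pairs where one side is missing or empty is my own choice, not something the corpus documentation prescribes; cut, by contrast, passes a line without a tab through unchanged to both output files.

```python
def split_tsv(tsv_path, src_path, tgt_path):
    """Split a two-column TSV into two line-aligned text files.

    Column 1 goes to src_path, column 2 to tgt_path.
    Pairs with a missing or empty side are skipped (an assumption of
    this sketch), so both outputs always have the same number of lines.
    Returns a (kept, skipped) tuple of line counts.
    """
    kept = skipped = 0
    # newline="\n" makes Python split lines only on "\n", like cut does.
    with open(tsv_path, encoding="utf-8", newline="\n") as tsv, \
         open(src_path, "w", encoding="utf-8", newline="\n") as src, \
         open(tgt_path, "w", encoding="utf-8", newline="\n") as tgt:
        for line in tsv:
            # Plain split on tabs: no CSV quoting rules, so quotation
            # marks inside sentences are left untouched.
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
                skipped += 1
                continue
            src.write(fields[0].strip() + "\n")
            tgt.write(fields[1].strip() + "\n")
            kept += 1
    return kept, skipped
```

With the file names from the cut commands above, the call would be split_tsv("file_de-en.tsv", "file.de", "file.en"). The returned counts show how many pairs were dropped.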


------------------------------


End of Moses-support Digest, Vol 161, Issue 12
**********************************************
