Send Moses-support mailing list submissions to
moses-support@mit.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu
You can reach the person managing the list at
moses-support-owner@mit.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."
Today's Topics:
1. Re: europarl v9 - released or not? Can be used? (Philipp Koehn)
2. Re: europarl v9 - released or not? Can be used? (Matt Post)
3. Announcing the chat translation task @WMT20 (Amin Farajian)
----------------------------------------------------------------------
Message: 1
Date: Mon, 30 Mar 2020 21:10:19 -0400
From: Philipp Koehn <phi@jhu.edu>
Subject: Re: [Moses-support] europarl v9 - released or not? Can be
used?
To: Artem Shevchenko <shevart@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDA=MZQQo7dOWgf+4p5HvfYzL4qKRFKa_hC3_T5or_s0rA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
v9 is mainly for other languages - it is slightly bigger than earlier
versions for languages where multiple versions exist.
-phi
On Mon, Mar 30, 2020 at 9:01 PM Artem Shevchenko <shevart@gmail.com> wrote:
> I found out how to split the fields in the tab-separated de-en sentences.
> In case someone needs it, do it with cut -f 1 or 2:
> cat file_de-en.tsv | cut -f 1 > file.de
> cat file_de-en.tsv | cut -f 2 > file.en
>
> So the only remaining question: is europarl v9 better than v8 or v7?
>
> On Tue, 31 Mar 2020 at 02:21, Artem Shevchenko <shevart@gmail.com> wrote:
>
>> Hello,
>>
>> thank you very much for your reply.
>> My goal is to rebuild the translation memory for the de-en pair while
>> keeping truecase in the German phrase table.
>> In the models released with 4.0 for de-en everything is lowercased, which
>> makes it impossible to distinguish between, e.g., a noun (das Wissen) and
>> a verb (zu wissen), or between sie (she) and Sie (you).
>> I see that the file extension is .tsv, unlike v7: it is a
>> tab-separated de-en text file,
>> so I need to split it into two.
>> What would be the best way? Is there a Python script for it?
>>
>> Is v9 better than v8 and v7?
>>
>> Thanks!
>> Artem Shevchenko
>>
>>
>>
>> On Mon, 30 Mar 2020 at 21:50, Philipp Koehn <phi@jhu.edu> wrote:
>>
>>> Hi,
>>>
>>> you are free to use this data - v9 has only been generated for some
>>> language pairs, since the amount of translations has not increased
>>> significantly for a few years now.
>>>
>>> -phi
>>>
>>> On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko <shevart@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have found this:
>>>> http://www.statmt.org/europarl/v9/ dated 2019-02
>>>> Does it contain parallel corpus v9?
>>>>
>>>> However, v9 is not mentioned anywhere else.
>>>> Is it released?
>>>> Can it be used?
>>>>
>>>> Thank you!
>>>> Artem Shevchenko
>>>>
>>>
------------------------------
Message: 2
Date: Mon, 30 Mar 2020 21:26:36 -0400
From: Matt Post <post@cs.jhu.edu>
Subject: Re: [Moses-support] europarl v9 - released or not? Can be
used?
To: Artem Shevchenko <shevart@gmail.com>
Cc: moses-support@mit.edu, Philipp Koehn <phi@jhu.edu>
Message-ID: <BF2613A2-6C1D-473D-8DF5-DC2148EFEC91@cs.jhu.edu>
Content-Type: text/plain; charset="utf-8"
Incidentally, you can split the fields more simply using the "unpaste" command:
cat file_de-en.tsv | unpaste file.{de,en}
Unpaste is available here:
https://github.com/mjpost/bin/blob/master/unpaste
matt (from my phone)
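
Since the thread also asks whether there is a Python script for this, here is a minimal sketch of one. It assumes every line of the .tsv file holds exactly one German and one English sentence separated by a single tab; the function name split_tsv and the choice to skip malformed lines are illustrative assumptions, not part of any Moses or Europarl tooling. One reason to consider it: cut passes lines that contain no tab through unchanged, so a malformed line would land in both output files, whereas the sketch below drops such lines and reports how many it skipped.

```python
def split_tsv(tsv_path, left_path, right_path):
    """Split a two-column TSV file into two line-aligned text files.

    Hypothetical helper: returns a (kept, skipped) tuple of line counts.
    """
    kept = skipped = 0
    # newline="\n" makes Python break lines only on "\n", so stray
    # carriage returns inside the data do not create extra lines.
    with open(tsv_path, encoding="utf-8", newline="\n") as src, \
         open(left_path, "w", encoding="utf-8", newline="\n") as left, \
         open(right_path, "w", encoding="utf-8", newline="\n") as right:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            # Assumption: a usable sentence pair has exactly two
            # non-empty fields; anything else is skipped so that the
            # two output files keep the same number of lines.
            if len(fields) != 2 or not all(f.strip() for f in fields):
                skipped += 1
                continue
            left.write(fields[0] + "\n")
            right.write(fields[1] + "\n")
            kept += 1
    return kept, skipped
```

With the file names used in this thread, the call would be split_tsv("file_de-en.tsv", "file.de", "file.en"). A non-zero skipped count is a hint to inspect the input before training on the output.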
------------------------------
Message: 3
Date: Tue, 31 Mar 2020 11:11:28 +0100
From: Amin Farajian <ma.farajian@gmail.com>
Subject: [Moses-support] Announcing the chat translation task @WMT20
To: wmt-tasks@googlegroups.com, moses-support@mit.edu, CORPORA@uib.no
Message-ID: <7FAEB2EC-EECC-42AA-B5D1-4FCEF1F590BF@gmail.com>
Content-Type: text/plain; charset="us-ascii"
Dear all,
I would like to inform you that the first edition of the chat translation task for WMT20 is now live! You can find more details about the task on our website:
http://www.statmt.org/wmt20/chat-task.html
Best,
Amin Farajian
(on behalf of the chat translation task organisers)
------------------------------
End of Moses-support Digest, Vol 161, Issue 13
**********************************************