Moses-support Digest, Vol 138, Issue 3

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Preparing TMX files for use in Moses (Ricardo Cabello Sánchez)


----------------------------------------------------------------------

Message: 1
Date: Sat, 7 Apr 2018 19:45:23 +0200
From: Ricardo Cabello Sánchez
<ricardo.cabello.sanchez@googlemail.com>
Subject: Re: [Moses-support] Preparing TMX files for use in Moses
To: Per Tunedal <per.tunedal@operamail.com>
Cc: moses-support@mit.edu
Message-ID:
<CAJxWzkaV8UinO-ONk7fxgmHTbm8iUjmyZ9s0=rXMOaH3UZVbLw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Per,

I would like to ask whether this runs on Linux.

I work on Ubuntu and am trying to convert TMX files into the plain-text
parallel files Moses needs to train my system.

Thanks.

Ricardo

2016-03-14 9:05 GMT+01:00 Per Tunedal <per.tunedal@operamail.com>:

> Hi,
> I had some problems with TMX extraction scripts and wrote my own. You
> might find it useful:
>
> https://github.com/havet/TMX2Moses
>
> It simply ignores the source language declared in the header and reads the
> source and target languages from the <tu> elements.
>
> It works on single TMX files as well as on folders containing TMX files.
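A minimal Python sketch of the approach Per describes: each language is taken from the xml:lang attribute of the <tuv> elements, and the <header> is ignored. It is not the TMX2Moses code. The function names, the primary-subtag matching (so "en" also matches "en-US"), skipping the content of inline code elements such as <bpt> and <ph>, and the OUT_PREFIX.SRC / OUT_PREFIX.TGT output naming are all assumptions made for illustration. It uses only the Python standard library.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Inline elements whose content is native markup, not translatable text.
INLINE_CODE = {"bpt", "ept", "ph", "it", "ut"}

def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def tuv_lang(tuv):
    """Language of a <tuv>: xml:lang (TMX 1.4) or plain lang (older TMX)."""
    return (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()

def matches(lang, wanted):
    """'en' matches 'en-us' and 'en-gb'; 'en-us' matches only 'en-us'."""
    return lang == wanted or lang.split("-")[0] == wanted

def _raw_text(elem):
    parts = [elem.text or ""]
    for child in elem:
        if local(child.tag) not in INLINE_CODE:
            parts.append(_raw_text(child))
        parts.append(child.tail or "")  # text after an inline tag is kept
    return "".join(parts)

def seg_text(tuv):
    """Text of the <seg> inside a <tuv>, whitespace collapsed to one line."""
    for child in tuv:
        if local(child.tag) == "seg":
            return " ".join(_raw_text(child).split())
    return ""

def extract_pairs(tmx_path, src, tgt):
    """Yield (source, target) pairs; the <header> srclang is never consulted."""
    src, tgt = src.lower(), tgt.lower()
    for tu in ET.parse(tmx_path).getroot().iter():
        if local(tu.tag) != "tu":
            continue
        s = t = None
        for tuv in tu:
            if local(tuv.tag) != "tuv":
                continue
            lang = tuv_lang(tuv)
            if s is None and matches(lang, src):
                s = seg_text(tuv)
            elif t is None and matches(lang, tgt):
                t = seg_text(tuv)
        if s and t:  # skip units missing either language or with empty text
            yield s, t

def tmx_files(path):
    """A single file, or every *.tmx file under a folder (recursively)."""
    path = Path(path)
    if path.is_dir():
        return [p for p in sorted(path.rglob("*"))
                if p.is_file() and p.suffix.lower() == ".tmx"]
    return [path]

def write_corpus(path, src, tgt, out_prefix):
    """Write OUT_PREFIX.SRC and OUT_PREFIX.TGT, one segment per line, UTF-8."""
    n = 0
    with open(f"{out_prefix}.{src}", "w", encoding="utf-8", newline="\n") as fs, \
         open(f"{out_prefix}.{tgt}", "w", encoding="utf-8", newline="\n") as ft:
        for tmx in tmx_files(path):
            for s, t in extract_pairs(tmx, src, tgt):
                fs.write(s + "\n")
                ft.write(t + "\n")
                n += 1
    return n

if __name__ == "__main__" and len(sys.argv) == 5:
    count = write_corpus(Path(sys.argv[1]), *sys.argv[2:])
    print(f"wrote {count} segment pairs")
```

Saved under a hypothetical name such as tmx_extract.py, running "python3 tmx_extract.py my_tms/ en sl corpus" would write corpus.en and corpus.sl with line-aligned segments. Because the header is never read, a wrong or missing header srclang does not affect the output.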
>
> Yours,
> Per Tunedal
>
> On Sun, Mar 13, 2016, at 12:03, Tom Hoar wrote:
>
> I don't know the tmx2txt.pl script, but I can suggest where to look for
> problems.
>
> The most frequent problem we have when extracting data from TMX files
> comes from files that don't comply with the TMX specification, especially
> regarding compliance with the srclang attributes. The spec states this
> about how to identify the source language:
>
>
> "*the <tuv> holding the source segment will have its xml:lang attribute
> set to the same value as srclang. (except if srclang is set to "*all*"). If
> a <tu> element does not have a srclang attribute specified, it uses the one
> defined in the <header> element.*"
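The rule in the quoted spec passage amounts to a short lookup. This Python sketch (standard library only; the function names are hypothetical) resolves the effective srclang for one <tu>, falling back to the <header>, and returns the <tuv> elements that qualify as source. It assumes TMX elements without an XML namespace and compares language codes case-insensitively, which is a choice made here. An empty result is exactly the non-compliant case where tools start to behave differently.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_srclang(tu, header):
    """srclang on the <tu> wins; otherwise use the one on the <header>."""
    return tu.get("srclang") or header.get("srclang")

def source_tuvs(tu, header):
    """Return the <tuv> elements that the spec rule designates as source.

    srclang "*all*" means any <tuv> may act as source, so all are returned.
    Otherwise the source is the <tuv> whose xml:lang equals srclang
    (compared case-insensitively here). An empty list means the unit
    does not follow the rule.
    """
    srclang = effective_srclang(tu, header)
    tuvs = tu.findall("tuv")
    if not srclang:
        return []
    if srclang == "*all*":
        return tuvs
    return [t for t in tuvs if (t.get(XML_LANG) or "").lower() == srclang.lower()]
```

For example, with a header srclang of "en-US", a <tu> whose only English <tuv> is marked "en-GB" yields an empty list, which is the sort of segment Tom describes tools handling each in its own way.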
>
> Sadly, many TMX creation tools, including tools from SDL, do not properly
> identify the source language. Each tool that looks for the source language
> TUV according to the spec handles erroneous TMX segments in its own way.
> So you need to find out how your TMX files declare the srclang attribute,
> and then study the script to see where the mismatch is.
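One way to find out how a particular TMX file declares its source language is to count what actually appears in it. The Python sketch below (standard library only; the name audit_srclang and the report keys are hypothetical) reports the header srclang, any per-<tu> srclang values, the language values on the <tuv> elements, and how many <tu> elements have no <tuv> matching their effective srclang. It assumes un-namespaced TMX elements and lowercases language codes before comparing.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def audit_srclang(tmx_path):
    """Summarize how a TMX file declares its source language."""
    root = ET.parse(tmx_path).getroot()
    header = root.find("header")
    header_src = header.get("srclang") if header is not None else None
    tu_src = Counter()      # srclang values set directly on <tu>
    tuv_langs = Counter()   # language values found on <tuv>
    mismatched = 0          # <tu> with no <tuv> matching its effective srclang
    for tu in root.iter("tu"):
        own = tu.get("srclang")
        tu_src[own or "(inherits header)"] += 1
        langs = [(tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
                 for tuv in tu.iter("tuv")]
        tuv_langs.update(langs)
        effective = (own or header_src or "").lower()
        if effective != "*all*" and effective not in langs:
            mismatched += 1
    return {"header_srclang": header_src, "tu_srclang": dict(tu_src),
            "tuv_xml_lang": dict(tuv_langs), "tu_without_source_tuv": mismatched}

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, audit_srclang(path))
```

A high tu_without_source_tuv count, or <tuv> codes such as "en-GB" when the header says "en-US", would suggest the kind of export problem described above; the extraction script's language settings can then be matched to what the file really contains.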
>
> You can see how we managed these sloppy TMX files in this post, only a
> week old: <https://pttools.freshdesk.com/discussions/topics/6000034251>
>
> Hope this helps.
>
> Tom
>
>
>
> On 3/12/2016 8:57 PM, moses-support-request@mit.edu wrote:
>
> Date: Sat, 12 Mar 2016 13:42:05 +0100
> From: Sašo Kuntaric <saso.kuntaric@gmail.com>
> Subject: [Moses-support] Preparing TMX files for use in Moses
> To: moses-support@mit.edu
>
> Hi all,
>
> I have a question that is not directly related to Moses. I am trying to
> prepare the corpora for training my engine. I have exported a few of my TMs
> to the TMX format and now I am trying to create two separate UTF-8 text
> files. I have tried both the extract-tmx-corpus and tmx2txt.pl tools, and
> both produce empty text files (the former reports that the input file
> can't be read). Are there any special settings I need to use when extracting
> the TMX files? I am using SDL Trados Studio 2015 for exporting the files.
>
> Has anyone come across anything like this?
>
> --
> lp,
>
> Sašo
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 138, Issue 3
*********************************************
