Moses-support Digest, Vol 100, Issue 3

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Learning the moses.ini v2 format (Tom Hoar)
2. Re: Arabic public corpora for CASMACAT (Mohamed Z)
3. Reference about tuning the BLEU coefficients (Mikel L. Forcada)
4. Using factor in the hierarchical model (ekkim214)


----------------------------------------------------------------------

Message: 1
Date: Mon, 02 Feb 2015 08:33:06 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] Learning the moses.ini v2 format
To: moses-support@mit.edu
Message-ID: <54CED3D2.6070001@precisiontranslationtools.com>
Content-Type: text/plain; charset="utf-8"

Much of the v2 moses.ini looks self-explanatory, but I'd like to confirm
my understanding.

The website (http://www.statmt.org/moses/?n=Moses.FeatureFunctions)
defines three feature/functions without arguments. In the moses.ini
files made by train-model.perl's step 9, there also appears to be a 4th
that requires no argument. Can someone confirm this is the case? Are
there others that could appear without arguments?

[feature]
UnknownWordPenalty
WordPenalty
Distortion
PhrasePenalty * - not listed on the website (are there more)

Feature/functions in the [feature] section and items in the [weight]
section appear to be linked. The feature/functions without arguments
have corresponding entries in the [weight] section under the same option
name with an appended zero. Since these feature/functions have no
arguments, is it safe to say that each can appear only once in both
the [feature] and [weight] sections?

[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
Distortion0= 0.3
PhrasePenalty0= 0.2

The feature/functions with arguments have corresponding entries linked by
the "name=" argument, which serves as the option name in the [weight]
section. Are there cases where there will be entries in the [feature]
section without corresponding entries in the [weight] section or vice-versa?

[feature]
PhraseDictionaryMemory name=*TranslationModel0* num-features=4 ...
KENLM name=*LM0* factor=0 ...

[weight]
*TranslationModel0*= 0.2 0.2 0.2 0.2
*LM0*= 0.5
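
Editor's note: the [feature]/[weight] linkage asked about above can be checked mechanically. The following is a minimal Python sketch, not part of Moses. It assumes, based only on the examples in this message, that a feature without a "name=" argument is keyed in [weight] by its function name plus a running index (UnknownWordPenalty becomes UnknownWordPenalty0). The helper names read_sections, feature_names and unmatched are hypothetical, and the emphasis asterisks around the names are dropped.

```python
from collections import Counter


def read_sections(text):
    """Group the non-empty, non-comment lines of a moses.ini by [section]."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def feature_names(feature_lines):
    """Return the weight key each feature line is expected to use."""
    unnamed = Counter()
    names = []
    for line in feature_lines:
        function, *tokens = line.split()
        args = dict(t.split("=", 1) for t in tokens if "=" in t)
        if "name" in args:
            names.append(args["name"])
        else:
            # Assumption: default key is function name + running index.
            names.append(f"{function}{unnamed[function]}")
            unnamed[function] += 1
    return names


def unmatched(text):
    """Return (features lacking a weight, weights lacking a feature)."""
    sections = read_sections(text)
    features = set(feature_names(sections.get("feature", [])))
    weights = {l.split("=", 1)[0].strip() for l in sections.get("weight", [])}
    return sorted(features - weights), sorted(weights - features)


SAMPLE = """\
[feature]
UnknownWordPenalty
WordPenalty
Distortion
PhrasePenalty
KENLM name=LM0 factor=0

[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
Distortion0= 0.3
LM0= 0.5
"""

print(unmatched(SAMPLE))  # (['PhrasePenalty0'], [])
```

In this sample the PhrasePenalty weight is left out on purpose, so the check reports it as a feature without a weight. Whether the decoder itself accepts such a file is a separate question that this sketch does not answer.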

The sections other than [feature] and [weight], such as [input-factors]
and [mapping], appear to preserve the v1 moses.ini format. Is this true?

The order of lines in the [feature] and [weight] sections is irrelevant
(as many examples have them in different orders). Also, the order of the
arguments on a feature/function line is irrelevant (examples show them
in different orders).
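
Editor's note: if argument order really is irrelevant, a feature line can be read into a function name plus a key/value mapping. The sketch below illustrates that reading only. The parse_feature_line helper is hypothetical, and the "order=5" argument is added here purely for illustration.

```python
def parse_feature_line(line):
    """Split a [feature] line into (function name, {argument: value})."""
    function, *tokens = line.split()
    args = {}
    for token in tokens:
        # partition() splits on the first '=' only, so a value that itself
        # contains '=' stays in one piece.
        key, _, value = token.partition("=")
        args[key] = value
    return function, args


first = parse_feature_line("KENLM name=LM0 factor=0 order=5")
second = parse_feature_line("KENLM order=5 factor=0 name=LM0")
print(first == second)  # True
```

Because the arguments land in a dictionary, two lines that differ only in argument order compare equal.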

Finally, is there a connection between the [input-factors] section's
value and the input-factor argument value for PhraseDictionaryMemory and
LexicalReordering feature/functions? Or, are the similar names and
corresponding values only coincidental?

My intention is to build two scripts and contribute these scripts to the
Moses project. One will convert the v2 moses.ini file to a standard form
(not associated with the command line syntax) so people can easily edit
the values. The other will convert the interim form back to the native
v2 moses.ini format.
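
Editor's note: the two-script round trip described above could look roughly like the following Python sketch. The interim form is not specified in the message, so JSON is used here purely as an assumption. The function names ini_to_data and data_to_ini are hypothetical, and sections other than [feature] and [weight] are carried through verbatim.

```python
import json


def ini_to_data(text):
    """Parse moses.ini text into a JSON-serialisable dictionary."""
    data, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            data.setdefault(section, {} if section == "weight" else [])
        elif section == "feature":
            function, *tokens = line.split()
            args = {k: v for k, _, v in (t.partition("=") for t in tokens)}
            data[section].append({"function": function, "args": args})
        elif section == "weight":
            name, values = line.split("=", 1)
            data[section][name.strip()] = [float(v) for v in values.split()]
        elif section is not None:
            data[section].append(line)  # other sections are kept verbatim
    return data


def data_to_ini(data):
    """Write the dictionary back out in moses.ini layout."""
    out = []
    for section, body in data.items():
        out.append(f"[{section}]")
        if section == "feature":
            for feat in body:
                args = " ".join(f"{k}={v}" for k, v in feat["args"].items())
                out.append(f"{feat['function']} {args}".rstrip())
        elif section == "weight":
            for name, values in body.items():
                out.append(f"{name}= " + " ".join(repr(v) for v in values))
        else:
            out.extend(body)
        out.append("")
    return "\n".join(out)


SAMPLE = """\
[input-factors]
0

[mapping]
0 T 0

[feature]
UnknownWordPenalty
KENLM name=LM0 factor=0

[weight]
UnknownWordPenalty0= 1
LM0= 0.5
"""

editable = json.dumps(ini_to_data(SAMPLE), indent=2)  # the interim form
restored = data_to_ini(json.loads(editable))
print(ini_to_data(restored) == ini_to_data(SAMPLE))
```

The sketch drops comments and normalises number formatting (1 becomes 1.0), so it round-trips the parsed content, not the exact bytes of the file.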

------------------------------

Message: 2
Date: Mon, 2 Feb 2015 04:59:39 +0200
From: Mohamed Z <muhamadzeid@gmail.com>
Subject: Re: [Moses-support] Arabic public corpora for CASMACAT
To: Philipp Koehn <phi@jhu.edu>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAFwtJrS_dXCzHV-aJSPruhH62Nh3xAb7h6LfmnibqXLRdJH5ZA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Philipp,

Thanks for your reply. I appreciate it.

It would be really great if you could add Arabic to the list. I hope it's not
much work on your end. I know that you are part of the MateCat team and that
it basically shares the same interface as CASMACAT. MateCat supports Arabic
perfectly, so I assume CASMACAT can do the same job, but the most important
thing to me now is the machine translation capability.

I tried to import a Trados sdlxliff file, but it didn't work. I believe
this is because sdlxliff files are structured differently from the regular
XLIFF files used by Moses. Again, MateCat supports translation of sdlxliff
files as well. I am wondering if there will be an integration at some point
in the future.

As for the tokenizer, do you think MADA+TOKAN
<http://www1.ccls.columbia.edu/%7Ecadim/MADA.html> can be added in a
future release of CASMACAT? That would be awesome.

I would really like to test CASMACAT with Arabic. I have a significant
number of language pairs that can be utilized for that purpose, and I can
report the output to you and do as much as I can to make Moses better in
terms of handling English<>Arabic translation.

Please let me know what you think.

Thanks in advance for your kind support.

Kind regards,
Mohamed

On Fri, Jan 30, 2015 at 10:37 PM, Philipp Koehn <phi@jhu.edu> wrote:

> Hi,
>
> you pretty much have to train the engine with the CASMACAT interface
> to have everything else properly in place.
>
> Adding "Arabic" as an option would be a very simple fix (it just has
> to be added to various menus). There is very little special handling
> of specific languages, for instance the tokenizer is very basic;
> hopefully it works somewhat with Arabic.
>
> I am not sure how the web-based UI handles the right-to-left order of
> Arabic. It may work or not, and may depend on the browser. We have not
> tested that.
>
> You can use any corpus to train an engine by just uploading it
> yourself. It has to be in XLIFF format.
>
> I'd be keen to help testing this out, so let me know how far you get.
>
> -phi
>
>
> On Mon, Jan 26, 2015 at 7:33 PM, Mohamed Z <muhamadzeid@gmail.com> wrote:
> > Hi all,
> >
> > I have installed CASMACAT desktop and I would like to add an
> > English>Arabic engine. I see that some languages are listed there, but
> > mostly European languages. Is there a chance to add Arabic to the list?
> > That would be really awesome.
> >
> > If this is not possible, how can I add my engine or upload it? I see an
> > upload button there, but I have no clue how to use it.
> >
> > thanks,
> > Mohamed
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>

------------------------------

Message: 3
Date: Mon, 02 Feb 2015 10:34:57 +0100
From: "Mikel L. Forcada" <mlf@dlsi.ua.es>
Subject: [Moses-support] Reference about tuning the BLEU coefficients
To: moses-support@mit.edu
Message-ID: <54CF44C1.80701@dlsi.ua.es>
Content-Type: text/plain; charset=utf-8; format=flowed

Dear list,

the correlation of BLEU with manual measurements of quality has
extensively been studied (Callison-Burch et al., WMT "Findings" paper).
But, would anyone know of any paper where people have actually tuned the
BLEU coefficients to approximate some kind of manual measurement of
quality? This is hard to search for as most papers talk about tuning the
coefficients of something else to BLEU.

Thanks a million!

All the best,

Mikel

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326



------------------------------

Message: 4
Date: Mon, 02 Feb 2015 19:06:27 +0900
From: ekkim214 <ekkim214@gmail.com>
Subject: [Moses-support] Using factor in the hierarchical model
To: moses-support@mit.edu
Message-ID: <537giy8a6f7ixusx19pndwbk.1422871587539@email.android.com>
Content-Type: text/plain; charset="utf-8"


I want to use a factor (class) in the hierarchical model.
I am using the EMS script, with KenLM for both the surface form and the factor.
Building the LMs succeeds with the "-discount_fallback" setting, but
I get an error during the tuning phase like this.

Exception: moses/LM/Ken.cpp:399 in Moses::LanguageModel* Moses::ConstructKenLM(const string&) threw util::Exception because `args.size() != 2'.
Incorrect format of KenLM property: path=/home3/ekkim/working/E2K_1501/hierarchical_class/lm/travel22=mkcls.binlm.5
Exit code: 1

Does the "moses_chart" decoder distributed in the package not support factored training?
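
Editor's note: one possible reading of the exception, offered as an assumption and not as a confirmed diagnosis, is that each KenLM property is split on '=' and must yield exactly two pieces. The path in the error message contains a second '=' (in "travel22=mkcls.binlm.5"), which would produce three. The Python sketch below only mimics that suspected behaviour; it is not the code in moses/LM/Ken.cpp, and both helper names are hypothetical.

```python
def split_property(token):
    """Split on every '=' and insist on exactly two parts (suspected behaviour)."""
    parts = token.split("=")
    if len(parts) != 2:
        raise ValueError(f"Incorrect format of property: {token}")
    return parts[0], parts[1]


def split_property_lenient(token):
    """Split on the first '=' only, so the value may itself contain '='."""
    key, _, value = token.partition("=")
    return key, value


bad = "path=/home3/ekkim/working/E2K_1501/hierarchical_class/lm/travel22=mkcls.binlm.5"

print(len(bad.split("=")))  # 3
try:
    split_property(bad)
except ValueError as err:
    print(err)
print(split_property_lenient(bad)[0])  # path
```

If this reading is right, renaming the LM file so that its path contains no '=' may be worth trying before concluding that the decoder lacks factor support.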

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 100, Issue 3
*********************************************
