Moses-support Digest, Vol 92, Issue 40

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: Help about Using Giza++ for comparable Corpora (Philipp Koehn)
2. How to use MANY without the word net and phrasedb
(Nishkarsh Shastri)
3. Re: How to use MANY without the word net and phrasedb
(Christophe Servan)
4. Large parallel corpora (Tom Hoar)
5. How are alignment scores in GIZA++ normalized? (Victor Zhong)
6. Recasing is very slow compared to the actual translation
(Stanislav Kuřík)

----------------------------------------------------------------------

Message: 1
Date: Fri, 20 Jun 2014 08:58:50 -0400
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Help about Using Giza++ for comparable
Corpora
To: alireza tabebordbar <ar.tabebordbar@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAFADDDrcsRf-z+scLw6gWA0LyNDDO3Lyw50+AT=JGh26sJaEw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

GIZA++ is designed for parallel sentence pairs, but it tolerates
some degree of noise. If you feed it "comparable" sentence pairs,
by which I assume you mean sentence pairs that are only
partially (or not at all) translations of each other, then you will
hopefully get word alignments for the relevant parts, but also
spurious alignments for the mismatched parts. Given a large amount
of such data, you may be able to learn useful word translation
tables, but that depends on how noisy the data is.

If you add parallel sentence pairs, you will likely get better
alignments.

-phi

On Thu, Jun 19, 2014 at 7:20 AM, alireza tabebordbar
<ar.tabebordbar@gmail.com> wrote:
> Hi all,
> I am a Master's degree student and I'm new to SMT.
> I extracted some comparable sentences and I want to use GIZA++ for
> word alignment. I think GIZA++ uses the EM algorithm for aligning words.
> However, I don't know whether I can feed comparable sentences directly to
> GIZA++ or whether I have to combine them with some parallel sentences.
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
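
For readers following the EM question above, here is a minimal sketch of
IBM Model 1 training, the simplest of the word alignment models that GIZA++
implements. It is not GIZA++ itself: the function name train_ibm1, the
three-pair toy corpus, and the omission of the NULL word are simplifying
assumptions made for illustration only.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1 (no NULL word): estimate t(f|e).

    pairs is a list of (e_tokens, f_tokens) sentence pairs.
    Returns a dict mapping (f, e) to the probability t(f|e).
    """
    f_vocab = {f for _, f_sent in pairs for f in f_sent}
    # Uniform initialisation over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: spread each foreign word over the words it may align to.
        for e_sent, f_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


# Hypothetical toy corpus (clean, parallel).
toy_pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

if __name__ == "__main__":
    table = train_ibm1(toy_pairs)
    for (f, e), p in sorted(table.items(), key=lambda kv: -kv[1]):
        print(f"t({f}|{e}) = {p:.3f}")
```

On clean pairs like these, the co-occurrence statistics concentrate on the
right translations within a few iterations. With comparable data, the
mismatched parts contribute expected counts for unrelated words, which is
the noise discussed in the reply above.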

------------------------------

Message: 2
Date: Fri, 20 Jun 2014 18:48:41 +0530
From: Nishkarsh Shastri <nishkarsh.shastri@gmail.com>
Subject: [Moses-support] How to use MANY without the word net and
phrasedb
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAB9695adU6VW1An1uOERv86aszHkeshNBdrQiskAfcG1TKTkYg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I was trying to combine the Indian language models, and MANY was the
toolkit suggested.
If anyone has experience with MANY, could you please show me how to use it
properly? The documentation is sparse, and WordNet is not available for
Indian languages.

--
Nishkarsh Shastri
2nd year U/G
Dept. of Computer Science and Engineering
IIT Kharagpur

------------------------------

Message: 3
Date: Fri, 20 Jun 2014 15:39:04 +0200
From: Christophe Servan <christophe.servan@gmail.com>
Subject: Re: [Moses-support] How to use MANY without the word net and
phrasedb
To: Nishkarsh Shastri <nishkarsh.shastri@gmail.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID:
<CAAsGDkp+3kGEAVEH0MW6J9MVGp+8aLH7AZJvYQaV+XoJdW3E5w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,
if you need help with MANY, I recommend contacting Loic Barrault
in Le Mans directly. He is the designer of MANY.
http://www-lium.univ-lemans.fr/~barrault/home.php

Cheers,

Christophe

2014-06-20 15:18 GMT+02:00 Nishkarsh Shastri <nishkarsh.shastri@gmail.com>:

> I was trying to combine the Indian language models, and MANY was the
> toolkit suggested.
> If anyone has experience with MANY, could you please show me how to use
> it properly? The documentation is sparse, and WordNet is not available
> for Indian languages.
>
> --
> Nishkarsh Shastri
> 2nd year U/G
> Dept. of Computer Science and Engineering
> IIT Kharagpur

------------------------------

Message: 4
Date: Fri, 20 Jun 2014 20:50:08 +0700
From: Tom Hoar <tahoar@precisiontranslationtools.com>
Subject: [Moses-support] Large parallel corpora
To: Moses-Support <moses-support@mit.edu>
Message-ID: <53A43C10.5000504@precisiontranslationtools.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Does anyone have experience (words-of-wisdom) training the translation
model from a parallel corpus with 2.25 trillion phrase pairs and over 45
trillion tokens?

Thanks,
Tom

------------------------------

Message: 5
Date: Fri, 20 Jun 2014 10:33:53 -0400
From: Victor Zhong <victor@victorzhong.com>
Subject: [Moses-support] How are alignment scores in GIZA++
normalized?
To: moses-support@mit.edu
Message-ID:
<CADp+kmYc6erw-N=Sf_a97_dPOGnDAPj=rEpSZpzshgLv7B3apQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

Please excuse this question if it seems obvious :) I am just getting
started with statmt and am unclear as to how the alignment scores produced
by GIZA++ are normalized. My intuition is that the GIZA++ alignment score of
a sentence pair is normalized across all possible alignments for this
sentence pair. Is this correct?

Moreover, are there any resources describing how the alignment score is
actually calculated? Which quantity in the IBM paper does the alignment
score actually correspond to?

Thanks,
Victor
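
To make the question concrete, the sketch below contrasts two quantities for
a toy sentence pair under a Model-1-style scoring: the unnormalized joint
score of one alignment (the product of lexical probabilities), and that
score divided by the sum over all possible alignments. Which of the two
GIZA++ actually reports is not answered in this digest. The function name
alignment_scores and the probability table are hypothetical, and the
sentence-length constant of IBM Model 1 is left out because it cancels in
the normalized version.

```python
from itertools import product
from math import prod


def alignment_scores(e_sent, f_sent, t):
    """Score every alignment of f_sent to e_sent.

    An alignment is a tuple a where a[j] is the index of the English
    word that foreign word j links to. Returns a dict mapping each
    alignment to (joint_score, normalized_score).
    """
    joint = {}
    for a in product(range(len(e_sent)), repeat=len(f_sent)):
        joint[a] = prod(t[(f, e_sent[i])] for f, i in zip(f_sent, a))
    z = sum(joint.values())  # sum over all possible alignments
    return {a: (p, p / z) for a, p in joint.items()}


# Hypothetical lexical probabilities t(f|e), for illustration only.
toy_t = {
    ("das", "the"): 0.7,
    ("Haus", "the"): 0.1,
    ("das", "house"): 0.2,
    ("Haus", "house"): 0.8,
}

if __name__ == "__main__":
    scores = alignment_scores(["the", "house"], ["das", "Haus"], toy_t)
    for a, (p, post) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
        print(a, f"joint={p:.3f}", f"normalized={post:.3f}")
```

The joint scores do not sum to one across alignments, while the normalized
ones do by construction, so the two readings of "alignment score" give
different numbers for the same alignment.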

------------------------------

Message: 6
Date: Fri, 20 Jun 2014 15:03:14 +0000 (UTC)
From: Stanislav Kuřík <standa.kurik@gmail.com>
Subject: [Moses-support] Recasing is very slow compared to the actual
translation
To: moses-support@mit.edu
Message-ID: <loom.20140620T165851-229@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Hello,

I wonder if it's normal for the recaser to take 7 seconds to process a
single sentence when the actual translation of that sentence took 3 seconds.

Below is the recaser's config file. As for the referenced files,
phrase-table.gz is about 3.5 MB while cased.srilm.gz is 33.7 MB.

I tried setting the 'distortion-limit' to 0 but it did not make any
difference.

Thank you.

#########################
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[ttable-file]
0 0 0 5 /storage/moses/trained-models/EN_SV-SE_2014-04-22T04_00_17_105475/recaser/phrase-table.gz

# no generation models, no generation-file section

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 3 /storage/moses/trained-models/EN_SV-SE_2014-04-22T04_00_17_105475/recaser/cased.srilm.gz

# limit on how many phrase translations e for each phrase f are loaded
# 0 = all elements loaded
[ttable-limit]
20

# distortion (reordering) weight
[weight-d]
0.6

# language model weights
[weight-l]
0.5000

# translation model weights
[weight-t]
0.20
0.20
0.20
0.20
0.20

# no generation models, no weight-generation section

# word penalty
[weight-w]
-1

[distortion-limit]
6
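
When experimenting with settings such as distortion-limit, it can help to
check which values a given config file actually contains. The sketch below
groups the non-comment lines of an old-style moses.ini, like the one above,
by section. The function name parse_moses_ini and the shortened sample text
are assumptions for illustration; this is not part of Moses.

```python
def parse_moses_ini(text):
    """Group the non-comment, non-blank lines of a moses.ini by [section]."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Shortened sample based on the config in this message (paths left out).
sample = """
# limit on how many phrase translations e for each phrase f are loaded
[ttable-limit]
20

# translation model weights
[weight-t]
0.20
0.20
0.20
0.20
0.20

[distortion-limit]
6
"""

if __name__ == "__main__":
    for name, values in parse_moses_ini(sample).items():
        print(name, values)
```

Run against the full file, this lists every section with its values, which
makes it easy to confirm whether an edit such as a changed distortion limit
is really present in the file the recaser loads.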

------------------------------


End of Moses-support Digest, Vol 92, Issue 40
*********************************************
