Moses-support Digest, Vol 93, Issue 2

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Moses and BiDi Languages (Philipp Koehn)
2. Unable to compile SampleClient.java (Mahima Sharma)
3. Using word embeddings in Moses (Hubert Soyer)
4. Question regarding tokenizing the corpus
(Rajpirathap Sakthithasan)


----------------------------------------------------------------------

Message: 1
Date: Tue, 1 Jul 2014 12:56:27 -0400
From: Philipp Koehn <pkoehn@inf.ed.ac.uk>
Subject: Re: [Moses-support] Moses and BiDi Languages
To: "Schaefer, Falko" <falko.schaefer@sap.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID:
<CAAFADDC=DZXadUVDac70Yw9OvJHd6WaGrBZnDdaWhW8=uiYJJA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi

Right-to-left languages are no problem, since the text passed to the decoder
is still a sequential string of words, identical to left-to-right
languages.

I am not familiar with bidirectional languages, but if this is also just a
display issue, it should not be a problem either.
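
The point above can be illustrated with a minimal Python sketch. It assumes
the input is ordinary Unicode text, which is stored in logical (reading)
order regardless of display direction. The Hebrew sample sentence is an
illustrative assumption and does not come from this thread.

```python
import unicodedata

# Hypothetical sample: "shalom olam" ("hello world") in Hebrew, written with
# escapes so the logical (storage) order is unambiguous in any editor.
SHALOM = "\u05e9\u05dc\u05d5\u05dd"
OLAM = "\u05e2\u05d5\u05dc\u05dd"
sentence = SHALOM + " " + OLAM


def whitespace_tokens(text):
    """Split on whitespace, returning tokens in logical (reading) order."""
    return text.split()


def bidi_classes(token):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in token]


tokens = whitespace_tokens(sentence)
print(tokens)                   # first word read is the first token
print(bidi_classes(tokens[0]))  # the display direction is only a property
```

The token list is in the same order a reader would read the words, so a
decoder that consumes a sequence of tokens sees nothing unusual. The
right-to-left direction is a character property used by the renderer.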

-phi
On Jun 26, 2014 1:08 PM, "Schaefer, Falko" <falko.schaefer@sap.com> wrote:

> Dear Moses mailing list,
>
> I received a question from a potential MT customer regarding BiDi
> languages. Does the Moses Core support bidirectional as well as
> right-to-left languages such as Hebrew or Arabic without any problems or
> are there any technical implications that need to be taken into
> consideration in that regard? Any information you could send to me on the
> matter would be highly appreciated.
>
> Many thanks in advance and best regards,
>
> Falko
>
> Dr. Falko Schäfer
> SAP Language Services (GS)
> SAP AG, SAP-Allee, 68789 St. Leon-Rot, Germany
> T +49 6227 7-68848

------------------------------

Message: 2
Date: Wed, 2 Jul 2014 01:27:40 +0530
From: Mahima Sharma <mahima.bv@gmail.com>
Subject: [Moses-support] Unable to compile SampleClient.java
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAKs0NRobgER-c8r8AmrWTGL5aA=4cERV0TAiGgcGgCigKc7avg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi

Thank you for helping me install the Moses server.

While exploring the contrib folder, I came across the server folder,
which has a SampleClient.java, and a Translation-Web folder, which has a
Transaltion.java file.

Out of curiosity, I tried testing them. Both of these files require
org.apache.xmlrpc, and I am unable to locate this package. On the
Apache website, the download page gives a 404 error.

Can anybody point me to the package so that I can compile the files
and test the web interface as well?
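
The following does not resolve the missing Java package. It is a minimal
sketch of another way to exercise the server while the Java dependency is
unavailable, using Python's standard xmlrpc.client module. It assumes the
server exposes an XML-RPC method named translate that takes a struct with a
"text" field and returns a struct with a "text" field. The URL, port, and
sample sentence are assumptions to adjust for your setup.

```python
import xmlrpc.client

# Assumption: a Moses server instance listening locally; adjust host and port.
SERVER_URL = "http://localhost:8080/RPC2"


def build_request(text):
    """Serialize an XML-RPC call to the assumed `translate` method."""
    params = {"text": text}
    return xmlrpc.client.dumps((params,), methodname="translate")


def translate(text, url=SERVER_URL):
    """Send `text` to the server and return the translated string.

    Requires a running server; raises an error if none is reachable.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    result = proxy.translate({"text": text})
    return result["text"]


if __name__ == "__main__":
    # Printing the serialized request needs no running server.
    print(build_request("das ist ein haus"))
```

Calling translate("das ist ein haus") against a running server would return
the translation. build_request only shows what is sent over the wire.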

--
Mahima
--
Change always comes bearing gifts.


------------------------------

Message: 3
Date: Wed, 2 Jul 2014 18:04:39 +0900
From: Hubert Soyer <hubert.soyer@googlemail.com>
Subject: [Moses-support] Using word embeddings in Moses
To: moses-support@mit.edu
Message-ID:
<CAM7TO-iFU5caBsU=2eaycG0J12fBevPX13pejpzd0o10J8nR2w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hello,

I have checked the mailing list archive for this question but couldn't
find anything.
I'd be surprised if it has not been asked before; if it has,
I'd be happy if you could point me to the corresponding mails.

Recently, word representations induced by neural networks have gained
a lot of momentum.
Particularly often cited in this context is:
http://code.google.com/p/word2vec/

These word representations are vectors that carry semantic meaning,
i.e. semantically similar words have similar vectors
(small distances to each other).
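
As a small illustration of "similar vectors", the sketch below compares toy
vectors with cosine similarity, one common choice of measure. The
three-dimensional vectors and the words are invented for this example. Real
embeddings would be loaded from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration only (not from a trained model).
toy_vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "banana": [0.1, 0.0, 0.9],
}

related = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
unrelated = cosine_similarity(toy_vectors["king"], toy_vectors["banana"])
print(related > unrelated)
```

With these toy values the related pair scores higher than the unrelated
pair, which is the property the paragraph above describes.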

I have been wondering about the best way to incorporate them in Moses.

One solution would be to incorporate them as factors in a factored model:

http://www.statmt.org/moses/?n=Moses.FactoredTutorial

It seems to me that I would have to treat each dimension of each word
vector as a separate factor, which would lead to a lot of factors.
Typical dimensionalities of those word vectors are 200 or more.
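
To make the size of the problem concrete, here is a minimal sketch of what
one token would look like if every dimension became its own factor. It
assumes the "|"-separated factor notation from the factored tutorial linked
above. The helper names, the rounding precision, and the toy vector are
hypothetical.

```python
def to_factored_token(word, vector, precision=3):
    """Join a surface form and one factor per vector dimension with '|'."""
    factors = [f"{x:.{precision}f}" for x in vector]
    return "|".join([word] + factors)


def factor_count(token):
    """Number of '|'-separated fields, including the surface form."""
    return len(token.split("|"))


toy_vector = [0.1] * 200  # hypothetical 200-dimensional embedding
token = to_factored_token("house", toy_vector)
print(factor_count(token))
print(token[:40] + "...")
```

Every token in the corpus would carry this many fields. Each rounded value
would also become a string symbol, so two numerically close values would
share nothing unless they round to the same string.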

Is treating each dimension as a factor the best way to incorporate
those vectors or is there anything better I can do?
I don't have to stick to factors, if there is another way.

Thank you in advance!

Best,

Hubert


------------------------------

Message: 4
Date: Sat, 28 Jun 2014 00:06:55 +0530
From: Rajpirathap Sakthithasan <rajpirathap@gmail.com>
Subject: [Moses-support] Question regarding tokenizing the corpus
To: moses-support <moses-support@mit.edu>
Message-ID:
<CAOjssz_aHXAqnuXqn0Q6a2ZqWtQn=d1tsAEqb87sYfiucc+daw@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi!

I'm doing research on a statistical machine translator for the Sri Lankan
local languages Sinhala and Tamil. I used the following commands to
tokenize the corpus:

~/mosestools/mosesdecoder/scripts/tokenizer/tokenizer.perl -l si \
  < ~/mosestools/testarea/corpus/datatraining.si \
  > ~/mosestools/testarea/corpus/datatraining.tok.si

~/mosestools/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ta \
  < ~/mosestools/testarea/corpus/datatraining.ta \
  > ~/mosestools/testarea/corpus/datatraining.tok.ta

However, the output is not what I expected. Some unwanted text appears in
the output files datatraining.tok.si and datatraining.tok.ta.

I have attached the input and output files for the corpus. Can you
help me solve this problem so that I can continue my research?
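
One way to pin down what the unwanted text is would be to compare the
characters in the input with those in the tokenized output. The sketch below
is a diagnostic only and makes no claim about the cause. The function names
are hypothetical, and the sample strings under the main guard are invented
stand-ins for real corpus lines.

```python
import unicodedata
from collections import Counter


def describe(ch):
    """Readable label for a character: code point plus Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


def unexpected_characters(original, tokenized):
    """Count non-space characters that occur more often in the output."""
    before = Counter(ch for ch in original if not ch.isspace())
    after = Counter(ch for ch in tokenized if not ch.isspace())
    extra = after - before  # keeps only positive differences
    return {describe(ch): n for ch, n in extra.items()}


def compare_files(original_path, tokenized_path):
    """Apply the same comparison to two UTF-8 text files."""
    with open(original_path, encoding="utf-8") as f:
        original = f.read()
    with open(tokenized_path, encoding="utf-8") as f:
        tokenized = f.read()
    return unexpected_characters(original, tokenized)


if __name__ == "__main__":
    # Invented sample strings; replace with compare_files(...) on real data.
    for label, count in unexpected_characters("a & b", "a &amp; b").items():
        print(count, label)
```

Running compare_files on datatraining.si and datatraining.tok.si would list
every character the tokenizer added, which makes the problem easier to
describe in a follow-up message.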

Thanks

--


*RAJPIRATHAP SAKTHITHASAN.*
*FACULTY OF INFORMATION TECHNOLOGY.*
* UNIVERSITY OF MORATUWA.*
*SRILANKA.*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corpusdata.zip
Type: application/zip
Size: 394195 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20140628/be8e0259/attachment.zip

------------------------------

End of Moses-support Digest, Vol 93, Issue 2
********************************************
