Moses-support Digest, Vol 98, Issue 18

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. How to train a tree-to-tree model? (Steven Huang)
2. cfp for 20th International Conference on Application of
Natural Language to Information Systems (NLDB'15) (Michael Zock)


----------------------------------------------------------------------

Message: 1
Date: Thu, 4 Dec 2014 14:16:11 +0800
From: Steven Huang <d98922047@ntu.edu.tw>
Subject: [Moses-support] How to train a tree-to-tree model?
To: moses-support@mit.edu, ??? <farmer.tw@gmail.com>
Message-ID:
<CAG-iPUoWKSvVTNn4ricg5ueMpPzrO4r=s40kRzwTMX11JaST3w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I am trying to build a tree-to-tree model. Before that, I successfully
built a string-to-string syntax model with the following configuration (the
training corpora are in surface form):

/mosesdecoder/scripts/training/train-model.perl \
--root-dir train \
--mgiza \
--mgiza-cpus 20 \
--corpus /corpus \
--f en \
--e ch \
--lm 0:3:/lm/en-ch-surface.arpa.ch:8 \
--hierarchical \
--glue-grammar \
--max-phrase-length 10 \
--alignment grow-diag-final-and \
--external-bin-dir /mosesdecoder/tools


However, I failed to build a tree-to-tree model using the following
configuration, which has two modifications:
1. I added the --target-syntax and --source-syntax arguments, and
2. I used syntax-annotated XML as the training corpus (see the attached file
for reference).

/mosesdecoder/scripts/training/train-model.perl \
--root-dir train \
--mgiza \
--mgiza-cpus 20 \
--corpus /tree_test/tree \
--f en \
--e ch \
--lm 0:3:/lm/en-ch-surface.arpa.ch:8 \
--hierarchical \
--target-syntax \
--source-syntax \
--glue-grammar \
--max-phrase-length 10 \
--alignment grow-diag-final-and \
--external-bin-dir /mosesdecoder/tools
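The second configuration relies on each corpus file holding one parse tree per line, aligned line by line with the other side. As a hedged sketch, assuming the Moses XML tree markup (each constituent written as `<tree label="..."> ... </tree>`), the shell function below flags lines whose `<tree ...>` and `</tree>` counts differ or that contain no tree at all, which is what a tree split across several lines would look like. The function name `check_tree_balance` is hypothetical, and the check only counts tags; it does not validate nesting order.

```shell
#!/bin/sh
# Hypothetical helper: report lines of a tree-annotated corpus file whose
# opening <tree ...> and closing </tree> tag counts differ, or that contain
# no <tree> tag at all. Counting only; nesting order is not checked.
check_tree_balance() {
  awk -v f="$1" '{
    # gsub returns the number of matches; "&" puts each match back unchanged.
    n_open  = gsub(/<tree[ >]/, "&")
    n_close = gsub(/<\/tree>/, "&")
    if (n_open == 0 || n_open != n_close)
      print f ": line " NR ": " n_open " <tree> vs " n_close " </tree>"
  }' "$1"
}

# Example usage with the corpus prefix from the command above
# (--corpus /tree_test/tree with --f en --e ch):
#   check_tree_balance /tree_test/tree.en
#   check_tree_balance /tree_test/tree.ch
```

A line that prints nothing passed the check; only suspicious lines are reported.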



During training, there are many warnings like this:

Sent No: 13 , No. Occurrences: 1
0
3
ERROR: Forbidden zero sentence length 0
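The log above appears to come from the word aligner (MGIZA++ in this setup) and suggests that at least one sentence reached it with zero length on one side. A hedged way to look for candidate lines, assuming that markup is removed before alignment (an assumption, not something the log confirms), is to strip everything that looks like an XML tag and report lines left without any words. The function name `find_empty_after_strip` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of a corpus file that contain
# no words once anything of the form <...> has been replaced by a space.
find_empty_after_strip() {
  sed -e 's/<[^>]*>/ /g' "$1" |
    awk -v f="$1" 'NF == 0 { print f ": line " NR " is empty after stripping tags" }'
}

# Example usage on both sides of the parallel corpus:
#   find_empty_after_strip /tree_test/tree.en
#   find_empty_after_strip /tree_test/tree.ch
```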




The generated en.vcb has only 9 lines:

en.vcb
1 UNK 0
2 morning 1
3 class 1
4 bel="FRAG"> 1
5 Good 1
6 GAO 1
7 : 1
8 . 1
9 , 1
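The entry `bel="FRAG">` looks like a fragment of an XML attribute rather than a word. For a larger `.vcb` file (one `id word count` entry per line, as in the listing above), a hedged sketch for spotting such entries is to print every word that contains `<`, `>` or `=`. This is only a heuristic, since legitimate tokens can contain these characters too. The function name `suspicious_vcb_entries` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print vocabulary words (second column of a
# GIZA++-style .vcb file: id word count) that contain <, > or =,
# which often indicates leaked markup rather than real words.
suspicious_vcb_entries() {
  awk '$2 ~ /[<>=]/ { print $2 }' "$1"
}

# Example usage (the file's location depends on --root-dir):
#   suspicious_vcb_entries en.vcb
```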


It seems that the XML is not parsed correctly and is treated as plain text.
Is there anything wrong with my training configuration or my training corpus?
Thanks a lot.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141204/1eaff41e/attachment-0001.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tree.en
Type: application/octet-stream
Size: 357 bytes
Desc: not available
Url : http://mailman.mit.edu/mailman/private/moses-support/attachments/20141204/1eaff41e/attachment-0001.obj

------------------------------

Message: 2
Date: Thu, 04 Dec 2014 13:30:12 +0100
From: Michael Zock <Michael.Zock@lif.univ-mrs.fr>
Subject: [Moses-support] cfp for 20th International Conference on
Application of Natural Language to Information Systems (NLDB'15)
To: corpora@uib.no, moses-support@mit.edu
Cc: Chris Bieman <biem@cs.tu-darmstadt.de>
Message-ID: <548053D4.1050502@lif.univ-mrs.fr>
Content-Type: text/plain; charset="us-ascii"

An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20141204/d6bcd93e/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 98, Issue 18
*********************************************
