Moses-support Digest, Vol 104, Issue 27

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Different phrase tables with same dataset (Barry Haddow)
2. Major bug found in Moses (Read, James C)
3. Re: Major bug found in Moses (Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 Jun 2015 13:51:22 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Different phrase tables with same dataset
To: Davood Mohammadifar <davood_mf@hotmail.com>,
"moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <55816D4A.1050108@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="windows-1252"

> Do you think my mid-range system is adequate? (Core i5 2400, 4 GB
> RAM, 32-bit Ubuntu 14.04.) Of course, I want to train on about
> 50,000 sentences.
For a small data set of 50k sentences, this should work. You could try
on 10k sentences to be sure.


On 17/06/15 13:46, Davood Mohammadifar wrote:
> Thanks a lot Barry
>
> I do not think the problem is related to the Persian side of the corpus,
> because the problem remains when I run with the French/English
> sample corpus (its link is in the Moses manual). Based on your comments, I
> think I should check that the truecasing, recasing and cleaning tools
> work properly in preprocessing.
>
> Do you think my mid-range system is adequate? (Core i5 2400, 4 GB
> RAM, 32-bit Ubuntu 14.04.) Of course, I want to train on about
> 50,000 sentences.
>
> ------------------------------------------------------------------------
> Date: Wed, 17 Jun 2015 12:34:59 +0100
> From: bhaddow@staffmail.ed.ac.uk
> To: davood_mf@hotmail.com; moses-support@mit.edu
> Subject: Re: [Moses-support] Different phrase tables with same dataset
>
> Hi Davood
>
> From line 20113 onwards there are a number of error messages
> indicating that the GIZA alignment didn't run properly, so the
> resulting phrase extraction didn't work. I can't actually see why
> GIZA failed though - possibly the corpus was not preprocessed
> correctly. I'm not familiar with the Arabic tool chain.
>
> cheers - Barry
>
> On 16/06/15 18:24, Davood Mohammadifar wrote:
>
> Thanks Barry.
>
> I attached log file. The file reports two training phases. (after
> "(9) create moses.ini", the second training report has been
> appended).
>
> I executed following instruction for both:
>
> nohup nice \
> /home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl \
> -mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 \
> -sort-compress gzip -root-dir /home/hieu/train \
> -corpus /home/hieu/corpus/training/training.clean \
> -f fa -e en -alignment grow-diag-final-and \
> -reordering msd-bidirectional-fe \
> -lm 0:3:/home/hieu/lm/training.blm.en:8 \
> -external-bin-dir /home/hieu/workspace/github/mosesdecoder/tools
>
>
>
> Is there any error or unusual thing in it?
>
> ------------------------------------------------------------------------
> Date: Tue, 16 Jun 2015 13:01:10 +0100
> From: bhaddow@staffmail.ed.ac.uk <mailto:bhaddow@staffmail.ed.ac.uk>
> To: davood_mf@hotmail.com <mailto:davood_mf@hotmail.com>;
> moses-support@mit.edu <mailto:moses-support@mit.edu>
> Subject: Re: [Moses-support] Different phrase tables with same dataset
>
> Hi Davood
>
> It isn't normal to get such large differences in phrase-table size
> or quality on the same data set, although small variations are
> possible. You should check carefully that you used exactly the
> same settings in each run, and check whether anything went wrong
> during training (errors in the log file).
>
> cheers - Barry
>
> On 16/06/15 12:00, Davood Mohammadifar wrote:
>
> Hello everyone
>
> I used Moses 3 to train on my parallel corpus and obtained
> different BLEU scores (18.5-22.5), so I tried to find the
> reason. Eventually I found that the phrase tables differ from
> run to run. Training on 50,000 parallel sentences, the phrase
> table was about 39 MB (gzipped) the first time and about 59 MB
> (gzipped) the second time. The tables' contents also differ
> somewhat (in scores and entries).
>
> I used MGIZA and followed the instructions for the baseline
> system in the Moses manual. The problem remains when using
> GIZA++, and also when training on 150,000 sentences.
>
> Are phrase tables of different sizes normal?
>
> Thank you
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150617/b655fb0e/attachment-0001.htm
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150617/b655fb0e/attachment-0001.bat

------------------------------

Message: 2
Date: Wed, 17 Jun 2015 13:26:13 +0000
From: "Read, James C" <jcread@essex.ac.uk>
Subject: [Moses-support] Major bug found in Moses
To: "Moses-support@mit.edu" <Moses-support@mit.edu>
Cc: "Arnold, Doug" <doug@essex.ac.uk>
Message-ID:
<DB3PR06MB07131A3F583C538828F4DAEF85A60@DB3PR06MB0713.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi all,


I tried unsuccessfully to publish experiments showing this bug in Moses' behaviour, and as a result I have lost interest in attempting to have my work published. Nonetheless I think you should all be aware of an anomaly in Moses' behaviour which I have thoroughly exposed and which should be easy enough for you to reproduce.


As I understand it, the TM logic of Moses should select the most likely translations according to the TM. I would therefore expect a run of Moses with no LM to find sentences which are the most likely, or at least close to the most likely, according to the TM.


To test this behaviour I performed two runs of Moses: one with an unfiltered phrase table, the other with a filtered phrase table which kept only the most likely phrase pair for each source-language phrase. The results were truly startling: I observed huge differences in BLEU score, with the filtered phrase table producing much higher scores. The beam size used was the default width of 100. I would not have been surprised if the differences in BLEU scores were minimal, but they were quite high.


I have been unable to find a logical explanation for this behaviour other than to conclude that there must be some kind of bug in Moses which causes a TM-only run of Moses to perform poorly at finding the most likely translations according to the TM when less likely phrase pairs are included in the race.


I hope this information will be useful to the Moses community and that the cause of the behaviour can be found and rectified.


James
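
[Editor's note: the filtering step described above can be sketched as follows. This is an illustrative Python sketch, not part of the original mail; it assumes the standard Moses phrase-table layout "source ||| target ||| scores", with the direct translation probability p(e|f) as the third score column.]

```python
def keep_best_pairs(lines, score_index=2):
    # Keep only the highest-scoring target phrase per source phrase.
    # Moses phrase-table lines look like:
    #   source ||| target ||| p(f|e) lex(f|e) p(e|f) lex(e|f)
    # score_index=2 selects p(e|f), the direct translation probability.
    best = {}
    for line in lines:
        fields = [f.strip() for f in line.split('|||')]
        source = fields[0]
        score = float(fields[2].split()[score_index])
        if source not in best or score > best[source][0]:
            best[source] = (score, line)
    return [entry for _, entry in best.values()]
```

Running this over a gzipped phrase table line by line would produce the one-pair-per-source table used in the second run.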
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150617/de64d483/attachment-0001.htm

------------------------------

Message: 3
Date: Wed, 17 Jun 2015 15:32:14 +0200
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Major bug found in Moses
To: "Read, James C" <jcread@essex.ac.uk>
Cc: moses-support@mit.edu, "Arnold, Doug" <doug@essex.ac.uk>
Message-ID: <6b69e4f8b595685c4e51683c0d35ae3f@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"



Hi James,

there are many more factors involved than just probability, for
instance word penalties, phrase penalties etc. To validate your
claim you would need to set the weights for all those
non-probability features to zero. Otherwise there is no hope that
Moses will produce anything similar to the most probable
translation, and so it is no surprise that the translations may
differ. A pruned phrase table naturally produces less noise, so I
would say the behaviour you describe is exactly what I would
expect to happen.
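
[Editor's note: as an illustration, not from the original mail, such a setup could look like the following moses.ini [weight] fragment. The exact feature names and the number of translation-model scores depend on the actual configuration.]

```ini
# Zero out the non-probability features so that only the
# translation-model scores drive the search (illustrative values).
[weight]
UnknownWordPenalty0= 0
WordPenalty0= 0
PhrasePenalty0= 0
Distortion0= 0
TranslationModel0= 0.25 0.25 0.25 0.25
```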

Best,

Marcin

On 2015-06-17 15:26, Read, James C wrote:

> Hi all,
>
> I tried unsuccessfully to publish experiments showing this bug in Moses' behaviour, and as a result I have lost interest in attempting to have my work published. Nonetheless I think you should all be aware of an anomaly in Moses' behaviour which I have thoroughly exposed and which should be easy enough for you to reproduce.
>
> As I understand it, the TM logic of Moses should select the most likely translations according to the TM. I would therefore expect a run of Moses with no LM to find sentences which are the most likely, or at least close to the most likely, according to the TM.
>
> To test this behaviour I performed two runs of Moses: one with an unfiltered phrase table, the other with a filtered phrase table which kept only the most likely phrase pair for each source-language phrase. The results were truly startling: I observed huge differences in BLEU score, with the filtered phrase table producing much higher scores. The beam size used was the default width of 100. I would not have been surprised if the differences in BLEU scores were minimal, but they were quite high.
>
> I have been unable to find a logical explanation for this behaviour other than to conclude that there must be some kind of bug in Moses which causes a TM-only run of Moses to perform poorly at finding the most likely translations according to the TM when less likely phrase pairs are included in the race.
>
> I hope this information will be useful to the Moses community and that the cause of the behaviour can be found and rectified.
>
> James
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support



-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20150617/79c7fac7/attachment.htm

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 104, Issue 27
**********************************************
