Moses-support Digest, Vol 104, Issue 26

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Re: Different phrase tables with same dataset (Barry Haddow)
2. Re: c++11 support (Jeroen Vermeulen)
3. Re: Different phrase tables with same dataset (Davood Mohammadifar)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 Jun 2015 12:34:59 +0100
From: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Subject: Re: [Moses-support] Different phrase tables with same dataset
To: Davood Mohammadifar <davood_mf@hotmail.com>,
"moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <55815B63.7090205@staffmail.ed.ac.uk>
Content-Type: text/plain; charset="windows-1252"

Hi Davood

From line 20113 of the log onwards there are many error messages
indicating that the GIZA alignment didn't run properly, so the
subsequent phrase extraction didn't work either. I can't see why GIZA
failed, though; possibly the corpus was not preprocessed correctly. I'm
not familiar with the Arabic tool chain.

cheers - Barry
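
One way to locate failures like these in a long training log is to scan it for lines that look like errors and report their line numbers. The sketch below is only an illustration and not part of the Moses tooling: the marker strings in ERROR_PATTERN are assumptions, since the real messages depend on which step failed.

```python
import re

# Assumed, illustrative markers; real GIZA/Moses messages may differ.
ERROR_PATTERN = re.compile(r"error|cannot|no such file|died|failed", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that match ERROR_PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


def scan_log(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

Calling scan_log() on the training log (the file name is whatever the nohup output was redirected to) and printing the first few hits should show where the first failing step starts, which is usually more informative than the later, cascading errors.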

On 16/06/15 18:24, Davood Mohammadifar wrote:
> Thanks Barry.
>
> I attached the log file. It reports two training runs (the report of
> the second run is appended after "(9) create moses.ini").
>
> I ran the following command for both runs:
>
> nohup nice
> /home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl
> -mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 -sort-compress
> gzip -root-dir /home/hieu/train -corpus
> /home/hieu/corpus/training/training.clean -f fa -e en -alignment
> grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:/home/hieu/lm/training.blm.en:8 -external-bin-dir
> /home/hieu/workspace/github/mosesdecoder/tools
>
>
>
> Is there any error or anything unusual in it?
>
> ------------------------------------------------------------------------
> Date: Tue, 16 Jun 2015 13:01:10 +0100
> From: bhaddow@staffmail.ed.ac.uk
> To: davood_mf@hotmail.com; moses-support@mit.edu
> Subject: Re: [Moses-support] Different phrase tables with same dataset
>
> Hi Davood
>
> It isn't normal to get such large differences in phrase table size or
> quality, on the same data set, although small variations are possible.
> You should check carefully that you used exactly the same settings in
> each run, and check whether anything went wrong during training
> (errors in the log file).
>
> cheers - Barry
>
> On 16/06/15 12:00, Davood Mohammadifar wrote:
>
> Hello everyone
>
> I used Moses 3 to train on my parallel corpus. Different runs gave
> different BLEU scores (18.5-22.5), so I tried to find the reason and
> found that the phrase tables differ from run to run. Training on
> 50000 parallel sentences, the phrase table was about 39MB (gzipped)
> the first time and about 59MB (gzipped) the second time. The phrase
> tables' contents also differ somewhat (in both scores and entries).
>
> I used MGIZA and followed the instructions for the baseline system in
> the Moses manual. The problem remained when using GIZA++, too.
>
> The problem also remained when training on 150000 sentences.
>
> Is it normal for phrase tables to differ in size?
>
> Thank you
>
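
To see how two phrase tables from the same data differ beyond their file size, one option is to compare them at the level of phrase pairs. The sketch below assumes the standard Moses text format, where fields are separated by " ||| " and the first two fields are the source and target phrase; the function names and the file paths passed in are hypothetical. It holds both tables in memory, so it is only suitable for small models such as the 50000-sentence one discussed here.

```python
import gzip


def read_phrase_pairs(path):
    """Return the set of (source, target) pairs in a gzipped phrase table."""
    pairs = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split(" ||| ")
            if len(fields) >= 2:  # skip malformed lines
                pairs.add((fields[0], fields[1]))
    return pairs


def compare_phrase_tables(path_a, path_b):
    """Count phrase pairs unique to each table and pairs shared by both."""
    pairs_a = read_phrase_pairs(path_a)
    pairs_b = read_phrase_pairs(path_b)
    return {
        "only_in_a": len(pairs_a - pairs_b),
        "only_in_b": len(pairs_b - pairs_a),
        "shared": len(pairs_a & pairs_b),
    }
```

If one table is close to a subset of the other, that would be consistent with alignment or extraction having partly failed in one run; if the two overlap only loosely, the word alignments themselves probably differ.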


------------------------------

Message: 2
Date: Wed, 17 Jun 2015 19:21:33 +0700
From: Jeroen Vermeulen <jtv@precisiontranslationtools.com>
Subject: Re: [Moses-support] c++11 support
To: Rico Sennrich <rico.sennrich@gmx.ch>, moses-support@mit.edu
Message-ID:
<AA7E4C55-E67A-406A-B287-3B3915562180@precisiontranslationtools.com>
Content-Type: text/plain; charset=UTF-8

On June 16, 2015 11:02:59 PM GMT+07:00, Rico Sennrich <rico.sennrich@gmx.ch> wrote:
>Hi list,
>
>some code in mosesdecoder (oxlm, c++tokenizer) already requires c++11.
>To let people benefit from the usability and functionality improvements
>of c++11, it would be beneficial to allow the use of c++11 features in
>all of the code.
>
>before people start making big changes to the codebase, we should make
>sure that there are no good reasons against allowing c++11 features,
>such as lack of compiler support.
>
>I pushed a minimal commit (6c0f875) to test the waters. If this
>introduces bugs, or if users still rely on old compilers without c++11
>support, please complain here.
>
>best wishes,
>Rico

For me, C++11 is currently the most recent version I can work with. So I welcome C++11, but please, no C++14 just yet. :)

We may see a slight performance increase from going to C++11, especially if we can figure out exactly where "noexcept" is appropriate.


Jeroen


------------------------------

Message: 3
Date: Wed, 17 Jun 2015 12:46:26 +0000
From: Davood Mohammadifar <davood_mf@hotmail.com>
Subject: Re: [Moses-support] Different phrase tables with same dataset
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>, "moses-support@mit.edu"
<moses-support@mit.edu>
Message-ID: <SNT150-W518FF5EEA440B0747DC0358CA60@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

Thanks a lot, Barry.

I do not think the problem is related to the Persian side of the corpus, because it persists when I train on the French/English sample corpus (linked in the Moses manual). Based on your comments, I think I should check that the truecasing, recasing and cleaning tools work properly during preprocessing.
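
A quick sanity check of the cleaned corpus can rule out some common preprocessing problems before training. The sketch below is a minimal illustration, not a replacement for the Moses cleaning script: it checks that both sides have the same number of lines, that no line is empty, and that no sentence exceeds a token limit. The function names are hypothetical, and the default limit of 80 tokens is an assumption that mirrors the value used in the baseline instructions of the Moses manual.

```python
def check_parallel_corpus(source_lines, target_lines, max_tokens=80):
    """Return a list of problems found in a sentence-aligned corpus."""
    problems = []
    if len(source_lines) != len(target_lines):
        problems.append(
            f"line counts differ: {len(source_lines)} vs {len(target_lines)}"
        )
    # zip stops at the shorter side, so only paired lines are inspected.
    for number, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1):
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            problems.append(f"line {number}: empty sentence")
        elif max(src_len, tgt_len) > max_tokens:
            problems.append(f"line {number}: more than {max_tokens} tokens")
    return problems


def check_files(source_path, target_path, max_tokens=80):
    """Run the same checks on two UTF-8 text files, one sentence per line."""
    with open(source_path, encoding="utf-8") as src, \
            open(target_path, encoding="utf-8") as tgt:
        return check_parallel_corpus(src.readlines(), tgt.readlines(), max_tokens)
```

With the training command quoted in this thread, the two files to check would be training.clean.fa and training.clean.en; an empty result means none of these particular problems was found, not that the corpus is otherwise correct.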

Do you think my mid-range system is adequate (Core i5 2400, 4GB RAM, 32-bit Ubuntu 14.04)? I only want to train on about 50000 sentences.


------------------------------


End of Moses-support Digest, Vol 104, Issue 26
**********************************************
