Moses-support Digest, Vol 98, Issue 8

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. how to clean the UN corpus (emna hkiri)
2. Re: Running Moses with a compact lexical reordering
(Marcin Junczys-Dowmunt)


----------------------------------------------------------------------

Message: 1
Date: Mon, 1 Dec 2014 16:56:51 +0100
From: emna hkiri <emna.hkiri@gmail.com>
Subject: [Moses-support] how to clean the UN corpus
To: moses-support@mit.edu
Message-ID:
<CAAp-nZ27iYXX_BbLOjaTmdRYk33V+NJTHQ+uM9Rzov6YW38vHg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear friends, thank you very much for your earlier help; I hope you can
help me again.
I am trying to build an Arabic-English SMT system with Moses,
but during training GIZA++ does not produce the alignment. The cause is that
the UN ar-en corpus is not well cleaned: the two sides are not parallel,
because they do not have the same number of lines. I am working with the 2000
directories (2000ar and 2000en). Has anyone worked with the UN ar-en corpus?
How can I get the same number of lines on the ar and en sides for 2000, so
that the cleaning step passes?
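
To locate where the two sides diverge, one option is to compare line counts file by file. The following is a minimal sketch in Python, not a fix for the alignment itself. It assumes, hypothetically, that the 2000ar and 2000en directories contain files with identical names; the helper names count_lines and mismatched_pairs are made up for illustration.

```python
import os


def count_lines(path):
    """Count lines by iterating over the file in binary mode."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def mismatched_pairs(src_dir, tgt_dir):
    """Return (name, src_lines, tgt_lines) for files whose line counts differ.

    Assumption: a file in src_dir pairs with the file of the same name
    in tgt_dir. Names present on only one side are skipped.
    """
    mismatches = []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        tgt_path = os.path.join(tgt_dir, name)
        if not (os.path.isfile(src_path) and os.path.isfile(tgt_path)):
            continue
        src_count = count_lines(src_path)
        tgt_count = count_lines(tgt_path)
        if src_count != tgt_count:
            mismatches.append((name, src_count, tgt_count))
    return mismatches
```

Calling mismatched_pairs("2000ar", "2000en") would list the file pairs that need attention. Pairs with differing counts still have to be re-aligned with a sentence aligner or dropped before the cleaning step.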

Thank you in advance; I hope you can answer my question.

------------------------------

Message: 2
Date: Mon, 01 Dec 2014 16:56:47 +0100
From: Marcin Junczys-Dowmunt <junczys@amu.edu.pl>
Subject: Re: [Moses-support] Running Moses with a compact lexical
reordering
To: Massinissa Ahmim <massinissa.ahmim@linguacustodia.com>
Cc: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <547C8FBF.6090106@amu.edu.pl>
Content-Type: text/plain; charset="utf-8"

Put the suffix .minlexr back; that was bad advice.
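
Since the error in this thread is "Can't read" on a path, a quick sanity check is to list every path= entry in the ini file and test whether exactly that file exists on disk. The following is a minimal sketch in Python. It assumes each feature occupies a single line in the real moses.ini, and the helper names feature_paths and missing_paths are made up for illustration. It only checks file presence and says nothing about how Moses itself resolves the .minlexr suffix.

```python
import os
import re

# Matches a whitespace-delimited "path=VALUE" token on a feature line.
PATH_RE = re.compile(r"(?:^|\s)path=(\S+)")


def feature_paths(ini_text):
    """Yield (first_token, path) for each non-comment line with a path= entry."""
    for line in ini_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out features
        match = PATH_RE.search(stripped)
        if match:
            yield stripped.split()[0], match.group(1)


def missing_paths(ini_text):
    """Return the (first_token, path) pairs whose path does not exist on disk."""
    return [(name, path) for name, path in feature_paths(ini_text)
            if not os.path.exists(path)]
```

Reading the ini file and calling missing_paths on its text returns an empty list when every configured path is present as written.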

On 01.12.2014 at 16:56, Massinissa Ahmim wrote:
> I'm getting the same message :
>
> Can't read /home/train-degaulle-fren/training/model/reordering-table
>
> 2014-12-01 16:49 GMT+01:00 Marcin Junczys-Dowmunt <junczys@amu.edu.pl>:
>
> Could you check whether this happens if you do not use the phrase
> table combination? Just one normal phrase table and a standard
> decoding path "0 T 0" ?
>
> On 01.12.2014 at 16:43, Massinissa Ahmim wrote:
>> Hi Marcin,
>>
>> the reordering table is over 12 GB, can I send a sample of the
>> initial file instead?
>>
>> here is my moses.ini :
>>
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>>
>> # mapping steps
>> [mapping]
>> 0 T 2
>>
>>
>> [distortion-limit]
>> 6
>>
>> # feature functions
>> [feature]
>> UnknownWordPenalty
>> WordPenalty
>> PhrasePenalty
>> #PhraseDictionaryCompact tuneable=false name=TranslationModel0
>> num-features=4
>> path=/home/Update4/train-pros-fren-update4/binarised-model/phrase-table-pros-fren-update4.minphr
>> input-factor=0 output-factor=0 table-limit=20
>>
>>
>> PhraseDictionaryCompact tuneable=false name=TranslationModel0
>> num-features=4
>> path=/home/train-degaulle-fren/binarised-model/phrase-table-degaulle-firsthalf.minphr
>> input-factor=0 output-factor=0 table-limit=20
>>
>>
>> PhraseDictionaryCompact tuneable=false name=TranslationModel1
>> num-features=4
>> path=/home/train-degaulle-fren/binarised-model/phrase-table-degaulle-secondhalf.minphr
>> input-factor=0 output-factor=0 table-limit=20
>>
>>
>>
>>
>> PhraseDictionaryMultiModel num-features=4 input-factor=0
>> output-factor=0 table-limit=20 mode=interpolate lambda=0.5,0.5
>> components=TranslationModel0,TranslationModel1
>>
>> LexicalReordering name=LexicalReordering0 num-features=6
>> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
>> path=/home/train-degaulle-fren/training/model/reordering-table.minlexr
>>
>>
>> Distortion
>> KENLM lazyken=0 name=LM0 factor=0
>> path=/home/DATA/LM/Corps-lm.blm.en.mm order=5
>>
>> # dense weights for feature functions
>>
>>
>>
>>
>> [threads]
>> all
>> [weight]
>>
>> LexicalReordering0= 0.0564145 -0.00809962 0.0939308 0.114042
>> 0.0630804 0.0683044
>> Distortion0= 0.0452205
>> LM0= 0.140316
>> WordPenalty0= 0.0793205
>> PhrasePenalty0= 0.113696
>> PhraseDictionaryMultiModel0= 0.0608326 0.102939 0.0533373 0.000466626
>> TranslationModel0= 0 0 1 0
>> TranslationModel1= 0 0 1 0
>> UnknownWordPenalty0= 1
>>
>>
>> 2014-12-01 16:30 GMT+01:00 Marcin Junczys-Dowmunt <junczys@amu.edu.pl>:
>>
>> Hm,
>> weird. In that case you should keep the suffix. Can you make
>> the reordering table available for download somehow? Can I
>> see your full INI-file?
>>
>> On 01.12.2014 at 16:27, Massinissa Ahmim wrote:
>>> Hi Marcin,
>>>
>>> I'm getting this now :
>>>
>>> Can't read
>>> /home/train-degaulle-fren/training/model/reordering-table
>>>
>>> I double checked the paths, they seem to be correct
>>>
>>> Thanks
>>>
>>> Massinissa
>>>
>>> 2014-12-01 16:17 GMT+01:00 Marcin Junczys-Dowmunt <junczys@amu.edu.pl>:
>>>
>>> Hi,
>>> try removing the ".minlexr" suffix from the ini file.
>>> Does it work after that?
>>>
>>> On 01.12.2014 at 16:14, Massinissa Ahmim wrote:
>>>> Dear all,
>>>>
>>>> I'm having trouble running Moses with a compact
>>>> lexical reordering table.
>>>>
>>>> After sorting the initial table, I ran the following
>>>> command :
>>>>
>>>> /home/Moses/mosesdecoder/bin/processLexicalTableMin -in
>>>> reordering-table.sort -out reordering-table -threads 32
>>>>
>>>> The process completed successfully, so I updated
>>>> my moses.ini file :
>>>>
>>>> LexicalReordering name=LexicalReordering0
>>>> num-features=6 type=wbe-msd-bidirectional-fe-allff
>>>> input-factor=0 output-factor=0
>>>> path=/home/train-degaulle-fren/training/model/reordering-table.minlexr
>>>>
>>>>
>>>> and ran moses like this :
>>>>
>>>> /home/Moses/mosesdecoder/bin/moses -f moses2.ini
>>>> -minlexr-memory
>>>>
>>>> and I'm getting the following error :
>>>>
>>>> ....
>>>> line=LexicalReordering name=LexicalReordering0
>>>> num-features=6 type=wbe-msd-bidirectional-fe-allff
>>>> input-factor=0 output-factor=0
>>>> path=/home/train-degaulle-fren/training/model/reordering-table.minlexr
>>>> FeatureFunction: LexicalReordering0 start: 15 end: 20
>>>> Initializing LexicalReordering..
>>>> line=Distortion
>>>> FeatureFunction: Distortion0 start: 21 end: 21
>>>> line=KENLM lazyken=0 name=LM0 factor=0
>>>> path=/home/DATA/LM/Corps-lm.blm.en.mm order=5
>>>> FeatureFunction: LM0 start: 22 end: 22
>>>> Loading UnknownWordPenalty0
>>>> Loading WordPenalty0
>>>> Loading PhrasePenalty0
>>>> Loading LexicalReordering0
>>>> Loading table into memory...Exception:
>>>> vector::_M_range_check
>>>>
>>>>
>>>> Many thanks
>>>>
>>>> Massinissa
>>>>
>>>> --
>>>>
>>>> The Translation Trustee
>>>> 1, Place Charles de Gaulle
>>>> 78180 Montigny-le-Bretonneux
>>>> Tel : +33 1 30 44 04 23 Mobile : +33 7 61 44 40 84
>>>> Email : massinissa.ahmim@linguacustodia.com
>>>> Website : www.linguacustodia.com - www.thetranslationtrustee.com
>>>>
>>>> A Young Innovative Company recognised by the French
>>>> Research Ministry and Paris Finance Innovation cluster
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support


------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 98, Issue 8
********************************************
