Moses-support Digest, Vol 111, Issue 58

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."

Today's Topics:

1. Re: kbmira died with SIGABRT when tuning (Dingyuan Wang)
2. Re: kbmira died with SIGABRT when tuning (Barry Haddow)

----------------------------------------------------------------------

Message: 1
Date: Tue, 19 Jan 2016 22:26:07 +0800
From: Dingyuan Wang <abcdoyle888@gmail.com>
Subject: Re: [Moses-support] kbmira died with SIGABRT when tuning
To: Barry Haddow <bhaddow@inf.ed.ac.uk>, Hieu Hoang
<hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <569E477F.3010202@gmail.com>
Content-Type: text/plain; charset=utf-8

Hi Barry,

It usually hits an error within about 1 to 10 iterations on my laptop. I don't
know what triggers it, so the failure may be nondeterministic.

Disabling xml-input doesn't help. I think I should try verbose output.

My locale settings are:

LANG=zh_CN.UTF-8
LANGUAGE=zh_CN.UTF-8:zh_TW.UTF-8:zh_HK.utf8:en_US.utf8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
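
A note for anyone repeating the test under these settings: in `locale` output, values shown in double quotes are implied by LANG rather than set explicitly, so overriding LANG and LANGUAGE may be enough to approximate this environment, provided the locale is installed. A minimal Python sketch, assuming the decoder is launched as a subprocess; run_with_locale and the stand-in command are hypothetical and are not part of Moses or EMS:

```python
import os
import subprocess
import sys

# Values copied from the locale listing above; the LC_* entries follow LANG.
REPORTED_LOCALE = {
    "LANG": "zh_CN.UTF-8",
    "LANGUAGE": "zh_CN.UTF-8:zh_TW.UTF-8:zh_HK.utf8:en_US.utf8",
}


def run_with_locale(command, overrides=REPORTED_LOCALE):
    """Run `command` with the given locale variables layered over the current environment."""
    env = dict(os.environ)
    env.update(overrides)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Stand-in command (hypothetical): replace it with the real decoder invocation.
    probe = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
    result = run_with_locale(probe)
    print(result.stdout.strip())
```

Passing a different overrides dictionary (for example {"LANG": "C"}) gives a second run to compare against.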

On 2016-01-19 19:20, Barry Haddow wrote:
> Hi Dingyuan
>
> I have your script and model running, but so far it has not reported any
> errors. It's at iteration 27, and I'm using the latest Moses from git.
>
> How long should I expect it to run before it hits an error? Could it be
> affected by the locale setting?
>
> Have you tried running without xml-input to see if you still have the
> problem?
>
> cheers - Barry
>
> On 19/01/16 05:43, Dingyuan Wang wrote:
>> Hi Barry,
>>
>> I've uploaded the model:
>> https://mega.nz/#!UsVSBCBJ!e5IATFvLqrCb5zhmDekLn8NOGw4PSD9RRQLGQeKEvNY
>>
>> To test the model, I included a script 'repeatnbest.sh' which runs moses
>> repeatedly until an encoding error occurs.
>>
>> The files run7.best100.out and run7.out in the archive are from the last
>> run, which produced the error.
>>
>> It seems that it is WordTranslationFeature that causes the problem.
>>
>> On 2016-01-19 00:03, Barry Haddow wrote:
>>> Hi Dingyuan
>>>
>>> Something is going wrong with the construction or outputting of feature
>>> names, and it looks like it's WordTranslationFeature that's the problem.
>>> Does the problem go away if you do not use word translation features?
>>>
>>> If you could make available a model that reproduces the nbest list
>>> construction then I would have a chance to debug it,
>>>
>>> cheers - Barry
>>>
>>> On 18/01/16 15:32, Dingyuan Wang wrote:
>>>> Hi Barry,
>>>>
>>>> I've checked all the models and corpora with the script and found no
>>>> encoding problems.
>>>>
>>>> I also found that all such errors in the nbest list occur only in the
>>>> feature list (3 different samples) and do not affect the translation
>>>> result. Therefore, the phrase table or training corpus may not be the
>>>> problem.
>>>>
>>>> On 2016-01-18 23:04, Barry Haddow wrote:
>>>>> Hi Dingyuan
>>>>>
>>>>> Are these encoding errors present in your phrase table? Are they present
>>>>> in your training corpus? Since they appear in the word translation
>>>>> features, and you are using a shortlist, are they in the shortlist files
>>>>> in the model directory? (These have names with "topn" in them afaik).
>>>>>
>>>>> File-system errors are unlikely, and for the most part Moses treats text
>>>>> as byte strings so encoding errors usually trace back to the source text.
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On 18/01/16 14:56, Dingyuan Wang wrote:
>>>>>> Hi Barry,
>>>>>>
>>>>>> The names starting with "@" are due to corrupted bytes in the nbest list.
>>>>>>
>>>>>> This kind of corruption occurs from time to time. I wonder if it comes
>>>>>> from memory errors, a filesystem failure, or some kind of
>>>>>> pointer/encoding problem in Moses.
>>>>>>
>>>>>> I've written a script to find such corrupted lines:
>>>>>>
>>>>>> https://gist.github.com/gumblex/0d9d0848b435e4f9818f
>>>>>>
>>>>>> On 2016-01-18 20:42, Barry Haddow wrote:
>>>>>>> Hi Dingyuan
>>>>>>>
>>>>>>> The extractor expects feature names to contain an underscore (not sure
>>>>>>> exactly why) but some of yours don't, and Moses skips them, interpreting
>>>>>>> their values as extra dense features.
>>>>>>>
>>>>>>> The attached screenshot shows my view of the offending names. The ones
>>>>>>> starting with the "@" are the problem. So it does look like the nbest
>>>>>>> list is corrupted. Can you run the decoder on just that sentence, to
>>>>>>> create an uncompressed version of the nbest list?
>>>>>>>
>>>>>>> cheers - Barry
>>>>>>>
>>>>>>> On 18/01/16 12:02, Dingyuan Wang wrote:
>>>>>>>> Hi Barry,
>>>>>>>>
>>>>>>>> Attached is the zgrep result.
>>>>>>>> I found that in the middle of line 61 a few bytes are corrupted. Is that
>>>>>>>> a Moses problem, or does my memory have a problem?
>>>>>>>>
>>>>>>>> I also checked the other files using iconv; they are all valid UTF-8.
>>>>>>>>
>>>>>>>> On 2016-01-18 19:32, Barry Haddow wrote:
>>>>>>>>> Hi Dingyuan
>>>>>>>>>
>>>>>>>>> Yes, that's very possible. The error could be in extracting features.dat
>>>>>>>>> from the nbest list. Are you able to post the nbest list? Or at least
>>>>>>>>> the entries for sentence 16?
>>>>>>>>>
>>>>>>>>> Run something like
>>>>>>>>>
>>>>>>>>> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
>>>>>>>>>
>>>>>>>>> cheers - Barry
>>>>>>>>>
>>>>>>>>> On 18/01/16 11:24, Dingyuan Wang wrote:
>>>>>>>>>> Hi Barry,
>>>>>>>>>>
>>>>>>>>>> I reran EMS after the first email and then posted the most recent
>>>>>>>>>> results, so the line changed.
>>>>>>>>>>
>>>>>>>>>> I'm just using the latest code and the EMS script, with pretty much
>>>>>>>>>> default settings. The EMS setting is:
>>>>>>>>>>
>>>>>>>>>> sparse-features = "target-word-insertion top 50, source-word-deletion
>>>>>>>>>> top 50, word-translation top 50 50, phrase-length"
>>>>>>>>>>
>>>>>>>>>> I suspect there is something unexpected in the extractor.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2016-01-18 19:03, Barry Haddow wrote:
>>>>>>>>>>> Hi Dingyuan
>>>>>>>>>>>
>>>>>>>>>>> In fact it is neither the sparse features nor the Asian characters that
>>>>>>>>>>> are the problem. The offending line has 17 dense features, yet your
>>>>>>>>>>> model has 14 dense features.
>>>>>>>>>>>
>>>>>>>>>>> The string "1 1 1" appears directly after the language model feature in
>>>>>>>>>>> line 1694, in your attachment, adding the extra 3 features. Note that
>>>>>>>>>>> this is not the line you mentioned in your earlier email.
>>>>>>>>>>>
>>>>>>>>>>> I have no idea why there are extra features. Have you made changes to
>>>>>>>>>>> any of the core Moses features?
>>>>>>>>>>>
>>>>>>>>>>> best wishes
>>>>>>>>>>> Barry
>>>>>>>>>>>
>>>>>>>>>>> The offending line:
>>>>>>>>>>> what(): Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1
>>>>>>>>>>> 1 -39
>>>>>>>>>>> 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_?~?=3 WT_?~?=1
>>>>>>>>>>> WT_?~?=1
>>>>>>>>>>> WT_?~?=1 WT_?~?=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7
>>>>>>>>>>> PL_s1=5
>>>>>>>>>>> PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_
>>>>>>>>>>> ?~?=1
>>>>>>>>>>> WT_?~?=1 WT_?~?=1 WT_?~?=1 WT_?~?=1 WT_?~?=1 WT_?~
>>>>>>>>>>> ?=1
>>>>>>>>>>> WT_?~
>>>>>>>>>>> ?=1 WT_??~?=1 WT_??~?=1 WT_?~?=1 WT_?~?=1 WT_?~?
>>>>>>>>>>> ?=1
>>>>>>>>>>> WT_?~
>>>>>>>>>>> ?=1 WT_?~?=1 WT_?~??=1 WT_?~??=1 WT_?~?=1 WT_?~?=1
>>>>>>>>>>> WT_
>>>>>>>>>>> ?~?
>>>>>>>>>>> ?=1 WT_?~?=1 WT_?~??=1 WT_?~?=1 WT_?~??=1 WT_?~?
>>>>>>>>>>> ?=1
>>>>>>>>>>> WT_?
>>>>>>>>>>> ?~??=1 WT_?~??=1 WT_?~?=1 WT_?~?=1 WT_?~?=1 WT_?~?
>>>>>>>>>>> ?=1 WT_
>>>>>>>>>>> ?~??=1 WT_??~??=1 " of ...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 18/01/16 10:37, Dingyuan Wang wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've attached that. The line number is 1694.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2016-01-18 16:43, Barry Haddow wrote:
>>>>>>>>>>>>> Hi Dingyuan
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it possible to attach the features.dat file that is causing the
>>>>>>>>>>>>> error? Almost certainly Moses is failing to parse the line because of
>>>>>>>>>>>>> the Asian characters in the feature names,
>>>>>>>>>>>>>
>>>>>>>>>>>>> cheers - Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>>>>>>>>>>>>> I ran
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~/software/moses/bin/kbmira -J 75 --dense-init run7.dense \
>>>>>>>>>>>>>>   --sparse-init run7.sparse-weights \
>>>>>>>>>>>>>>   --ffile run1.features.dat --ffile run2.features.dat \
>>>>>>>>>>>>>>   --ffile run3.features.dat --ffile run4.features.dat \
>>>>>>>>>>>>>>   --ffile run5.features.dat --ffile run6.features.dat \
>>>>>>>>>>>>>>   --ffile run7.features.dat \
>>>>>>>>>>>>>>   --scfile run1.scores.dat --scfile run2.scores.dat \
>>>>>>>>>>>>>>   --scfile run3.scores.dat --scfile run4.scores.dat \
>>>>>>>>>>>>>>   --scfile run5.scores.dat --scfile run6.scores.dat \
>>>>>>>>>>>>>>   --scfile run7.scores.dat -o /tmp/mert.out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the tuning/tmp.1 directory, which will certainly replicate the error.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2016-01-16 23:42, Hieu Hoang wrote:
>>>>>>>>>>>>>>> The mert script prints out every command it runs. You should be able to
>>>>>>>>>>>>>>> replicate the error by running the last command
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle888@gmail.com
>>>>>>>>>>>>>>> <mailto:abcdoyle888@gmail.com>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry, but I can't reliably replicate the same problem when running
>>>>>>>>>>>>>>> TUNING_tune.1 alone. There is no character '_' in the test set or top50
>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm using sparse-features = "target-word-insertion top 50,
>>>>>>>>>>>>>>> source-word-deletion top 50, word-translation top 50 50,
>>>>>>>>>>>>>>> phrase-length"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've attached some related files from EMS and the EMS config.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2016-01-16 02:45, Hieu Hoang wrote:
>>>>>>>>>>>>>>> > could you make your model files available for download so I can
>>>>>>>>>>>>>>> > replicate this problem.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > it seems like you're using a feature function with sparse scores. I
>>>>>>>>>>>>>>> > think the character '_' must be escaped.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On 12/01/16 04:00, Dingyuan Wang wrote:
>>>>>>>>>>>>>>> >> Hi all,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> I'm using EMS for doing experiments. Every time, kbmira died with
>>>>>>>>>>>>>>> >> SIGABRT when tuning in one direction, while tuning in the opposite
>>>>>>>>>>>>>>> >> direction (same config and test set) was successful.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> The mert.log (stderr) shows the following:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>>>>>>>>>>>>>> >> Initialising random seed from system clock
>>>>>>>>>>>>>>> >> Found 15323 initial sparse features
>>>>>>>>>>>>>>> >> ....terminate called after throwing an instance of
>>>>>>>>>>>>>>> >> 'MosesTuning::FileFormatException'
>>>>>>>>>>>>>>> >> what(): Error in line "-4.51933 0 0
>>>>>>>>>>>>>>> -6.09733
>>>>>>>>>>>>>>> 0 0 0
>>>>>>>>>>>>>>> -121.556 2
>>>>>>>>>>>>>>> -20 12
>>>>>>>>>>>>>>> >> -31.6201 -38.5211 -26.5112 -60.6166 WT_?~?=2
>>>>>>>>>>>>>>> WT_?~?=1
>>>>>>>>>>>>>>> PL_s1=4
>>>>>>>>>>>>>>> >> PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1 PL_2,1=3
>>>>>>>>>>>>>>> PL_t1=6
>>>>>>>>>>>>>>> PL_t2=4
>>>>>>>>>>>>>>> PL_t3=2
>>>>>>>>>>>>>>> >> PL_2,3=1 PL_s2=7 PL_1,1=3 WT_?~??=1 WT_?~
>>>>>>>>>>>>>>> ??=1
>>>>>>>>>>>>>>> WT_?~
>>>>>>>>>>>>>>> ?=1
>>>>>>>>>>>>>>> WT_?~?
>>>>>>>>>>>>>>> >> ?=1 WT_?~?=1 WT_?~?=2 WT_?~?=1 WT_
>>>>>>>>>>>>>>> ?~?=1
>>>>>>>>>>>>>>> WT_?~
>>>>>>>>>>>>>>> ??=1
>>>>>>>>>>>>>>> WT_
>>>>>>>>>>>>>>> ?~?=1
>>>>>>>>>>>>>>> >> WT_?~??=1 WT_?~?=1 WT_?~??=1 WT_?~
>>>>>>>>>>>>>>> ??=1
>>>>>>>>>>>>>>> WT_?~?
>>>>>>>>>>>>>>> ?=1 WT_?~
>>>>>>>>>>>>>>> >> ?=1 WT_?~??=1 " of run7.features.dat
>>>>>>>>>>>>>>> >> Aborted
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Since run7.scores.dat is generated by scripts, I don't think I am
>>>>>>>>>>>>>>> >> responsible for the bad format. The last time it died, I removed the
>>>>>>>>>>>>>>> >> likely offending line from the test set, but this time another line
>>>>>>>>>>>>>>> >> appears.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>> >> Dingyuan Wang
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> >> Moses-support mailing list
>>>>>>>>>>>>>>> >> Moses-support@mit.edu
>>>>>>>>>>>>>>> <mailto:Moses-support@mit.edu>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Dingyuan Wang (gumblex)
>>>>>>>>>>>>>>>
>>>
>
>

--
Dingyuan Wang (gumblex)
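
The dense-feature mismatch discussed in the quoted thread (17 values where the model has 14) can be checked mechanically. A minimal Python sketch, assuming a features.dat line consists of bare numeric dense values followed by name=value sparse entries, as in the error messages quoted above. split_features is a hypothetical helper, not the Moses extractor, and the sample line keeps only ASCII feature names because the original non-ASCII names did not survive the mail encoding:

```python
EXPECTED_DENSE = 14  # dense feature count of the model, as stated in the thread

# Dense part of the offending line quoted above, plus three of its ASCII sparse entries.
SAMPLE = (
    "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39 18 "
    "-26.2331 -40.6736 -44.3698 -82.5072 PL_s3=5 PL_3,2=2 PL_3,3=3"
)


def split_features(line):
    """Split a line into dense values, sparse (name, value) pairs, and malformed tokens."""
    dense, sparse, malformed = [], [], []
    for token in line.split():
        # Split on the last '=' so that a name containing '=' stays intact.
        name, sep, value = token.rpartition("=")
        try:
            if sep:
                sparse.append((name, float(value)))
            else:
                dense.append(float(token))
        except ValueError:
            # Neither a number nor name=number: possibly a corrupted token.
            malformed.append(token)
    return dense, sparse, malformed


if __name__ == "__main__":
    dense, sparse, malformed = split_features(SAMPLE)
    print(f"dense={len(dense)} expected={EXPECTED_DENSE} malformed={malformed}")
```

On the sample line this counts 17 dense values against the expected 14, matching the observation in the thread. Tokens that parse neither as a number nor as name=number are collected separately, which is one way a corrupted name might show up.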

------------------------------

Message: 2
Date: Tue, 19 Jan 2016 16:31:35 +0000
From: Barry Haddow <bhaddow@inf.ed.ac.uk>
Subject: Re: [Moses-support] kbmira died with SIGABRT when tuning
To: Dingyuan Wang <abcdoyle888@gmail.com>, Hieu Hoang
<hieuhoang@gmail.com>
Cc: moses-support <moses-support@mit.edu>
Message-ID: <569E64E7.9030809@inf.ed.ac.uk>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Dingyuan

I ran for over 200 iterations and saw no problem. I tried with your LANG
and LANGUAGE settings (I don't have the right packages for the other
settings) and still saw no failure.

Maybe it is a random pointer/memory problem like you suggested. I have
started running your model with valgrind, but nothing so far,

cheers - Barry

On 19/01/16 14:26, Dingyuan Wang wrote:
> Hi Barry,
>
> It usually hits an error within about 1 to 10 iterations on my laptop. I don't
> know what triggers it, so the failure may be nondeterministic.
>
> Disabling xml-input doesn't help. I think I should try verbose output.
>
> My locale settings are:
>
> LANG=zh_CN.UTF-8
> LANGUAGE=zh_CN.UTF-8:zh_TW.UTF-8:zh_HK.utf8:en_US.utf8
> LC_CTYPE="zh_CN.UTF-8"
> LC_NUMERIC="zh_CN.UTF-8"
> LC_TIME="zh_CN.UTF-8"
> LC_COLLATE="zh_CN.UTF-8"
> LC_MONETARY="zh_CN.UTF-8"
> LC_MESSAGES="zh_CN.UTF-8"
> LC_PAPER="zh_CN.UTF-8"
> LC_NAME="zh_CN.UTF-8"
> LC_ADDRESS="zh_CN.UTF-8"
> LC_TELEPHONE="zh_CN.UTF-8"
> LC_MEASUREMENT="zh_CN.UTF-8"
> LC_IDENTIFICATION="zh_CN.UTF-8"
> LC_ALL=
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
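
The corrupted bytes discussed in this thread were located with a script linked earlier in this digest and with iconv. As an illustration only, and not a copy of that script, here is a minimal Python sketch that reports lines of an n-best list that are not valid UTF-8. The function names are hypothetical, and the optional prefix filter assumes each n-best entry starts with its sentence number followed by a space, as the zgrep pattern in the thread suggests:

```python
import gzip


def open_bytes(path):
    """Open a plain or gzip-compressed file for reading raw bytes."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def find_invalid_utf8(lines, prefix=b""):
    """Yield (line_number, byte_offset) for each checked line that is not valid UTF-8.

    Only lines starting with `prefix` are checked; the default checks every line.
    """
    for number, raw in enumerate(lines, start=1):
        if not raw.startswith(prefix):
            continue
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte in the line.
            yield number, err.start


if __name__ == "__main__":
    # Synthetic sample; for a real file, iterate over
    # open_bytes("tuning/tmp.1/run7.best100.out.gz") instead.
    sample = [
        b"16 ||| fine ||| PL_s1=4\n",
        b"16 ||| WT_\xff~x=1\n",
        b"17 ||| \xfe\n",
    ]
    for number, offset in find_invalid_utf8(sample, prefix=b"16 "):
        print(f"line {number}: invalid byte at offset {offset}")
```

Reading in binary mode matters here: decoding each line separately keeps one bad byte from aborting the whole scan, and the reported offset points at the first damaged byte.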

------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

End of Moses-support Digest, Vol 111, Issue 58
**********************************************
