Moses-support Digest, Vol 87, Issue 60

Send Moses-support mailing list submissions to
moses-support@mit.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
moses-support-request@mit.edu

You can reach the person managing the list at
moses-support-owner@mit.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

1. Difference between lexical reordering and distortion? (Andrew)
2. Re: EMS MML IndexError: list index out of range (jian zhang)


----------------------------------------------------------------------

Message: 1
Date: Mon, 27 Jan 2014 05:04:05 +0900
From: Andrew <ravenyj@hotmail.com>
Subject: [Moses-support] Difference between lexical reordering and
distortion?
To: "moses-support@mit.edu" <moses-support@mit.edu>
Message-ID: <BLU171-W1EC296943B760F760B812B2A30@phx.gbl>
Content-Type: text/plain; charset="iso-2022-jp"

Hi,
in the tutorial (http://www.statmt.org/moses/?n=Moses.Tutorial),
the distortion model is said to be responsible for reordering the input,
but the moses.ini file has separate weights for the lexical reordering model and the distortion model.
So I was wondering how they differ.
Thank you in advance for your help.
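Andrew's question gets no reply in this issue. As background, here is a toy Python sketch of the textbook distinction. It assumes the standard definitions and is not Moses' own code. The distance-based distortion model charges a cost that depends only on how far the decoder jumps in the source sentence, |start_i - end_(i-1) - 1|. A lexicalized reordering model (for example the `msd-bidirectional-fe` setting that appears later in this issue) instead classifies each jump as monotone, swap or discontinuous and scores that class with probabilities learned for each phrase pair. The function names and example spans are hypothetical.

```python
from typing import List, Tuple

# (first, last) source-word positions of a phrase, 0-based and inclusive,
# listed in the order the phrases are produced on the target side.
Span = Tuple[int, int]


def distance_cost(spans: List[Span]) -> int:
    """Distance-based distortion: sum of |start_i - end_{i-1} - 1|.
    Purely positional; it ignores which words are being moved."""
    cost, prev_end = 0, -1
    for start, end in spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


def msd_orientations(spans: List[Span]) -> List[str]:
    """Orientation class of each phrase relative to the previous one:
    M (monotone), S (swap) or D (discontinuous). A lexicalized reordering
    model assigns a probability to these classes per phrase pair."""
    labels, prev = [], (-1, -1)
    for start, end in spans:
        if start == prev[1] + 1:
            labels.append("M")
        elif end == prev[0] - 1:
            labels.append("S")
        else:
            labels.append("D")
        prev = (start, end)
    return labels


# A 5-word source covered by phrases [0,0], [3,4], [1,2] in target order.
spans = [(0, 0), (3, 4), (1, 2)]
print(distance_cost(spans))     # 0 + 2 + 4 = 6
print(msd_orientations(spans))  # ['M', 'D', 'S']
```

The distortion cost is the same whatever words the phrases contain, whereas the orientation labels are what a lexicalized model conditions on the specific phrase pairs, which is why moses.ini carries separate weights for the two.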

------------------------------

Message: 2
Date: Mon, 27 Jan 2014 00:14:32 +0000
From: jian zhang <jianzhang09@gmail.com>
Subject: Re: [Moses-support] EMS MML IndexError: list index out of
range
To: Barry Haddow <bhaddow@staffmail.ed.ac.uk>
Cc: moses-support@mit.edu
Message-ID:
<CALA=z0C1tUaX6FOesEh8ZGh0pu+qhr51ZUPyD8Xd6H2+XCOhAg@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Barry,

The domains.1 file contains the correct line numbers; however, the file names
it lists (news and other) are suspect.

My [CORPUS] has defined
[CORPUS:in]
clean-stem = $training-in-domain-corpus
[CORPUS:out]
clean-stem = $training-out-domain-corpus

and before it, there are

input-extension = fr
output-extension = en
$training-in-domain-corpus = /home/corpus/in-domain-fr-en/news.fr-en.tc.cl
$training-out-domain-corpus = /home/corpus/out-domain-fr-en/other.fr-en.tc.cl

However, when TRAINING_mml-score.1 runs, $FILTER_DOMAIN first loads the name
of the domain to filter, so "out" is loaded (as defined by
mml-filter-corpora = out). The next while (<DOMAIN>) {...} loop then reads
the available domains from the domains.1 file, which are news and other.
Because the available domain names do not match the filtering domain names,
$DOMAIN_FILTERED{$line_number} is never set. As a result, the subroutine
check_sentence_filtered always returns false, and every sentence is treated
as in-domain (score 99999).

After I changed the short names "in" and "out" to "news" and "others",
TRAINING_mml-filter-before-wa no longer reported any error.
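Jian's diagnosis can be illustrated with a small Python sketch. It is a hypothetical re-creation, not the code of mml-score.perl. It assumes each line of domains.1 has the form `<last line number> <domain name>`, and it mimics the rule described in this thread: a sentence whose domain is not listed in `mml-filter-corpora` is kept as in-domain with the fixed score 99999. The helper `domain_sizes` shows one way to answer Barry's question (quoted below) about whether the lengths in the domains file match the corpora. All function names are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

IN_DOMAIN_SCORE = 99999  # fixed score reported in this thread for in-domain lines


def parse_domains(lines: List[str]) -> List[Tuple[int, str]]:
    """Parse '<last line number> <domain name>' lines (assumed format)."""
    domains = []
    for line in lines:
        end, name = line.split()
        domains.append((int(end), name))
    return domains


def filtered_lines(domains: List[Tuple[int, str]], filter_names: Set[str]) -> Set[int]:
    """Return the 1-based corpus lines whose domain is listed for filtering."""
    marked: Set[int] = set()
    start = 1
    for end, name in domains:
        if name in filter_names:  # exact string match: "out" never equals "other"
            marked.update(range(start, end + 1))
        start = end + 1
    return marked


def fixed_score(line_no: int, marked: Set[int]) -> Optional[int]:
    """Unmarked lines keep the in-domain score; marked lines would go on to
    the cross-entropy difference computation (represented by None here)."""
    return IN_DOMAIN_SCORE if line_no not in marked else None


def domain_sizes(domains: List[Tuple[int, str]]) -> Dict[str, int]:
    """Lines per domain, to compare against `wc -l` of each corpus."""
    sizes, prev_end = {}, 0
    for end, name in domains:
        sizes[name] = end - prev_end
        prev_end = end
    return sizes


domains = parse_domains(["3 news", "5 other"])
print(domain_sizes(domains))  # {'news': 3, 'other': 2}
for names in ({"out"}, {"other"}):
    marked = filtered_lines(domains, names)
    print(sorted(names), [fixed_score(n, marked) for n in range(1, 6)])
# ['out'] [99999, 99999, 99999, 99999, 99999]
# ['other'] [99999, 99999, 99999, None, None]
```

With the short name "out", no line is ever marked, so every sentence ends up with 99999, which matches what Jian observed in corpus-mml-score.1.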

Thanks again.

Jian





On Sun, Jan 26, 2014 at 12:37 PM, Barry Haddow
<bhaddow@staffmail.ed.ac.uk> wrote:

> Hi Jian
>
> The logic looks correct to me. If the domains file has been provided, we
> then need to check if the sentence is in-domain. If the domains file is not
> provided, then all sentences are considered out-of-domain.
>
> The fact that all scores are 99999 means that the MML filter is seeing all
> your sentences as in-domain. It could be that something went wrong during
> corpus preprocessing, or during the creation of the domains file
> (/home/mml/mml-test/experiment/model/domains.1). Do the lengths in the
> domains file match the lengths of your in and out corpora?
>
> cheers - Barry
>
>
> On 25/01/14 03:29, jian zhang wrote:
>
> Hi Barry, I don't understand the line *if (defined($filter_domains) &&
> !&check_sentence_filtered($i))* in mml-score.perl, which comes just before
> the bilingual cross-entropy difference is computed.
> Should it not be *if (!defined($filter_domains) &&
> !&check_sentence_filtered($i))*?
>
> Regards,
>
> Jian Zhang
>
>
>
>
> On Fri, Jan 24, 2014 at 10:27 PM, jian zhang <jianzhang09@gmail.com> wrote:
>
>> Hi Barry,
>>
>> All the scores are 99999 in that file.
>>
>> Thanks,
>>
>>
>> Jian
>>
>>
>> On Fri, Jan 24, 2014 at 3:51 PM, Barry Haddow <
>> bhaddow@staffmail.ed.ac.uk> wrote:
>>
>>> Hi Jian
>>>
>>> This is a bit suspect:
>>>
>>>
>>> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>>>
>>> Are the scores in this file sensible, or are they all the same?
>>>
>>> /home/mml/mml-test/experiment/training/corpus-mml-score.1
>>>
>>> cheers - Barry
>>>
>>>
>>> On 24/01/14 14:53, jian zhang wrote:
>>>
>>>> Hi,
>>>>
>>>> I got an IndexError: list index out of range at the
>>>> TRAINING_mml-filter-before-wa step.
>>>>
>>>> I have read the post at
>>>> https://www.mail-archive.com/moses-support@mit.edu/msg08767.html,
>>>> but I still cannot figure out what is wrong.
>>>>
>>>> The full error is:
>>>>
>>>> general:strategy = Score
>>>> general:source_language = fr
>>>> general:target_language = en
>>>> general:input_stem = /home/mml/mml-test/experiment/training/corpus.1
>>>> general:output_stem = /home/mml/mml-test/experiment/training/corpus-mml.1
>>>> general:domain_file = /home/mml/mml-test/experiment/model/domains.1
>>>> general:domain_file_out = /home/mml/mml-test/experiment/training/corpus-mml.1
>>>> score:score_file = /home/mml/mml-test/experiment/training/corpus-mml-score.1
>>>> score:proportion = 0.9
>>>>
>>>> 2014-01-24 14:17:26,276 Retaining at least 0 entries and ignoring 2075137
>>>> Traceback (most recent call last):
>>>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 156, in <module>
>>>>     main()
>>>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 111, in main
>>>>     strategy = strategy_class(config)
>>>>   File "/home/tools/mosesdecoder/scripts/ems/support/mml-filter.py", line 72, in __init__
>>>>     [float(line[:-1]) for line in open(self.score_file)],
>>>>     reverse=True)[ignore_count + count]
>>>> IndexError: list index out of range
>>>>
>>>> And my ems configuration file has:
>>>>
>>>> #################################################################
>>>> # PARALLEL CORPUS PREPARATION:
>>>> # create a tokenized, sentence-aligned corpus, ready for training
>>>>
>>>> [CORPUS]
>>>>
>>>> #in-domain parallel corpus
>>>> [CORPUS:in]
>>>> clean-stem = $training-in-domain-corpus
>>>>
>>>> [CORPUS:out]
>>>> #out-domain parallel corpus
>>>> clean-stem = $training-out-domain-corpus
>>>>
>>>>
>>>> #################################################################
>>>> # LANGUAGE MODEL TRAINING
>>>> [LM]
>>>> [LM:lm]
>>>> type = 8
>>>> lm = $language-model
>>>> #################################################################
>>>> # MODIFIED MOORE LEWIS FILTERING
>>>>
>>>> [MML]
>>>>
>>>> lm-training = $srilm-dir/ngram-count
>>>> lm-settings = "-interpolate -kndiscount -unk"
>>>> lm-binarizer = $moses-src-dir/bin/build_binary
>>>> lm-query = $moses-src-dir/bin/query
>>>> order = 5
>>>>
>>>> ### in-/out-of-domain source/target corpora to train the 4 language models
>>>> #
>>>> # in-domain parallel corpus
>>>> indomain-stem = [CORPUS:in:clean-split-stem]
>>>>
>>>> # out-of-domain parallel corpus
>>>> outdomain-stem = [CORPUS:out:clean-split-stem]
>>>>
>>>> # settings: number of lines sampled from the corpora to train each language model on
>>>> settings = "--line-count 100000"
>>>>
>>>> #################################################################
>>>> # TRANSLATION MODEL TRAINING
>>>> [TRAINING]
>>>> script = $moses-script-dir/training/train-model.perl
>>>> training-options = "-mgiza -mgiza-cpus 12 -sort-buffer-size 16G -sort-compress gzip -sort-parallel 12 -cores 12"
>>>> parallel = yes
>>>> alignment-symmetrization-method = grow-diag-final-and
>>>> lexicalized-reordering = msd-bidirectional-fe
>>>> score-settings = "--GoodTuring"
>>>> include-word-alignment-in-rules = yes
>>>>
>>>> # space-separated list of all out-of-domain corpora to be filtered
>>>> mml-filter-corpora = out
>>>> mml-before-wa = "-proportion 0.9"
>>>>
>>>> #####################################################
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Jian Zhang
>>>>
>>>>
>>>
>>>
>>> --
>>> Jian Zhang
>>> Centre for Next Generation Localisation (CNGL)<http://www.cngl.ie/index.html>
>>> Dublin City University <http://www.dcu.ie/>
>>>
>>>
>>>
>>>
>
>
>
>
>
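Barry's remark about the log line `Retaining at least 0 entries and ignoring 2075137` explains the IndexError. The Python sketch below reproduces the failing indexing pattern from mml-filter.py line 72 under two assumptions that the thread does not confirm: that the ignored entries are exactly the lines scored 99999, and that `count` comes from `proportion` applied to the remaining lines (the real formula may differ). When every line is scored 99999, `ignore_count` equals the number of scores, `count` is 0, and indexing the sorted list at `ignore_count + count` goes one past its end.

```python
from typing import List

IN_DOMAIN_SCORE = 99999.0  # score written for lines treated as in-domain


def pick_threshold(scores: List[float], proportion: float) -> float:
    # Assumption: the 99999-scored lines are the ones being "ignored".
    ignore_count = sum(1 for s in scores if s == IN_DOMAIN_SCORE)
    # Hypothetical formula for how many scored lines to retain.
    count = int(proportion * (len(scores) - ignore_count))
    print(f"Retaining at least {count} entries and ignoring {ignore_count}")
    # Same indexing pattern as mml-filter.py line 72 in the traceback.
    return sorted(scores, reverse=True)[ignore_count + count]


# Mixed scores: index 2 + 1 = 3 falls inside the sorted list of length 5.
print(pick_threshold([IN_DOMAIN_SCORE] * 2 + [3.0, 1.0, -2.0], 0.5))  # 1.0

# All lines in-domain, as in this thread: index 4 on a list of length 4.
try:
    pick_threshold([IN_DOMAIN_SCORE] * 4, 0.9)
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

Once the names in mml-filter-corpora match those in domains.1, the out-of-domain lines receive real scores, `ignore_count` no longer covers the whole score file, and the index stays in range.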



------------------------------

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 87, Issue 60
*********************************************
