Summary: | The naive Bayes classifier (NBC) is an effective classification technique in data mining and machine learning that relies on the attribute conditional independence assumption. However, this assumption rarely holds in real-world applications, so numerous studies have sought to relax it through attribute weighting. To the best of our knowledge, almost all of these studies compute attribute weights from correlation measures or classification accuracy. In this paper, we propose a novel causality-based attribute weighting method for building a weighted NBC, called IFG-WNBC, in which causal information flow (IF) theory and a genetic algorithm (GA) are used to search for optimal weights. The introduction of IF yields a brand-new weighting criterion based on causality rather than correlation. The population initialization in the GA is also improved with IF-based weights for more efficient optimization. Multiple sets of comparison experiments on UCI data sets demonstrate that IFG-WNBC outperforms the classic NBC and other common weighted NBC algorithms in classification accuracy and running time.
|
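To make the idea of attribute weighting concrete, the following is a minimal sketch of a generic attribute-weighted naive Bayes classifier for categorical data, in which each attribute's log-likelihood is scaled by a weight w_i. It is not the IFG-WNBC method itself: the information-flow weight measure and the genetic-algorithm search are not reproduced, and the weights are simply passed in by hand. The function names (`fit_counts`, `predict`), the Laplace smoothing, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_counts(X, y):
    """Collect class counts and per-attribute value counts from categorical rows."""
    n_attrs = len(X[0])
    class_counts = Counter(y)
    # value_counts[i][c][v] = number of class-c rows whose attribute i equals v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    # values[i] = set of distinct values seen for attribute i
    values = [set() for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[i][c][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict(x, weights, class_counts, value_counts, values):
    """Return argmax_c of log P(c) + sum_i w_i * log P(x_i | c).

    Probabilities use Laplace (add-one) smoothing, an assumption of this sketch.
    Setting every weight to 1.0 gives the standard naive Bayes decision rule.
    """
    n = sum(class_counts.values())
    k = len(class_counts)
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log((n_c + 1) / (n + k))
        for i, v in enumerate(x):
            p = (value_counts[i][c][v] + 1) / (n_c + len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (hypothetical): attribute 0 determines the class, attribute 1 is noise.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
y = ["no", "no", "yes", "yes"]
model = fit_counts(X, y)

# Hand-chosen weights that keep attribute 0 and switch off the noisy attribute 1.
label = predict(("rainy", "mild"), [1.0, 0.0], *model)
```

In a weighted NBC of this kind, the quality of the classifier depends on how the weight vector is chosen; a method such as the one summarized above would replace the hand-chosen weights with weights found by a search procedure.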