Summary: | To mitigate the negative effect of classification bias caused by overfitting, semi-naive Bayesian techniques seek to mine the implicit dependency relationships in unlabeled test instances. By redefining some criteria from information theory, Target Learning (TL) builds, for each unlabeled test instance <inline-formula> <math display="inline"> <semantics> <mi mathvariant="script">P</mi> </semantics> </math> </inline-formula>, the Bayesian network classifier BNC<inline-formula> <math display="inline"> <semantics> <msub> <mrow></mrow> <mi mathvariant="script">P</mi> </msub> </semantics> </math> </inline-formula>, which is independent of and complementary to BNC<inline-formula> <math display="inline"> <semantics> <msub> <mrow></mrow> <mi mathvariant="script">T</mi> </msub> </semantics> </math> </inline-formula> learned from training data <inline-formula> <math display="inline"> <semantics> <mi mathvariant="script">T</mi> </semantics> </math> </inline-formula>. In this paper, we extend TL to Universal Target Learning (UTL), which identifies redundant correlations between attribute values and maximizes the bits encoded in the Bayesian network in terms of log likelihood. We take the <i>k</i>-dependence Bayesian classifier as an example to investigate the effect of UTL on BNC<inline-formula> <math display="inline"> <semantics> <msub> <mrow></mrow> <mi mathvariant="script">P</mi> </msub> </semantics> </math> </inline-formula> and BNC<inline-formula> <math display="inline"> <semantics> <msub> <mrow></mrow> <mi mathvariant="script">T</mi> </msub> </semantics> </math> </inline-formula>. Our extensive experimental results on 40 UCI datasets show that UTL can help BNCs improve their generalization performance.