Augmenting Naïve Bayes Classifiers with k-Tree Topology
Fereshteh R. Dastjerdi, Liming CaiThe Bayesian network is a directed, acyclic graphical model that can offer a structured description for probabilistic dependencies among random variables. As powerful tools for classification tasks, Bayesian classifiers often require computing joint probability distributions, which can be computationally intractable due to potential full dependencies among feature variables. On the other hand, Naïve Bayes, which presumes zero dependencies among features, trades accuracy for efficiency and often comes with underperformance. As a result, non-zero dependency structures, such as trees, are often used as more feasible probabilistic graph approximations; in particular, Tree Augmented Naïve Bayes (TAN) has been demonstrated to outperform Naïve Bayes and has become a popular choice. For applications where a variable is strongly influenced by multiple other features, TAN has been further extended to the k-dependency Bayesian classifier (KDB), where one feature can depend on up to k other features (for a given k≥2). In such cases, however, the selection of the k parent features for each variable is often made through heuristic search methods (such as sorting), which do not guarantee an optimal approximation of network topology. In this paper, the novel notion of k-tree Augmented Naïve Bayes (k-TAN) is introduced to augment Naïve Bayesian classifiers with k-tree topology as an approximation of Bayesian networks. It is proved that, under the Kullback–Leibler divergence measurement, k-tree topology approximation of Bayesian classifiers loses the minimum information with the topology of a maximum spanning k-tree, where the edge weights of the graph are mutual information between random variables conditional upon the class label. In addition, while in general finding a maximum spanning k-tree is NP-hard for fixed k≥2, this work shows that the approximation problem can be solved in time O(nk+1) if the spanning k-tree also desires to retain a given Hamiltonian path in the graph. Therefore, this algorithm can be employed to ensure efficient approximation of Bayesian networks with k-tree augmented Naïve Bayesian classifiers of the guaranteed minimum loss of information.