On using the chi-squared metric for determining stochastic dependence

Abstract

Many pattern recognition applications require approximating a probability distribution or density. Estimation methods either assume that the form of the density function is known, so that only the parameters characterizing the distribution need to be estimated, or assume that no information about the density function is available. This paper considers the latter formulation, with the additional constraint that even the stochastic dependence is unknown. For discrete-valued features, the well-known method of Chow and Liu (IEEE Trans. Inf. Theory 14, 462–467 (May 1968)), which uses dependence trees, can be used to approximate the underlying probability distribution. This method determines the best dependence tree using the expected mutual information measure (EMIM) metric. The suitability of a chi-squared metric is studied for the same purpose. For a restricted class of distributions, the two metrics are shown to be equivalent and stochastically optimal. For more general cases, the chi-squared metric is almost as efficient as the optimal one, and in all cases the technique presented here is computationally almost an order of magnitude faster than the EMIM-based method.
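
As a rough illustration of the comparison described in the abstract, the sketch below estimates both edge weights for every pair of discrete features and builds a maximum-weight spanning tree in the style of Chow and Liu. It is a minimal sketch under stated assumptions, not the paper's implementation: the chi-squared weight is taken to be the Pearson-style sum of (p(a,b) − p(a)p(b))² / (p(a)p(b)), which the abstract does not spell out, and the function names `pair_weights` and `dependence_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_weights(samples, i, j):
    """Return (emim, chi2) for features i and j, estimated from samples.

    Assumption: chi2 is the Pearson-style divergence between the empirical
    joint distribution and the product of the empirical marginals.
    """
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    emim = 0.0
    chi2 = 0.0
    for a in marg_i:
        for b in marg_j:
            p_ab = joint.get((a, b), 0) / n
            p_ind = (marg_i[a] / n) * (marg_j[b] / n)
            if p_ab > 0:
                # Mutual information term; skipped when the joint cell is empty.
                emim += p_ab * math.log(p_ab / p_ind)
            # Chi-squared term; needs no logarithm.
            chi2 += (p_ab - p_ind) ** 2 / p_ind
    return emim, chi2


def dependence_tree(samples, metric="emim"):
    """Maximum-weight spanning tree over features (Kruskal with union-find)."""
    d = len(samples[0])
    k = 0 if metric == "emim" else 1
    edges = sorted(
        ((pair_weights(samples, i, j)[k], i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))

    def find(x):
        # Find the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # Adding this edge does not create a cycle.
            parent[ri] = rj
            tree.append((i, j))
    return sorted(tree)
```

Calling `dependence_tree(samples, "emim")` and `dependence_tree(samples, "chi2")` on the same list of feature tuples makes it easy to check on a given data set whether the two weights select the same tree.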

Keywords: Statistical information, Probability distribution, Estimation, Approximation, Closeness of approximation, Dependence trees

Article history: Received 14 November 1991; accepted 16 March 1992; available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(92)90151-8