Automatic thesaurus construction using Bayesian networks

Authors:

Abstract

Automatic thesaurus construction extracts term relations mechanically. A popular approach uses statistical analysis to discover these relations. For low-frequency terms, however, the statistics are too unreliable to decide how terms are related, a difficulty generally referred to as the data-sparseness problem. Unfortunately, many studies have shown that low-frequency terms are the most useful in thesaurus construction. This paper characterizes the statistical behavior of terms using an inference network and develops a formal approach to the data-sparseness problem, which is crucial in constructing a thesaurus. Experiments show the validity of the approach.

Keywords:

Review history: Received 15 December 1995, Accepted 1 March 1996, Available online 19 February 1999.

Official paper URL: https://doi.org/10.1016/0306-4573(96)00026-X