A mathematical formulation of keyword compression for thesauri
Authors:
Abstract
In this paper we present a new method for concentrating the set of keywords of a thesaurus. The method is based on a mathematical study that we carried out of the distribution of characters in a given natural language. We have constructed a concentration function f that generates only a few synonyms, that is, distinct keywords reduced to the same code. Applying this function to the set of keywords of a thesaurus, we reduce each keyword to four characters without synonymity. (With three characters, the rate of synonymity is approximately 1/1000.) A new binary file structure allows the thesaurus to be contained in a table of fewer than 700 bytes.
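The abstract does not give the definition of f, so the following is only a rough sketch of the general idea: map every keyword to a short fixed-length code and measure how often two distinct keywords receive the same code. The names `concentrate` and `synonymity_rate`, the 26-letter code alphabet, and the use of SHA-256 are all assumptions made for illustration. In particular, a generic hash stands in for the paper's function, which is derived from character distributions in a natural language and is not reproduced here.

```python
import hashlib
import string

# Assumed code alphabet; the abstract does not say which characters f emits.
ALPHABET = string.ascii_uppercase


def concentrate(keyword: str, length: int = 4) -> str:
    """Hypothetical stand-in for f: reduce a keyword to `length` letters.

    Uses SHA-256 as a generic mixing step, NOT the paper's function.
    """
    digest = hashlib.sha256(keyword.lower().encode("utf-8")).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        # Peel off one base-26 digit per output character.
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def synonymity_rate(keywords, length: int = 4) -> float:
    """Fraction of distinct keywords that share a code with another keyword."""
    unique = {k.lower() for k in keywords}
    if not unique:
        return 0.0
    codes = {concentrate(k, length) for k in unique}
    # Each missing code means one keyword collided with an earlier one.
    return (len(unique) - len(codes)) / len(unique)
```

Calling `synonymity_rate(keywords, 4)` and `synonymity_rate(keywords, 3)` on a keyword list shows how the collision rate depends on code length. The rates obtained with this stand-in hash need not match the figures reported in the abstract.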
Keywords:
Review history: Available online 13 July 2002.
Official paper URL: https://doi.org/10.1016/0306-4573(77)90037-1