A new approach on search for similar documents with multiple categories using fuzzy clustering
Authors:
Abstract
Searching for similar documents plays an important role in text mining and document management. In similar-document search, as in other text mining applications, the focus is generally on document classification, that is, on determining the class or category to which each document belongs. The present study investigates the case in which documents belong to more than one category. The system used in this study is a similar-document search system based on fuzzy clustering, and it accounts for documents that belong to multiple categories. The proposed approach solves the multiple-category problem in two stages: the first stage identifies the documents that belong to more than one category, and the second determines the categories to which these documents belong. For these two stages, the α-threshold Fuzzy Similarity Classification Method (α-FSCM) and the Multiple Categories Vector Method (MCVM) are proposed, respectively. Experimental results showed that the proposed system can efficiently distinguish the documents that belong to more than one category. In determining which documents belong to which classes, the proposed system performs better and is more successful than the traditional approach.
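The abstract does not spell out how α-FSCM works internally. The sketch below only illustrates the general idea of α-thresholding fuzzy cluster memberships, assuming standard fuzzy c-means memberships computed against fixed, already-learned cluster centroids. A document whose membership reaches the threshold α in two or more clusters is flagged as multi-category, and those clusters are returned as its candidate categories. The function names, the fuzzifier `m = 2`, the Euclidean distance, and the toy 2-D vectors are all hypothetical choices for illustration. This is not the paper's α-FSCM, and it does not reproduce MCVM.

```python
import math

def fcm_memberships(doc, centroids, m=2.0):
    """Fuzzy c-means membership of one document vector in each cluster.

    Assumption: centroids are fixed (already learned) and distance is Euclidean.
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    """
    dists = [math.dist(doc, c) for c in centroids]
    # A document that coincides with a centroid gets full membership there.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]

def alpha_categories(memberships, alpha):
    """Indices of clusters whose membership degree reaches the alpha threshold (an alpha-cut)."""
    return [i for i, u in enumerate(memberships) if u >= alpha]

def is_multi_category(memberships, alpha):
    """Hypothetical rule: a document is multi-category if more than one cluster passes the cut."""
    return len(alpha_categories(memberships, alpha)) > 1

if __name__ == "__main__":
    centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 10.0)]  # toy cluster centres
    for doc in [(2.0, 0.0), (0.5, 0.0)]:
        u = fcm_memberships(doc, centroids)
        print(doc, [round(x, 3) for x in u], alpha_categories(u, 0.3))
```

With α = 0.3, the document at (2, 0) lies midway between the first two centroids, so it passes the cut for both clusters and is flagged as multi-category. The document at (0.5, 0) passes only for the first cluster. In practice the choice of α controls the trade-off: a lower α flags more documents as multi-category.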
Keywords: Text mining, Document similarity, Similarity search, Fuzzy clustering, Multiple categories
Publication history: Available online 16 April 2007.
Paper URL: https://doi.org/10.1016/j.eswa.2007.04.003