Assessing user-specific difficulty of documents

Abstract

The web holds a huge variety of text collections covering different domains of expertise, such as technology or medicine. These texts are written for different purposes and thus for readers with different levels of expertise in the domain. Texts intended for professionals may be incomprehensible to a lay person, while texts for lay people may lack the detailed information a professional needs. Many information retrieval applications, such as search engines, would offer a better user experience if they could select the text sources that best fit the user’s expertise level. In this article, we propose a novel approach for assessing the difficulty level of a document: our method assesses difficulty for each user separately. The method makes it possible, for instance, to offer information in a personalised manner based on the user’s knowledge of different domains. It is based on comparing the terms appearing in a document with the terms known by the user. We present two ways to collect information about the terminology the user knows: directly, by asking users to rate the difficulty of terms, or, as a novel automatic approach, indirectly, by analysing texts written by the users. We examine the applicability of the methodology with text documents in the medical domain. The results show that the method is able to distinguish between documents written for lay people and documents written for experts.
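
The core idea of comparing document terms with terms known by the user can be illustrated with a minimal sketch. This is not the scoring function from the article, which the abstract does not specify; it assumes, purely for illustration, that difficulty is the share of distinct document terms missing from the user's known vocabulary. The function name `document_difficulty` and the sample term lists are hypothetical.

```python
def document_difficulty(document_terms, known_terms):
    """Return the share of distinct document terms the user does not know.

    Illustrative assumption only: the article's actual measure may weight
    terms differently or use graded rather than binary term knowledge.
    """
    # Normalise case so "Heart" and "heart" count as the same term.
    doc = {term.lower() for term in document_terms}
    if not doc:
        return 0.0  # An empty document has no unknown terms.
    known = {term.lower() for term in known_terms}
    return len(doc - known) / len(doc)


# Hypothetical vocabulary of a lay user, e.g. collected by asking the user
# or extracted from texts the user has written.
lay_user_terms = {"heart", "blood pressure", "fever"}

expert_text_terms = ["myocardial infarction", "blood pressure", "troponin", "heart"]
lay_text_terms = ["heart", "fever", "blood pressure"]

expert_score = document_difficulty(expert_text_terms, lay_user_terms)
lay_score = document_difficulty(lay_text_terms, lay_user_terms)
```

Under this assumption, a document aimed at experts receives a higher score for the lay user than a document aimed at lay people, because more of its terms fall outside the user's vocabulary. The same document would score differently for a user with a different vocabulary, which is the sense in which the assessment is user-specific.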

Keywords: Difficulty measure, User modelling, Readability, Keyphrase extraction

Article history: Received 7 October 2011, Revised 16 February 2012, Accepted 15 April 2012, Available online 14 May 2012.

Paper URL: https://doi.org/10.1016/j.ipm.2012.04.001