Methodology for refining subject terms and supporting subject indexing with taxonomy: A case study of the APO digital repository

Authors:

Highlights:

• Propose a methodology for refining existing subject terms by estimating their frequencies and semantics, and for inducing a taxonomy from the refined subject terms by integrating their mutual usages.

• Provide a thorough analysis of our proposed methodology using the APO (Analysis & Policy Observatory) digital repository to show the applicability of the methodology.

• Measure the generalisability of the proposed taxonomy inducing method, in comparison with the state-of-the-art taxonomy inducing method, TaxoFinder.

Abstract

In digital repositories, it is crucial to refine existing subject terms and to exploit a taxonomy of those terms in order to support information retrieval tasks such as indexing, cataloging and searching of digital documents. In this paper, we address how to refine an existing set of subject terms used to index digital documents, a set that often contains irrelevant or noisy terms. Further, we present how to automatically induce a subject term taxonomy to capture and utilise the semantic relations among subject terms. Most related work has paid little attention to these problems, focusing mostly on creating subject terms or building a taxonomy of key terms from text documents. We propose a methodology for refining an existing set of subject terms in a digital repository by identifying their semantics, and for inducing a taxonomy of subject terms by analysing their mutual usages and maximising their semantic relatedness. We then present a case study using the APO (Analysis & Policy Observatory) digital repository to analyse the proposed methodology and demonstrate its applicability. Further, to validate the generalisability of the proposed taxonomy inducing method, we evaluate it against a gold-standard taxonomy in the life sciences, Medical Subject Headings (MeSH), in comparison with the state-of-the-art taxonomy inducing method, TaxoFinder. Our evaluation shows that our methodology has high potential for refining an existing set of subject terms and for capturing their semantic relationships by inducing a subject term taxonomy.
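To make the two steps named in the abstract more concrete, the following is a minimal sketch and not the paper's actual method. It assumes that refinement can be approximated by a document-frequency cut-off and that "mutual usage" can be approximated by a co-occurrence subsumption heuristic (a term becomes a parent when it appears on nearly all documents indexed by the child and is used more widely). The function names, the `min_df` and `threshold` parameters, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def refine_terms(doc_terms, min_df=2):
    """Keep subject terms that index at least `min_df` documents (assumed filter)."""
    df = Counter(t for terms in doc_terms for t in set(terms))
    return {t for t, n in df.items() if n >= min_df}


def induce_taxonomy(doc_terms, kept, threshold=0.8):
    """Return {child: parent} from co-occurrence subsumption (illustrative only).

    A term p qualifies as a parent of c when P(p | c) >= threshold and p
    indexes strictly more documents than c. Among qualifying parents, the
    least frequent (most specific) one is chosen.
    """
    df = Counter()  # number of documents indexed by each kept term
    co = Counter()  # number of documents indexed by both terms of a pair
    for terms in doc_terms:
        present = sorted(set(terms) & kept)
        df.update(present)
        for a, b in combinations(present, 2):
            co[(a, b)] += 1
            co[(b, a)] += 1

    parents = {}
    for child in sorted(kept):
        if df[child] == 0:
            continue  # term never used in these documents
        best = None
        for parent in sorted(kept):
            if parent == child or df[parent] <= df[child]:
                continue
            if co[(child, parent)] / df[child] >= threshold:
                if best is None or df[parent] < df[best]:
                    best = parent
        if best is not None:
            parents[child] = best
    return parents


# Hypothetical subject-term assignments, one list per document.
docs = [
    ["health", "mental health"],
    ["health", "mental health"],
    ["health", "hospitals"],
    ["health", "hospitals"],
    ["health"],
    ["misc"],
]
kept = refine_terms(docs, min_df=2)
taxonomy = induce_taxonomy(docs, kept)
```

In this toy data, the rarely used term "misc" is dropped by the frequency filter, and both "mental health" and "hospitals" are placed under "health". A semantic component, such as the relatedness maximisation the abstract mentions, would be needed on top of this to reflect the proposed methodology more closely.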

Keywords: Controlled terms, Subject taxonomies, Subject headings, Thesaurus construction, APO, Digital library

Review history: Received 13 October 2020, Revised 17 February 2021, Accepted 5 March 2021, Available online 13 March 2021, Version of Record 15 May 2021.

Paper URL: https://doi.org/10.1016/j.dss.2021.113542