Textual Data Mining to Support Science and Technology Management
Authors: Paul Losiewicz, Douglas W. Oard, Ronald N. Kostoff
Abstract
This paper surveys applications of data mining techniques to large text collections, and illustrates how those techniques can be used to support the management of science and technology research. Specific issues that arise repeatedly in the conduct of research management are described, and a textual data mining architecture that extends a classic paradigm for knowledge discovery in databases is introduced. That architecture integrates information retrieval from text collections, information extraction to obtain data from individual texts, data warehousing for the extracted data, data mining to discover useful patterns in the data, and visualization of the resulting patterns. At the core of this architecture is a broad view of data mining—the process of discovering patterns in large collections of data—and that step is described in some detail. The final section of the paper illustrates how these ideas can be applied in practice, drawing upon examples from the recently completed first phase of the textual data mining program at the Office of Naval Research. The paper concludes by identifying some research directions that offer significant potential for improving the utility of textual data mining for research management applications.
Keywords: text data mining, information retrieval, knowledge discovery in databases, bibliometrics, computational linguistics
Paper URL: https://doi.org/10.1023/A:1008777222412