Text windows and phrases differing by discipline, location in document, and syntactic structure

Authors:

Highlights:

Abstract

Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline, or to place a document on a theory versus practice spectrum. This distinction is also studied here using the type-token ratio to differentiate between sublanguages. The statistical significance of windows is computed based on the presence of terms in titles, abstracts, citations, and section headers, as well as on binary-independent and inverse-document-frequency weightings. The characteristics of windows are studied by examining their within-window density and the S concentration, that is, the concentration of terms from various document fields (e.g. title, abstract) in the full text. The rate at which windows occur from the beginning to the end of a document's full text differs between academic fields. Different syntactic structures in sublanguages are examined, and their use is considered for discriminating between specific academic disciplines and, more generally, between theory-oriented and practice-oriented (or knowledge-oriented and applications-oriented) documents.
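As a rough illustration of two measures named in the abstract, the sketch below computes a type-token ratio and a textbook inverse-document-frequency weight. It is a minimal sketch under stated assumptions, not the paper's procedure: the tokenizer, the function names, and the log(N / df) form of the weighting are all illustrative choices, since the abstract does not specify them.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def inverse_document_frequency(term, documents):
    """Textbook IDF, log(N / df), over a list of token sets.

    Returns 0.0 when the term occurs in no document (an assumed convention).
    """
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)


# Hypothetical toy data, for illustration only.
sample_tokens = tokenize("the cat saw the dog")
sample_docs = [{"cat", "dog"}, {"dog"}, {"bird"}, {"dog", "bird"}]
ttr = type_token_ratio(sample_tokens)
idf_cat = inverse_document_frequency("cat", sample_docs)
```

A lower type-token ratio indicates more repetition of vocabulary, which is the kind of signal the abstract proposes for telling sublanguages apart. A higher IDF marks a term that is concentrated in few documents.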

Keywords:

Review history: Received 14 November 1995, Accepted 11 March 1996, Available online 26 February 1999.

Official paper URL: https://doi.org/10.1016/S0306-4573(96)00017-9