Interpretable semantic textual similarity of sentences using alignment of chunks with classification and regression

作者:Goutam Majumder, Partha Pakray, Ranjita Das, David Pinto

摘要

The proposed work is focused on establishing an interpretable Semantic Textual Similarity (iSTS) method for a pair of sentences, which can clarify why two sentences are completely or partially similar or have some variations. This proposed interpretable approach is a pipeline of five modules that begins with the pre-processing and chunking of text. Further chunks of two sentences are aligned using a one–to–multi (1:M) chunk aligner. Thereafter, support vector, Gaussian Naive Bayes and k–Nearest Neighbours classifiers are then used to create a multiclass classification algorithm, and different class labels are used to define an alignment type. At last, a multivariate regression algorithm is developed to find the semantic equivalence of an alignment with a score (that ranges from 0 to 5). The efficiency of the proposed method is verified on three different datasets and also compared to other state–of–the–art interpretable STS (iSTS) methods. The evaluated results show that the proposed method performs better than other iSTS methods. Most importantly, the modules of the proposed iSTS method are used to develop a Textual Entailment (TE) method. It is found that, when we combined chunk level, alignment, and sentence level features the entailment results significantly improves.

论文关键词:Semantic textual similarity, Natural language understanding, Text classification, Multivariate regression

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-020-02144-x