Automatic classification of complaint letters according to service provider categories

Authors:

Highlights:

Abstract

Complaint letters are increasingly published on the Internet, which makes it important to classify them automatically according to criteria such as company category. In this research, we investigated the automatic text classification of complaint letters written in Hebrew that were sent to companies from a wide variety of categories, such as insurance, cellular communication, and rental cars. We conducted an extensive set of experiments classifying complaint letters into seven, six, five, and four company categories. The experiments used various sets of word unigrams, four machine learning methods, two feature filtering methods, and parameter tuning. The classification results are relatively high for all six measures: accuracy, precision, recall, F1, PRC-area, and ROC-area. The best accuracy results for seven, six, five, and four categories are 84.5%, 88.4%, 91.4%, and 93.8%, respectively. An analysis of the most frequently occurring words in the complaints showed that, for almost all categories, the most significant issues were poor service and delayed delivery. Only in the hospital domain was the subject of the domain itself (i.e., the patient, the medical treatment, the place of the treatment, and the medical staff) the most important issue. Another interesting finding is that “price” was of little or no importance to the complainants. These findings suggest that, in their preoccupation with profitability, many service providers are blind to how paramount good service and timely delivery (and, in the case of hospitals, the domain itself) are to their clientele.
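The bag-of-words setup described in the abstract can be illustrated with a minimal sketch. The abstract does not name the four machine learning methods, so the multinomial Naive Bayes classifier used here is an assumption chosen for brevity, not the authors' method. The English toy letters, the function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`), and the three-category setup are all hypothetical; the study itself used Hebrew letters, and feature filtering and parameter tuning are omitted here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word unigrams (plain whitespace tokenization)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-category document counts and unigram counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words unseen in training carry no evidence
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted category matches the true one."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Hypothetical toy data, invented purely for illustration.
train_docs = [
    "my claim was rejected and the policy agent never called back",
    "the policy premium changed and my claim is still open",
    "no network signal and the phone bill was wrong",
    "the phone plan has poor signal and slow data",
    "the rental car was dirty and the pickup was late",
    "the car broke down and the rental desk was closed",
]
train_labels = [
    "insurance", "insurance",
    "cellular communication", "cellular communication",
    "rental cars", "rental cars",
]
test_docs = [
    "my policy claim was delayed",
    "the phone has no signal",
    "the rental car arrived late",
]
test_labels = ["insurance", "cellular communication", "rental cars"]

model = train_naive_bayes(train_docs, train_labels)
```

Calling `accuracy(model, test_docs, test_labels)` gives the share of correctly classified toy letters. Accuracy is only one of the six measures the abstract reports; precision, recall, F1, PRC-area, and ROC-area would need per-category counts and ranked scores that this sketch does not compute.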

Keywords: Bag of words, Complaint letters, Semantic fields, Service providers, Supervised machine learning, Text classification

Review history: Received 14 February 2018, Revised 13 August 2019, Accepted 13 August 2019, Available online 29 August 2019, Version of Record 29 August 2019.

Official URL: https://doi.org/10.1016/j.ipm.2019.102102