A Tabu search heuristic for smoke term curation in safety defect discovery

Authors:

Highlights:

• A heuristic method for curating danger words predicting safety defects is proposed.

• The heuristic method is contrasted with prior manual curation methods.

• The methodology is applied to online review datasets in multiple industries.

• Star ratings are included in the methodology to boost performance.

• The danger word lists significantly improve safety monitoring performance.

Abstract

The ability to detect and rapidly respond to the presence of safety defects is vital to firms and to regulatory agencies. In this paper, we employ a text mining methodology to generate industry-specific “smoke terms” for identifying these defects in the countertop appliances and over-the-counter medicine industries. Building upon prior work, we propose several methodological improvements to enhance the precision of our industry-specific terms. First, we replace the subjective manual curation of these terms with an automated Tabu search algorithm, which provides a statistically significant improvement over a sample of human-curated lists. Contrary to the assumptions of prior work, we find that shorter, targeted smoke term lists produce superior precision. Second, we incorporate non-textual review features to enhance the performance of these smoke term lists. In total, we find greater than a twofold improvement over typical human-curated lists. As safety surveillance is vital across industries, our method has great potential to assist firms and regulatory agencies in identifying and responding quickly to safety defects.
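As a rough illustration of how a Tabu search might curate a smoke term list, the following minimal sketch selects a subset of candidate terms on a toy dataset. It is not the authors' implementation: the reviews, the candidate terms, the objective (defect reviews flagged minus non-defect reviews flagged), the tabu tenure, and the iteration count are all hypothetical choices made only for this example.

```python
from collections import deque

# Hypothetical toy data: (set of tokens in a review, whether it reports a safety defect).
REVIEWS = [
    ({"caught", "fire"}, True),
    ({"smoke", "smell"}, True),
    ({"shock", "plug"}, True),
    ({"hot", "coffee", "great"}, False),
    ({"broke", "after", "week"}, False),
    ({"hot", "fire", "hazard"}, True),
    ({"broke", "hot", "fine"}, False),
]

# Hypothetical pool of candidate smoke terms.
CANDIDATES = ["fire", "smoke", "shock", "hot", "broke"]


def objective(terms, reviews):
    """Assumed objective: defect reviews flagged minus non-defect reviews flagged.

    A review is flagged when it contains at least one selected term.
    """
    score = 0
    for tokens, is_defect in reviews:
        if terms & tokens:
            score += 1 if is_defect else -1
    return score


def tabu_search(candidates, reviews, iterations=20, tenure=2):
    """Search over term subsets by toggling one term per iteration."""
    current = frozenset()
    best, best_score = current, objective(current, reviews)
    tabu = deque(maxlen=tenure)  # terms toggled in the last `tenure` iterations

    for _ in range(iterations):
        moves = []
        for term in sorted(candidates):
            neighbor = current ^ {term}  # add the term if absent, drop it if present
            score = objective(neighbor, reviews)
            # Aspiration rule: a tabu move is allowed only if it beats the best so far.
            if term not in tabu or score > best_score:
                moves.append((score, term, neighbor))
        if not moves:
            break
        # Take the best admissible move even when it worsens the current solution;
        # this is what lets the search leave a local optimum.
        score, term, current = max(moves, key=lambda m: (m[0], m[1]))
        tabu.append(term)
        if score > best_score:
            best, best_score = current, score

    return best, best_score


best_terms, best_score = tabu_search(CANDIDATES, REVIEWS)
print(sorted(best_terms), best_score)
```

In this toy setting the ambiguous terms ("hot", "broke") flag more harmless reviews than defect reports, so a short, targeted list scores best under the assumed objective. That mirrors the abstract's observation about shorter lists only loosely and says nothing about the paper's actual scoring function.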

Keywords: Text mining, Online reviews, Tabu search, Heuristics, Defects, Business intelligence

Review history: Received 29 April 2017, Revised 20 October 2017, Accepted 20 October 2017, Available online 24 October 2017, Version of Record 12 December 2017.

Official URL: https://doi.org/10.1016/j.dss.2017.10.012