Website categorization: A formal approach and robustness analysis in the case of e-commerce detection
Authors:
Highlights:
• Robust formal approach to website categorization based on web mining and classification.
• Entirely automated procedure using a computationally viable pipeline.
• Application to an important case: the detection of e-commerce in corporate websites.
• Uses machine learning and dictionaries, hence applicable in other contexts or languages.
• Analysis of robustness with respect to misclassified training records.
Keywords: Classification, Machine learning, E-commerce, Feature engineering, Text mining, Surveys
Review history: Received 17 April 2019, Revised 9 September 2019, Accepted 1 October 2019, Available online 4 October 2019, Version of Record 18 October 2019.
Official URL: https://doi.org/10.1016/j.eswa.2019.113001