On learning to predict Web traffic

Authors:

Abstract

The ease of collecting customer data through the Internet has made it easier to build large data repositories. These data can and do contain patterns that are useful to the decision maker, and knowledge discovery and data mining methods have been widely used to extract them. It is acknowledged that about 80% of the resources in a majority of data mining applications are spent on cleaning and preprocessing the data. However, there have been relatively few studies on preprocessing the data used as input to these data mining systems. In this study, we present a feature selection method based on the Hausdorff distance measure and evaluate its effectiveness in preprocessing input data for inducing decision trees. Message traffic data from a Web site are used to illustrate the performance of the proposed method.
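
The abstract names the Hausdorff distance but does not say how the feature selection method uses it. As background only, the following minimal sketch computes the standard Hausdorff distance between two finite point sets under the Euclidean metric. The function names and sample points are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sample point sets (not data from the paper).
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (4.0, 0.0)]

print(directed_hausdorff(set_a, set_b))  # 1.0
print(directed_hausdorff(set_b, set_a))  # 3.0
print(hausdorff(set_a, set_b))           # 3.0
```

The directed distance is asymmetric, which is why the symmetric form takes the maximum of both directions. How such a measure might score or rank candidate features is a detail of the paper that is not stated here.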

Keywords: Data mining, Feature selection, Web traffic

Review history: Available online 31 May 2002.

Official paper URL: https://doi.org/10.1016/S0167-9236(02)00107-0