Fast blocking of undesirable web pages on client PC by discriminating URL using neural networks

Abstract

The World Wide Web (WWW) has become the largest information archive in the world because of its connectivity and scalability. Web pages, identified by URLs, are the basic form in which requested information is transmitted to client PCs, whose number continues to grow rapidly. A large portion of web pages contain undesirable content, such as pornography, crime, drugs, and terrorism, which makes viewer discretion necessary. The sheer number of undesirable web pages, however, makes blocking them on a client PC difficult, because checking a requested URL against a large collection of URLs is time-consuming. We propose a neural network method for determining whether a requested URL is present in a large collection of prohibited URLs. The collection, containing 400,000 URLs, was obtained by submitting a number of keywords, e.g. “porn” or “sex”, to several commercial search engines. The simulation results show superior performance in both memory requirement and speed compared with a database implementation on the same PC.

Keywords: World wide web, URL, Web page content, Neural network

Publication history: Available online 1 February 2007.

Official paper URL: https://doi.org/10.1016/j.eswa.2007.01.023