An enhanced Web page change detection approach based on limiting similarity computations to elements of same type
Authors: Hassan Artail, Michel Abi-Aad
Abstract
This paper describes an efficient approach to Web page change detection that restricts the similarity computations between two versions of a given Web page to nodes with the same HTML tag type. Before the similarities are computed, the HTML page is transformed into an XML-like structure in which each node corresponds to a matched pair of opening and closing HTML tags. Analytical expressions and supporting experimental results quantify the improvement of the proposed approach over the traditional one, which computes similarities across all nodes of both pages. The improvement depends strongly on the diversity of tags in the page: the more diverse the page (i.e., the more it mixes text, images, links, etc.), the greater the improvement, and the more uniform the page, the smaller the improvement.
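The idea of comparing only nodes of the same tag type can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `NodeCollector`, `extract_nodes`, `comparison_counts`, and `best_matches` are hypothetical, `difflib.SequenceMatcher` is a stand-in for the similarity measure (the abstract does not specify one), and the pair counts are a simple illustration rather than the paper's analytical expressions.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser


class NodeCollector(HTMLParser):
    """Collect one (tag, text) node per matched open/close tag pair."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements: [tag, text_parts]
        self.nodes = []  # completed (tag, text) nodes

    def handle_starttag(self, tag, attrs):
        self.stack.append([tag, []])

    def handle_data(self, data):
        # Attribute text to the innermost open element only.
        if self.stack and data.strip():
            self.stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; stray end tags are ignored.
        # Unclosed elements above it (e.g. <br>) are dropped (assumption).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                name, parts = self.stack[i]
                self.nodes.append((name, " ".join(parts)))
                del self.stack[i:]
                return


def extract_nodes(html):
    """Flatten an HTML string into a list of (tag, text) nodes."""
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def comparison_counts(old_nodes, new_nodes):
    """Return (all_pairs, same_tag_pairs) similarity-computation counts."""
    all_pairs = len(old_nodes) * len(new_nodes)
    old_tags = Counter(tag for tag, _ in old_nodes)
    new_tags = Counter(tag for tag, _ in new_nodes)
    same_tag_pairs = sum(n * new_tags[tag] for tag, n in old_tags.items())
    return all_pairs, same_tag_pairs


def best_matches(old_nodes, new_nodes):
    """For each old node, find its best similarity among same-tag new nodes."""
    by_tag = defaultdict(list)
    for tag, text in new_nodes:
        by_tag[tag].append(text)
    result = []
    for tag, text in old_nodes:
        scores = (SequenceMatcher(None, text, c).ratio() for c in by_tag.get(tag, []))
        result.append((tag, text, max(scores, default=0.0)))
    return result


old_page = "<html><body><h1>News</h1><p>Hello world</p><p>Second</p><a>link</a></body></html>"
new_page = "<html><body><h1>News</h1><p>Hello there</p><a>link</a></body></html>"
old_nodes, new_nodes = extract_nodes(old_page), extract_nodes(new_page)
all_pairs, same_tag_pairs = comparison_counts(old_nodes, new_nodes)
print(all_pairs, same_tag_pairs)
```

In this toy example the pages have 6 and 5 nodes, so an all-pairs comparison needs 30 similarity computations, while the same-tag restriction needs 6. A page consisting almost entirely of one tag type would show little reduction, which matches the abstract's observation that the gain grows with tag diversity.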
Keywords: Web page, Change detection, Change monitoring, Similarity computation, Performance improvements
Official paper URL: https://doi.org/10.1007/s10844-007-0046-z