RetriBlog: An architecture-centered framework for developing blog crawlers
Authors:
Abstract
Blogs have become an important social tool. They allow users to share their tastes, express their opinions, report news, and form groups around a subject, among other things. Information obtained from the blogosphere can be used to build applications in various fields. However, due to the growing number of blogs posted every day, as well as the dynamic nature of the blogosphere, extracting relevant information from blogs has become difficult and time consuming. In this paper, we use information retrieval and extraction techniques to deal with this problem. Furthermore, because blogs have many variation points, applications that process them must be easy to adapt. To address this scenario, this work proposes RetriBlog, an architecture-centered framework for the development of blog crawlers. Finally, it presents an evaluation of the proposed algorithms and three case studies.
Keywords: Social web, Blog crawler, Content extraction, Tag recommendation
Publication history: Available online 28 September 2012.
Official URL: https://doi.org/10.1016/j.eswa.2012.08.020