Unified domain-specific language for collecting and processing data of social media

作者:Nikolay Butakov, Maxim Petrov, Ksenia Mukhina, Denis Nasonov, Sergey Kovalchuk

摘要

Data provided by social media becomes an increasingly important analysis material for social scientists, market analysts, and other stakeholders. Diversity of interests leads to the emergence of a variety of crawling techniques and programming solutions. Nevertheless, these solutions have a lack of flexibility to satisfy requirements of different users and individual crawling scenarios, that can range from a simple query to a complex workflow containing multiple steps and requiring data from different networks to be collected. To address this problem, our paper proposes an approach based on a developed domain specific language (DSL) and architecture of distributed crawling system. The DSL has a declarative style that requires the user to define the description of needed data and based on an ontological model of social networks and the essential crawling techniques. Thus, the crawling system can be applied to collect the data from different online social networks within complex workflows along with the exploitation of various crawling methods implemented in a distributed computing environment.

论文关键词:Social networks, Social media, Crawling, Domain-specific language, Ontology

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-018-0508-5