An automated system for grammatical analysis of Twitter messages. A learning task application

Authors:

Highlights:

Abstract

This paper describes an educational study on the use of Twitter to enhance high school students’ interaction while improving the linguistic quality of their messages. For this purpose, an interactive system has been developed to collect tweets and analyze them from a grammatical perspective. The automated system comprises a comprehensive data normalization phase, which identifies unknown tokens, and a grammatical analysis component, so that both text normalization issues and grammatical inconsistencies can be detected. The grammatical analysis relies on logical reasoning over a bi-gram token representation, matched against Wikipedia, together with simple rule-based reasoning for named-entity detection. The system supports spatial, topic-based, and identity-based search. In addition, as soon as a linguistic inconsistency is detected, the system notifies the moderator(s), together with statistics on the user’s activity, so that an appropriate course of action can be taken. A statistical analysis of the tweets gathered from the students who took part in this study has been carried out, and the contribution of peers to the improvement of the linguistic quality of users’ messages has been quantified and investigated. The study demonstrates the participants’ interest in this new learning experience and evaluates the influence of peers on their writing skills. In particular, the visibility of Twitter messages to a large audience was found to contribute substantially to raising students’ awareness of the linguistic quality of their messages. The study also revealed the predominance of slang and abbreviations in the students’ daily Twitter writing; such abbreviations proved to pose the greatest challenge for automatic text analysis. Similarly, named-entity identification and handling proved very challenging, especially because capitalization in Twitter messages is often used for emphasis as well.
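
To illustrate the kind of pipeline the abstract outlines (unknown-token detection, bi-gram matching against a reference corpus, and a simple capitalization rule for named entities), here is a minimal Python sketch. It is not the authors' implementation. The tiny `REFERENCE_BIGRAMS` set stands in for bi-grams that would be harvested from Wikipedia, and all names (`tokenize`, `is_candidate_entity`, `analyse`) and rules are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for bi-grams harvested from a reference corpus
# such as Wikipedia; a real system would use a far larger collection.
REFERENCE_BIGRAMS = {
    ("i", "am"), ("am", "going"), ("going", "to"), ("to", "school"),
    ("see", "you"), ("you", "tomorrow"),
}

# Hypothetical vocabulary: every word that appears in a reference bi-gram.
KNOWN_TOKENS = {word for pair in REFERENCE_BIGRAMS for word in pair}

# Words, optionally prefixed by @ or # (mentions and hashtags).
TOKEN_RE = re.compile(r"[@#]?\w+(?:'\w+)?")


def tokenize(tweet):
    """Split a tweet into word-like tokens, keeping @mentions and #hashtags."""
    return TOKEN_RE.findall(tweet)


def is_candidate_entity(token, position):
    """Assumed rule: mentions, hashtags, and capitalized tokens that are not
    tweet-initial count as entity candidates. All-caps tokens are treated as
    emphasis rather than names."""
    if token[0] in "@#":
        return True
    return position > 0 and token[0].isupper() and not token.isupper()


def analyse(tweet):
    """Return entity candidates, unknown tokens, and unmatched bi-grams."""
    tokens = tokenize(tweet)
    flags = [is_candidate_entity(t, i) for i, t in enumerate(tokens)]
    lowered = [t.lower() for t in tokens]

    unknown = [w for w, is_entity in zip(lowered, flags)
               if not is_entity and w not in KNOWN_TOKENS]

    unmatched = []
    for i in range(len(lowered) - 1):
        # Skip pairs that involve an entity candidate.
        if flags[i] or flags[i + 1]:
            continue
        pair = (lowered[i], lowered[i + 1])
        if pair not in REFERENCE_BIGRAMS:
            unmatched.append(pair)

    return {
        "entities": [t for t, is_entity in zip(tokens, flags) if is_entity],
        "unknown": unknown,
        "unmatched_bigrams": unmatched,
    }
```

In this sketch, `analyse("see u tomorrow Ahmed")` flags `u` as an unknown token, reports the two bi-grams containing it as unmatched, and treats `Ahmed` as an entity candidate. A message such as `"I am GOING to school"` passes cleanly because the all-caps word is read as emphasis, which mirrors the difficulty the abstract notes about capitalization.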

Keywords: Data mining, Twitter, Social network, Learning

Article history: Received 13 April 2015, Revised 10 November 2015, Accepted 20 February 2016, Available online 11 March 2016, Version of Record 16 April 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.02.015