Composition pattern oriented tag extraction from short documents using a structural learning method

Authors: Yongwook Shin, Sung-Jun Lee, Jonghun Park

Abstract

With the rapid growth of the web, automatic tagging, which detects informative terms in a document, has become an important problem for information aggregation and sharing services. In particular, automatic tagging of short documents is of growing interest because users increasingly publish information through social media services that encourage them to write short documents. In this paper, we propose a novel automatic tagging model for short text documents from social media services, built within a supervised learning framework. We redefine traditional frequency-based term features so that they account for the properties of documents written under a length limit, and we use a structural support vector machine to capture sequential dependencies between successive terms in a document. In addition, the proposed approach incorporates the composition patterns by which users place informative terms in their documents. Extensive experiments were conducted to validate the approach. They showed that the proposed term features are effective for extracting tags, and that the tag extractor trained with sequential dependencies and composition patterns outperformed existing alternative methods.
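The abstract frames tag extraction as labeling each term of a short document while accounting for dependencies between successive terms. As a minimal sketch of what inference looks like in a linear-chain structured model of this kind (the usual setting for a structural SVM over sequences), the snippet below decodes tag/non-tag labels with the Viterbi algorithm. This is not the authors' implementation: the function names (`emission_score`, `viterbi`), the feature names (`bias`, `tf_norm`, `capitalized`), the weights, and the transition scores are all hypothetical and hand-set. Training (which a structural SVM would perform by max-margin learning) and the paper's composition patterns are not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical binary label set: 1 = the term is a tag, 0 = it is not.
LABELS = (0, 1)


def emission_score(features: Dict[str, float], weights: Dict[str, float], label: int) -> float:
    """Linear score for assigning `label` to one term (feature names are illustrative)."""
    if label == 0:
        return 0.0  # "not a tag" is the reference label with a fixed score of 0
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def viterbi(doc: Sequence[Dict[str, float]],
            weights: Dict[str, float],
            transition: Dict[Tuple[int, int], float]) -> List[int]:
    """Return the highest-scoring label sequence under emission + transition scores."""
    if not doc:
        return []
    # best[i][y]: best total score of any labeling of terms 0..i that ends in label y
    best = [{y: emission_score(doc[0], weights, y) for y in LABELS}]
    back: List[Dict[int, int]] = [{}]
    for i in range(1, len(doc)):
        scores: Dict[int, float] = {}
        ptrs: Dict[int, int] = {}
        for y in LABELS:
            # Choose the previous label maximizing running score plus transition score.
            prev = max(LABELS, key=lambda p: best[i - 1][p] + transition.get((p, y), 0.0))
            scores[y] = (best[i - 1][prev] + transition.get((prev, y), 0.0)
                         + emission_score(doc[i], weights, y))
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final label to recover the full sequence.
    y = max(LABELS, key=lambda label: best[-1][label])
    path = [y]
    for i in range(len(doc) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    path.reverse()
    return path


# Toy example with hand-set (not learned) weights.
terms = ["New", "York", "marathon", "was", "fun"]
features = [
    {"bias": 1.0, "tf_norm": 0.4, "capitalized": 1.0},
    {"bias": 1.0, "tf_norm": 0.4, "capitalized": 1.0},
    {"bias": 1.0, "tf_norm": 0.6, "capitalized": 0.0},
    {"bias": 1.0, "tf_norm": 0.1, "capitalized": 0.0},
    {"bias": 1.0, "tf_norm": 0.2, "capitalized": 0.0},
]
weights = {"bias": -1.0, "tf_norm": 2.0, "capitalized": 1.0}
# Reward tag -> tag so multi-word tags such as "New York" tend to stay together.
transition = {(1, 1): 0.5, (0, 1): -0.2}

labels = viterbi(features, weights, transition)
print([t for t, y in zip(terms, labels) if y == 1])  # ['New', 'York', 'marathon']
```

The transition table is what lets a sequence model prefer labeling adjacent terms jointly, which per-term classifiers cannot do; in a real system both the feature weights and the transition scores would be learned rather than set by hand.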

Keywords: Automatic tagging, Keyword extraction, Composition pattern, Structural learning, Information retrieval, Classification, Social media


Paper URL: https://doi.org/10.1007/s10115-012-0594-6