A hybrid framework to extract bilingual multiword expression from free text

Authors:

Highlights:

Abstract

Extracting bilingual multiword expressions is a long-standing problem in deriving meaning from free text, and it requires analyzing large amounts of textual data. In this paper we propose a text mining approach to extracting bilingual multiword expressions that combines statistical and rule-based methods. Extraction proceeds in two phases. In the first phase, a large number of candidates are extracted from the corpus by statistical methods. The multiple sequence alignment algorithm is sensitive to flexible multiword expressions. In the second phase, error-driven rules and patterns are learned from the corpus. To acquire high-quality training instances, manual annotation with active learning is used for sample selection. The trained rules are then used to filter the candidates. Bilingual comparisons are performed on a parallel corpus, and some of the bilingual syntactic patterns are obtained from a bilingual phrase dictionary. Because the system has many parameters, a set of experiments was designed to find the best-performing configuration. Experimental results show that our approach achieves good performance.
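The two-phase idea in the abstract (statistical candidate extraction followed by rule-based filtering) can be illustrated with a minimal sketch. This is not the paper's method: it assumes pointwise mutual information over adjacent word pairs as a stand-in for the statistical phase (the paper uses multiple sequence alignment), and a hand-written stop-word rule as a stand-in for the learned error-driven rules. It is also monolingual and omits the bilingual comparison step. The names `extract_candidates`, `filter_candidates`, `STOP_WORDS`, and the thresholds are hypothetical.

```python
import math
from collections import Counter


def extract_candidates(sentences, min_count=2, min_pmi=1.0):
    """Phase 1 (statistical): score adjacent word pairs by PMI.

    Assumption: PMI replaces the paper's alignment-based statistics.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    candidates = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            candidates[(w1, w2)] = pmi
    return candidates


# Hypothetical rule set standing in for the learned error-driven rules.
STOP_WORDS = {"the", "a", "of", "in", "to", "is"}


def filter_candidates(candidates, stop_words=STOP_WORDS):
    """Phase 2 (rule-based): drop candidates that start or end with a stop word."""
    return {
        pair: score
        for pair, score in candidates.items()
        if pair[0] not in stop_words and pair[-1] not in stop_words
    }


if __name__ == "__main__":
    corpus = [
        ["he", "works", "in", "new", "york"],
        ["new", "york", "is", "big"],
        ["she", "is", "in", "the", "city"],
        ["he", "is", "in", "the", "office"],
    ]
    print(filter_candidates(extract_candidates(corpus)))
```

On this toy corpus the statistical phase proposes frequent function-word pairs such as ("in", "the") alongside ("new", "york"), and the rule phase removes the former, which is the division of labour the abstract describes.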

Keywords: Multiword expression, Text mining, Sequence alignment, Error-driven rule

Review history: Available online 19 July 2010.

Official paper URL: https://doi.org/10.1016/j.eswa.2010.06.067