Neural networks for language identification: a comparative study

Authors:

Highlights:

Abstract

Since the advent of Jordan's recurrent network [Jordan, M. I. (1986) Serial Order: A Parallel Distributed Processing Approach. Tech. Rep. No. 8604. Institute for Cognitive Science, University of California, San Diego.], which allows the processing of data with a temporal component, neural networks have been used routinely for sequence processing. This paper analyses this type of network for its ability to discriminate between languages from a small sample of text. The model was developed for potential use in the on-line version of the Trinity College 1872 Printed Catalogue, a library catalogue with entries in 14 different languages spanning more than five centuries. Neural networks were expected to perform well where the entries to be analysed comprised only a few words. The network's performance was compared with that of trigrams and of a suffix/morphology analysis. The trigrams proved superior, classifying over 92% of the entries correctly, compared with 88% for the neural network and 85% for the suffix/morphology analysis. Trigrams were also far superior in the speed at which statistics were compiled and the rate at which text was processed.
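
The abstract does not specify how the trigram classifier was built, so the following is only a minimal sketch of one common approach: each language gets a profile of character-trigram counts, and an entry is assigned to the language whose profile is most similar under cosine similarity. The function names, the similarity measure, and the toy training strings are all assumptions made for illustration; none of them comes from the paper or from the Trinity College catalogue.

```python
from collections import Counter
import math


def trigrams(text):
    """Return the character trigrams of text, with a space padded at each end."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one trigram count profile per language from {language: text}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def score(profile, text):
    """Cosine similarity between a language profile and the text's trigram counts."""
    counts = Counter(trigrams(text))
    dot = sum(n * profile[g] for g, n in counts.items())
    norm = (math.sqrt(sum(n * n for n in counts.values()))
            * math.sqrt(sum(n * n for n in profile.values())))
    return dot / norm if norm else 0.0


def identify(profiles, text):
    """Return the language whose profile scores highest for text."""
    return max(profiles, key=lambda lang: score(profiles[lang], text))


# Toy training text, invented for this sketch (not catalogue data).
SAMPLES = {
    "english": "the history of the church and the lives of the saints with notes",
    "french": "histoire de la ville et des églises avec les notes de l'auteur",
    "latin": "historia ecclesiae et vitae sanctorum cum notis et commentariis",
}
PROFILES = train(SAMPLES)

if __name__ == "__main__":
    for entry in ("the lives of the saints", "vitae sanctorum"):
        print(entry, "->", identify(PROFILES, entry))
```

Compiling a profile takes a single counting pass over the training text, and classifying an entry takes one pass over its trigrams per language. That is consistent with the speed advantage the abstract reports for trigrams, although the study's actual implementation may differ.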

Keywords:

Review history: Received 1 April 1997, Accepted 1 January 1998, Available online 21 October 1998.

Official paper URL: https://doi.org/10.1016/S0306-4573(98)00008-9