Online Text Prediction with Recurrent Neural Networks

Authors: Juan Antonio Pérez-Ortiz, Jorge Calera-Rubio, Mikel L. Forcada

Abstract

Arithmetic coding is one of the most outstanding techniques for lossless data compression. It attains its good performance with the help of a probability model which indicates, at each step, the probability of occurrence of each possible input symbol given the current context. The better this model, the greater the compression ratio achieved. This work analyses the use of discrete-time recurrent neural networks and their capability for predicting the next symbol in a sequence in order to implement that model. The focus of this study is on online prediction, a task much harder than the classical offline grammatical inference with neural networks. The results show that recurrent neural networks cope easily when the sequences come from the output of a finite-state machine, yielding high compression ratios. When compressing real texts, however, the dynamics of the sequences appear to be too complex to be learned online correctly by the network.
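The core idea described in the abstract can be sketched as follows: a recurrent network reads the sequence one symbol at a time, outputs a probability distribution over the next symbol, and is updated online after every prediction; the quantity -log2 p(next symbol) is exactly the code length an arithmetic coder would spend on that symbol. This is a minimal, hypothetical illustration (not the authors' implementation): a tiny Elman-style RNN in NumPy, trained online with one truncated-backpropagation step per symbol, on a two-symbol periodic sequence standing in for the output of a simple finite-state machine. The hidden size, learning rate, and sequence are all illustrative choices.

```python
import numpy as np

np.random.seed(0)

# A simple finite-state source: the periodic string "abab...".
symbols = "ab"
K = len(symbols)
seq = [i % 2 for i in range(3000)]  # symbol indices

H = 8     # hidden units (illustrative choice)
lr = 0.1  # learning rate (illustrative choice)

Wxh = np.random.randn(K, H) * 0.1
Whh = np.random.randn(H, H) * 0.1
Why = np.random.randn(H, K) * 0.1
by = np.zeros(K)

h = np.zeros(H)
code_len = []  # -log2 p(next symbol): the arithmetic-coding cost per symbol

for t in range(len(seq) - 1):
    # Forward pass: one-hot input, recurrent hidden state, softmax output.
    x = np.zeros(K)
    x[seq[t]] = 1.0
    h_prev = h
    h = np.tanh(x @ Wxh + h_prev @ Whh)
    logits = h @ Why + by
    p = np.exp(logits - logits.max())
    p /= p.sum()

    # Predict *before* seeing the next symbol, then record its code length.
    target = seq[t + 1]
    code_len.append(-np.log2(p[target]))

    # One online gradient step (truncated backpropagation, depth 1:
    # the previous hidden state is treated as a constant).
    dlogits = p.copy()
    dlogits[target] -= 1.0
    dWhy = np.outer(h, dlogits)
    dh = dlogits @ Why.T
    dpre = dh * (1.0 - h ** 2)
    dWxh = np.outer(x, dpre)
    dWhh = np.outer(h_prev, dpre)
    Why -= lr * dWhy
    by -= lr * dlogits
    Wxh -= lr * dWxh
    Whh -= lr * dWhh

print(f"first 100 symbols: {np.mean(code_len[:100]):.2f} bits/symbol")
print(f"last 100 symbols:  {np.mean(code_len[-100:]):.2f} bits/symbol")
```

On this trivially predictable source the per-symbol cost drops from about one bit toward zero, which is the finite-state case where the paper reports high compression ratios; on real text the online dynamics are far harder to track, as the abstract notes.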

Keywords: arithmetic coding, online nonlinear prediction, recurrent neural networks, text compression


Paper URL: https://doi.org/10.1023/A:1012491324276