Output-based transfer learning in genetic programming for document classification

Authors:

Highlights:

Abstract

Transfer learning has been studied in document classification as a way to transfer a model trained on a source domain (SD) to a relatively similar target domain (TD). Feature-based transfer learning techniques investigate which features are transferred from SD to TD. This paper instead investigates an output-based transfer learning system that uses Genetic Programming (GP), which automatically selects features to construct classifiers, for document classification tasks. The proposed GP system generates programs directly from a set of sparse features and considers only the change in the output of the evolved programs from SD to TD. A linear model then combines existing GP programs from SD as features for TD. In addition, new GP programs are mutated from the programs evolved in SD to improve accuracy. By directly using the evolved GP programs and their mutations, the feature extraction and estimation processes on TD are avoided. The experimental results demonstrate that GP programs from SD can be used effectively to classify documents in the relevant TD. The results also show that it is easy to train effective classifiers on TD when the GP programs are used as features. Furthermore, the proposed linear model, which takes multiple GP programs from SD as its inputs, outperforms single GP programs obtained directly from TD.
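The idea of using SD programs as features for a linear model on TD can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three hand-written functions stand in for GP programs evolved on SD, the toy documents and labels are invented, and plain logistic regression trained by stochastic gradient descent stands in for whatever linear model the authors actually use.

```python
import math

# Hypothetical stand-ins for GP programs evolved on the source domain (SD).
# Each maps a sparse bag-of-words document (term -> count) to a real-valued output.
def prog_a(doc):
    return doc.get("goal", 0) + doc.get("match", 0) - doc.get("vote", 0)

def prog_b(doc):
    return doc.get("team", 0) * 2.0 - doc.get("election", 0)

def prog_c(doc):
    return doc.get("score", 0) - doc.get("policy", 0) - doc.get("vote", 0)

SD_PROGRAMS = [prog_a, prog_b, prog_c]

def program_outputs(doc, programs=SD_PROGRAMS):
    """Use the outputs of the SD programs as the feature vector of a TD document."""
    return [p(doc) for p in programs]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear(docs, labels, programs=SD_PROGRAMS, epochs=200, lr=0.1):
    """Fit logistic regression by SGD on program outputs (assumed linear model)."""
    w = [0.0] * len(programs)
    b = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            x = program_outputs(doc, programs)
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the log loss with respect to the score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(doc, w, b, programs=SD_PROGRAMS):
    """Return 1 if the weighted sum of program outputs is positive, else 0."""
    x = program_outputs(doc, programs)
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Invented target-domain (TD) documents: 1 = sports, 0 = politics.
TD_DOCS = [
    {"team": 2, "score": 1, "coach": 1},
    {"match": 1, "goal": 2},
    {"vote": 2, "policy": 1, "senate": 1},
    {"election": 3, "vote": 1},
]
TD_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_linear(TD_DOCS, TD_LABELS)
    print("weights:", w, "bias:", b)
    print("predictions:", [predict(d, w, b) for d in TD_DOCS])
```

Note that no feature extraction is run on the TD documents here: only the weights over the fixed program outputs are learned, which is the sense in which the transfer is output-based in this sketch.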

Keywords: Genetic programming, Transfer learning, Feature extraction, Document classification

Review history: Received 14 May 2020, Revised 18 August 2020, Accepted 3 November 2020, Available online 11 November 2020, Version of Record 24 December 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.106597