Gene expression data analysis of human lymphoma using support vector machines and output coding ensembles

Abstract

The large amount of data generated by DNA microarrays was originally analysed using unsupervised methods, such as clustering or self-organizing maps. More recently, supervised methods such as decision trees, dot-product support vector machines (SVM) and multi-layer perceptrons (MLP) have been applied to classify normal and tumoural tissues. We propose methods based on non-linear SVM with polynomial and Gaussian kernels, and on output coding (OC) ensembles of learning machines. We use them to separate normal from malignant tissues, to classify different types of lymphoma, and to analyse the role of sets of coordinately expressed genes in carcinogenic processes of lymphoid tissues. Using gene expression data from the “Lymphochip”, a specialised DNA microarray developed at Stanford University School of Medicine, we show that SVM can correctly separate normal from tumoural tissues and that OC ensembles can be successfully used to classify different types of lymphoma. Moreover, we identify a group of coordinately expressed genes related to the separation of two distinct subgroups within diffuse large B-cell lymphoma (DLBCL), validating Alizadeh’s earlier hypothesis that DLBCL comprises two distinct diseases.
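The two kernels and the output coding decision rule mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`polynomial_kernel`, `gaussian_kernel`, `svm_output`, `oc_decode`), the default degree, offset and width parameters, and the example code matrix `CODE` are all hypothetical. In an OC ensemble, each column of the code matrix defines one binary dichotomy learned by a separate machine, here a kernel SVM. A sample is then assigned to the class whose codeword is closest, in Hamming distance, to the vector of the machines' ±1 outputs.

```python
import math

# Hypothetical kernel definitions; degree, c and sigma are illustrative defaults.
def polynomial_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel K(x, z) = (x . z + c) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_output(x, support_vectors, coefs, bias, kernel):
    """Hard output of a trained binary SVM.

    coefs[i] is assumed to hold alpha_i * y_i for support vector i, so the
    decision value is sum_i coefs[i] * K(sv_i, x) + bias; returns +1 or -1.
    """
    s = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias
    return 1 if s >= 0 else -1

def hamming(u, v):
    """Number of positions where two codewords differ."""
    return sum(a != b for a, b in zip(u, v))

def oc_decode(outputs, code_matrix):
    """Index of the class whose codeword is nearest to the +/-1 outputs.

    Ties are broken in favour of the lowest class index.
    """
    distances = [hamming(outputs, row) for row in code_matrix]
    return min(range(len(code_matrix)), key=lambda k: distances[k])

# Hypothetical 3-class, 5-dichotomy code matrix (one row per class).
# Minimum Hamming distance between rows is 3, so one wrong bit is corrected.
CODE = [
    [+1, -1, -1, +1, +1],  # class 0
    [-1, +1, -1, +1, -1],  # class 1
    [-1, -1, +1, -1, +1],  # class 2
]

if __name__ == "__main__":
    # The third dichotomizer is wrong here, yet the sample is still decoded as class 0.
    print(oc_decode([+1, -1, +1, +1, +1], CODE))
```

The example shows why the ensemble tolerates individual errors: flipping one bit of class 0's codeword still leaves it closer to class 0 (distance 1) than to class 1 (distance 4) or class 2 (distance 2). Hamming decoding uses only hard outputs; decoding on the real-valued SVM margins is a possible alternative.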

Keywords: Gene expression data analysis, Output coding ensembles of learning machines, Support vector machines, DNA microarrays

Article history: Received 23 October 2001, Revised 22 April 2002, Accepted 16 May 2002, Available online 11 October 2002.

Official URL: https://doi.org/10.1016/S0933-3657(02)00077-5