Distortion-free PCA on sample space for highly variable gene detection from single-cell RNA-seq data
Authors: Momo Matsuda, Yasunori Futamura, Xiucai Ye, Tetsuya Sakurai
Abstract
Single-cell RNA-seq (scRNA-seq) allows the analysis of gene expression in each cell, which enables the detection of highly variable genes (HVGs) that contribute to cell-to-cell variation within a homogeneous cell population. HVG detection is a necessary step before clustering analysis because it improves the clustering result. scRNA-seq data include some genes that are expressed with a certain probability in all cells, which makes the cells indistinguishable. These genes are referred to as background noise. To remove the background noise and select the informative genes for clustering analysis, in this paper, we propose an effective HVG detection method based on principal component analysis (PCA). The proposed method uses PCA to evaluate the genes (features) on the sample space. Distortion-free principal components are selected, and the distance from the origin to each gene on those components is used as the weight of that gene. The genes with the greatest distances from the origin are selected for clustering analysis. Experimental results on both synthetic and gene expression datasets show that the proposed method not only removes the background noise and selects the informative genes for clustering analysis, but also outperforms existing HVG detection methods.
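The pipeline described in the abstract (PCA, a choice of components, distance of each gene from the origin, top-ranked genes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how "distortion-free" components are identified, so the sketch simply keeps the leading `n_components` components as a placeholder assumption. The function name `rank_genes_by_pca_distance` and its parameters are hypothetical, and the input is assumed to be a cells-by-genes expression matrix.

```python
import numpy as np


def rank_genes_by_pca_distance(X, n_components=2, n_top=100):
    """Rank genes by their distance from the origin in PCA coordinates.

    X is assumed to be a (cells x genes) expression matrix.
    ASSUMPTION: the leading n_components components stand in for the
    paper's "distortion-free" components, whose selection rule is not
    given in the abstract.
    """
    X = np.asarray(X, dtype=float)
    # Center each gene (column) across cells.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: Xc = U @ diag(S) @ Vt, with cells as rows of U.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, S.size)
    # Coordinates of each gene on the kept components: Xc.T @ U[:, :k].
    coords = Vt[:k].T * S[:k]
    # Weight of each gene = Euclidean distance from the origin.
    dist = np.linalg.norm(coords, axis=1)
    # Gene indices sorted from largest to smallest distance.
    order = np.argsort(dist)[::-1]
    return order[:n_top], dist
```

Calling `rank_genes_by_pca_distance(X, n_components=2, n_top=500)` returns the indices of the 500 highest-weighted genes and the weight of every gene. Genes expressed at a similar level in all cells have little variation along the kept components, so they receive small weights and fall to the bottom of the ranking.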
Keywords: single-cell RNA-sequencing, feature selection, principal component analysis, highly variable gene detection, background noise, clustering analysis
Paper URL: https://doi.org/10.1007/s11704-022-1172-z