Constrained query of order-preserving submatrix in gene expression data

作者:Tao Jiang, Zhanhuai Li, Xuequn Shang, Bolin Chen, Weibang Li, Zhilei Yin

摘要

Order-preserving submatrix (OPSM) has become important in modelling biologically meaningful subspace cluster, capturing the general tendency of gene expressions across a subset of conditions. With the advance of microarray and analysis techniques, big volume of gene expression datasets and OPSM mining results are produced. OPSM query can efficiently retrieve relevant OPSMs from the huge amount of OPSM datasets. However, improving OPSM query relevancy remains a difficult task in real life exploratory data analysis processing. First, it is hard to capture subjective interestingness aspects, e.g., the analyst’s expectation given her/his domain knowledge. Second, when these expectations can be declaratively specified, it is still challenging to use them during the computational process of OPSM queries. With the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works involve OPSM query. To solve the above problems, the paper proposes two constrained OPSM query methods, which exploit userdefined constraints to search relevant results from two kinds of indices introduced. In this paper, extensive experiments are conducted on real datasets, and experiment results demonstrate that the multi-dimension index (cIndex) and enumerating sequence index (esIndex) based queries have better performance than brute force search.

论文关键词:gene expression data, OPSM, constrained query, brute-force search, feature sequence, cIndex

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11704-016-5487-5