A method for inferring probabilistic consensus structure with applications to molecular sequence data

Authors:

Highlights:

Abstract

The notion of probabilistic consensus structure, a probabilistic model satisfying certain domain constraints, is defined as a statistical relational graph inferred from an ensemble of sequences (or strings). The probabilistic consensus structure is a special form of random graph indicating both the statistical variation and the structural relationship of its components. It is inferred by unsupervised learning that uses both statistical constraints and given domain constraints. Thus, in principle, its detection is more reliable than detection based on either statistical analysis or domain knowledge (such as heuristic search) alone. An algorithm that infers the probabilistic consensus structure from a random n-tuple is designed, based on the detection of statistical interdependency under a given structural constraint criterion. A circular diagram is proposed to facilitate visual interaction with the user for detecting global structural characteristics. The consensus structure can be used as an extended higher-order representation of the random n-tuple for statistical and structural pattern recognition, and it represents a form of inherent interdependency of the domain. Additional directionality from human experts can be imposed to generate a causal model. The method is applied to molecular modeling of ribonucleic acid (RNA) from aligned homologous molecular sequences. Experiments based on hypothetical and real molecular structures involving transfer and ribosomal RNA sequences demonstrate the detection of secondary structural domains as well as tertiary interactions.
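
To make the idea of detecting statistical interdependency under a structural constraint more concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that each aligned sequence is one observation of the random n-tuple, that interdependency between two alignment columns is scored by mutual information (a measure swapped in here for illustration; the abstract does not name one), and that the structural constraint is a simple minimum separation between paired positions. The names `column_entropy`, `mutual_information`, `interdependent_pairs`, `threshold`, and `min_separation` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    joint = list(zip(col_i, col_j))
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


def interdependent_pairs(sequences, threshold=0.5, min_separation=3):
    """Return (i, j, MI) for column pairs whose MI reaches the threshold.

    min_separation is an assumed stand-in for a structural constraint:
    positions closer together than this are not considered as a pair.
    """
    columns = list(zip(*sequences))  # transpose: one tuple per position
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        if j - i < min_separation:
            continue  # pair violates the assumed structural constraint
        mi = mutual_information(columns[i], columns[j])
        if mi >= threshold:
            pairs.append((i, j, mi))
    # Strongest interdependencies first
    return sorted(pairs, key=lambda p: -p[2])


# Toy alignment (invented data): positions 0 and 4 covary, the rest are constant.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
toy_pairs = interdependent_pairs(toy_alignment)
```

In this toy alignment only positions 0 and 4 vary together, so they are the only pair reported. A real analysis would also need to handle gaps, small-sample bias in the entropy estimates, and a statistically justified threshold, none of which this sketch attempts.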

Keywords: Probabilistic consensus structure, Structural constraint, Statistical interdependency, Structural pattern recognition, Molecular modeling, RNA

Review history: Received 23 April 1992, Accepted 23 September 1992, Available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(93)90117-F