Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures

Author: Kaiser, F.; Labudde, D.
Type: Journal Article
Abstract: The essential role of small, evolutionarily conserved structural units in proteins has been extensively researched and validated. A well-known example is the serine proteases, in which the peptide cleavage reaction is carried out by a configuration of only three residues. Brought into spatial proximity during protein folding, such structural motifs often consist of long-range contacts and are usually hard to detect at the sequence level. Given the steadily growing body of protein 3D structure data, the computational identification of structural motifs can contribute significantly to the understanding of protein fold and function. We therefore propose a method to discover structural motifs of high geometrical similarity and desired sequence separation in protein 3D structure data. Because the method builds on techniques originating from data mining, no a priori knowledge is required. Its applicability is demonstrated by the identification of the catalytic unit of serine proteases and the ion-coordination center of cupredoxins. Furthermore, a large-scale analysis of the entire Protein Data Bank points towards the presence of ubiquitous structural motifs that are independent of any specific fold or function. We envision that the method is suitable for uncovering functional mechanisms and for deriving fingerprint libraries of structural motifs, which could be used to assess protein family association.
Part: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 16 (2019), No. 2, pp. 671-680