G protein-coupled receptors (GPCRs) are one of the most prominent and abundant families of membrane proteins encoded in the human genome. Because they are the main targets of many drugs, GPCR research has grown significantly in recent years. However, only a few GPCR structures are known, which remains an important challenge. The classification of GPCRs is therefore a significant problem, driven by the widening gap between orphan GPCR sequences and the small number of annotated ones. This work employs motif distillation with defined parameters, a distinguishing-power evaluation method, and the general weighted set cover problem to determine the minimum set of motifs that covers a particular GPCR subfamily. Our results indicate that, in the Family A Peptide subfamily, 91% of all proteins listed in GPCRdb can be covered using only 691 different motifs, which can later serve as an invaluable resource for developing a third-level GPCR classification tool.
G protein-coupled receptors; data mining; pattern recognition
Presented at the 7th IAPR International Conference, PRIB 2012, Tokyo, Japan, November 8-10, 2012.
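
The abstract frames motif selection as a weighted set cover problem: each motif covers the proteins it matches, and the goal is a small, cheap set of motifs that covers a target fraction of a subfamily. The following is a minimal sketch of the standard greedy approximation for weighted set cover, not necessarily the procedure used in this work. The function name, the protein and motif identifiers, and the cost values are all hypothetical; the abstract does not say how motif weights are derived from distinguishing power.

```python
import math
from typing import Dict, List, Set, Tuple


def greedy_weighted_set_cover(
    universe: Set[str],
    motif_hits: Dict[str, Set[str]],
    motif_cost: Dict[str, float],
    target_coverage: float = 1.0,
) -> Tuple[List[str], Set[str]]:
    """Greedy approximation: repeatedly pick the motif with the lowest
    cost per newly covered protein until the target fraction is covered.

    The cost model is an assumption made for illustration only.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    # Number of proteins that must be covered to reach the target fraction.
    needed = math.ceil(target_coverage * len(universe) - 1e-9)

    while len(universe) - len(uncovered) < needed:
        best, best_ratio = None, float("inf")
        # Sorted iteration makes tie-breaking deterministic.
        for motif in sorted(motif_hits):
            gain = len(motif_hits[motif] & uncovered)
            if gain == 0:
                continue
            ratio = motif_cost[motif] / gain
            if ratio < best_ratio:
                best, best_ratio = motif, ratio
        if best is None:
            # No remaining motif covers anything new; stop early.
            break
        chosen.append(best)
        uncovered -= motif_hits[best]

    return chosen, universe - uncovered


# Toy data (hypothetical identifiers, not taken from GPCRdb).
proteins = {"P1", "P2", "P3", "P4", "P5"}
motif_hits = {
    "motif_a": {"P1", "P2", "P3"},
    "motif_b": {"P3", "P4"},
    "motif_c": {"P4", "P5"},
    "motif_d": {"P1", "P2", "P3", "P4", "P5"},
}
motif_cost = {"motif_a": 1.0, "motif_b": 1.0, "motif_c": 1.0, "motif_d": 10.0}

chosen, covered = greedy_weighted_set_cover(proteins, motif_hits, motif_cost)
print(chosen, f"{len(covered) / len(proteins):.0%} covered")
```

Setting `target_coverage` below 1.0 (for example 0.91) lets the loop stop once that fraction of proteins is covered, which mirrors reporting partial coverage of a subfamily. The greedy rule does not guarantee a minimum-cost cover; it is a common approximation chosen here for brevity.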