Multi-label prediction of enzyme classes using InterPro signatures.

Luna De Ferrari, Stuart Aitken, Jano van Hemert, Igor Goryanin

Research output: Contribution to conference (Paper)

Abstract

In this work we use InterPro protein signatures to predict enzymatic function. We evaluate the method on more than 300,000 proteins (55% enzymes, 45% non-enzymes) for which Swiss-Prot and KEGG give agreeing Enzyme Commission (EC) annotations. We apply multi-label classification to account for proteins with multiple enzymatic functions (about 3% of UniProt), using Mulan, a library of multi-label algorithms built on the Weka framework. We achieve more than 97% recall, accuracy and precision in predicting enzyme classes. To understand the role played by data set size, we compare smaller data sets, either random or restricted to taxonomic groups such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. We find that prediction performance increases with data set size. Restricting the data to a particular taxonomic group saves computational time but covers only a reduced set of enzyme classes, and it achieves better accuracy than a random set only when proteins are grouped by high-level taxonomic domains (archaea, bacteria and eukarya).
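The abstract describes the setup only at a high level: each protein is characterised by the InterPro signatures it matches, and carries zero (non-enzyme), one or several EC classes. Below is a minimal Python sketch of that multi-label framing under stated assumptions. It is not the authors' pipeline, which used Mulan (a Java library built on Weka), and the abstract does not say which Mulan algorithm was chosen. As a stand-in, the sketch uses a plain k-nearest-neighbour label vote over Jaccard similarity of signature sets (a class is predicted if at least half of the k neighbours carry it) and scores predictions with example-based accuracy, precision and recall. The signature IDs (`IPR_A`, `IPR_B`, ...), the toy proteins, the function names `knn_predict` and `example_based_scores`, and the handling of empty label sets are all hypothetical; the abstract does not give the paper's exact metric definitions.

```python
from typing import Dict, FrozenSet, List, Tuple

# A protein = (set of matched InterPro signatures, set of EC classes).
# An empty EC set marks a non-enzyme.
Protein = Tuple[FrozenSet[str], FrozenSet[str]]

# Hypothetical toy data: signature IDs and EC labels are placeholders,
# not real InterPro entries or real annotations.
TRAIN: List[Protein] = [
    (frozenset({"IPR_A", "IPR_B"}), frozenset({"EC 2"})),
    (frozenset({"IPR_A", "IPR_C"}), frozenset({"EC 2"})),
    (frozenset({"IPR_B", "IPR_D"}), frozenset({"EC 2", "EC 3"})),  # multi-functional
    (frozenset({"IPR_D", "IPR_E"}), frozenset({"EC 3"})),
    (frozenset({"IPR_E", "IPR_F"}), frozenset({"EC 3"})),
    (frozenset({"IPR_X"}), frozenset()),                           # non-enzyme
    (frozenset({"IPR_X", "IPR_Y"}), frozenset()),                  # non-enzyme
]
TEST: List[Protein] = [
    (frozenset({"IPR_A", "IPR_B"}), frozenset({"EC 2"})),
    (frozenset({"IPR_B", "IPR_D", "IPR_E"}), frozenset({"EC 2", "EC 3"})),
    (frozenset({"IPR_X", "IPR_Z"}), frozenset()),
    (frozenset({"IPR_A", "IPR_D"}), frozenset({"EC 2", "EC 3"})),
]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two signature sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def knn_predict(train: List[Protein], query: FrozenSet[str], k: int = 3) -> FrozenSet[str]:
    """Predict each EC class carried by at least half of the k most similar training proteins."""
    # sorted() is stable (also with reverse=True), so ties keep training-set order.
    neighbours = sorted(train, key=lambda p: jaccard(p[0], query), reverse=True)[:k]
    votes: Dict[str, int] = {}
    for _, labels in neighbours:
        for ec in labels:
            votes[ec] = votes.get(ec, 0) + 1
    return frozenset(ec for ec, n in votes.items() if 2 * n >= len(neighbours))


def _ratio(num: int, den: int, other_empty: bool) -> float:
    # Assumed convention for an empty denominator: full score only if the
    # other set is empty too (e.g. a correctly rejected non-enzyme).
    if den:
        return num / den
    return 1.0 if other_empty else 0.0


def example_based_scores(
    truth: List[FrozenSet[str]], pred: List[FrozenSet[str]]
) -> Tuple[float, float, float]:
    """Mean per-protein accuracy |Y∩Z|/|Y∪Z|, precision |Y∩Z|/|Z| and recall |Y∩Z|/|Y|."""
    acc = prec = rec = 0.0
    for y, z in zip(truth, pred):
        inter = len(y & z)
        acc += _ratio(inter, len(y | z), True)
        prec += _ratio(inter, len(z), not y)
        rec += _ratio(inter, len(y), not z)
    n = len(truth)
    return acc / n, prec / n, rec / n


if __name__ == "__main__":
    preds = [knn_predict(TRAIN, sigs) for sigs, _ in TEST]
    print(example_based_scores([labels for _, labels in TEST], preds))
    # prints (0.875, 1.0, 0.875): the last protein is missing "EC 3"
```

Representing a non-enzyme as the empty label set lets a single multi-label model handle both the enzyme/non-enzyme decision and the assignment of one or more EC classes. That makes the formulation a natural fit when only a small fraction of proteins have several functions.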
Original language: English
Publication status: Published, 2010
