Description
Predicting enzyme function: active, guided or machine learning?
Manual annotation cannot keep pace with the discovery of enzyme sequences. I will present our approach to this problem, which was evaluated on more than 300,000 Swiss-Prot and KEGG proteins and achieved over 98% accuracy. It combines off-the-shelf machine learning tools (Weka and Mulan) with InterPro sequence signatures to predict the Enzyme Commission (EC) class of any sequence. A recent collaboration with EBI and UniProtKB is adapting the method for the automated annotation of the 29 million proteins in UniProt-TrEMBL.
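As a rough illustration of how InterPro signatures can serve as features for EC prediction, the sketch below represents each protein as a set of signature IDs and assigns EC labels by majority vote among its most similar training proteins, using Jaccard similarity. This k-nearest-neighbour classifier is a stand-in chosen for brevity, not the Weka/Mulan models used in the work. The signature IDs (`sigA`, ...), the EC labels (`EC-x`, ...), the `TRAINING` data and the value of `k` are all hypothetical toy values.

```python
from collections import Counter

# Hypothetical toy data: (set of signature IDs, set of EC labels) per protein.
TRAINING = [
    (frozenset({"sigA", "sigB"}), {"EC-x"}),
    (frozenset({"sigA", "sigC"}), {"EC-x"}),
    (frozenset({"sigA", "sigB", "sigD"}), {"EC-x", "EC-y"}),
    (frozenset({"sigE"}), {"EC-z"}),
    (frozenset({"sigE", "sigF"}), {"EC-z"}),
]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two sets of signature IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_ec(query: frozenset, training=TRAINING, k: int = 3) -> set:
    """Multi-label prediction: keep every EC label carried by more than
    half of the k training proteins most similar to the query."""
    neighbours = sorted(
        training, key=lambda item: jaccard(query, item[0]), reverse=True
    )[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n > len(neighbours) / 2}


if __name__ == "__main__":
    print(predict_ec(frozenset({"sigA", "sigB"})))  # {'EC-x'}
    print(predict_ec(frozenset({"sigE"})))          # {'EC-z'}
```

Because each protein may carry several EC labels, the vote is taken per label rather than per protein, which is the simplest way to turn a single-label neighbour vote into a multi-label one.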
I will also introduce the lesser-known *active* and *guided* learning strategies for supporting enzyme function curation. Evaluated on 5,750 *E. coli* proteins, these strategies could have cut the Swiss-Prot curation effort by almost two-thirds while maintaining very high accuracy and recall.
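Active learning ranks unlabelled proteins so that curators see the most informative ones first. The sketch below is a minimal version, assuming margin-based uncertainty sampling (one common active learning strategy, not necessarily the one used in this work) on top of a toy k-nearest-neighbour vote: proteins whose two most-voted EC classes receive similar vote shares are sent for curation first. The protein IDs, signature IDs, EC labels and the `select_for_curation` helper are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: curated proteins as (signature set, EC label).
LABELLED = [
    (frozenset({"sigA", "sigB"}), "EC-x"),
    (frozenset({"sigA", "sigC"}), "EC-x"),
    (frozenset({"sigA", "sigB", "sigD"}), "EC-x"),
    (frozenset({"sigE"}), "EC-z"),
    (frozenset({"sigE", "sigF"}), "EC-z"),
]
# Uncurated proteins: ID -> signature set.
UNLABELLED = {
    "P1": frozenset({"sigA", "sigB"}),  # resembles only EC-x proteins
    "P2": frozenset({"sigA", "sigE"}),  # shares signatures with both classes
}


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two sets of signature IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_votes(query: frozenset, labelled, k: int = 3) -> dict:
    """Fraction of the k most similar labelled proteins voting for each class."""
    neighbours = sorted(
        labelled, key=lambda item: jaccard(query, item[0]), reverse=True
    )[:k]
    counts = Counter(label for _, label in neighbours)
    return {label: n / len(neighbours) for label, n in counts.items()}


def margin(votes: dict) -> float:
    """Gap between the two largest vote shares; small means uncertain."""
    top = sorted(votes.values(), reverse=True) + [0.0]
    return top[0] - top[1]


def select_for_curation(unlabelled, labelled, batch_size: int = 1, k: int = 3):
    """Uncertainty sampling: return IDs of the least certain predictions."""
    ranked = sorted(
        unlabelled, key=lambda pid: margin(class_votes(unlabelled[pid], labelled, k))
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    print(select_for_curation(UNLABELLED, LABELLED))  # ['P2']
```

In a curation loop, the selected proteins would be annotated, moved into `LABELLED`, and the ranking recomputed, so that curator effort goes to the proteins the model cannot yet classify confidently.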
The methods can be applied to real-life datasets of millions of proteins thanks to their modest computational requirements, support for parallelisation, good coverage of rare classes and flexibility in selecting instances for annotation.
Period | 31 Jan 2013 |
---|---|
Event title | SynthSys, Waddington building, University of Edinburgh |
Event type | Seminar |
Location | Edinburgh, United Kingdom |