SynthSys, Waddington building, University of Edinburgh

  • Luna De Ferrari (Invited speaker)

Activity: Talk or presentation types: Invited talk

Description

Predicting enzyme function: active, guided or machine learning?

Manual annotation cannot keep up with enzyme sequence discovery. I will present our approach to the problem, which achieved over 98% accuracy when evaluated on more than 300,000 Swiss-Prot and KEGG proteins. It uses off-the-shelf machine learning (Weka and Mulan) and InterPro sequence signatures to predict the Enzyme Commission class for any sequence. A recent collaboration with EBI and UniProtKB is adapting the method for the automated annotation of the 29 million proteins in UniProt-TrEMBL.
I will also introduce the lesser-known *active* and *guided* learning strategies for supporting enzyme function curation. Evaluated on 5,750 E. coli proteins, the strategies presented could have cut the curation effort of Swiss-Prot by almost two-thirds, while maintaining very high accuracy and recall.
The methods can be applied to real-life datasets of millions of proteins thanks to their limited computational requirements, parallelisability, good coverage of rare classes and flexibility in selecting instances for annotation.
Period: 31 Jan 2013
Event title: SynthSys, Waddington building, University of Edinburgh
Event type: Seminar
Location: Edinburgh, United Kingdom