How to Winnow Actives from Inactives: Introducing Molecular Orthogonal Sparse Bigrams (MOSBs) and Multiclass Winnow

Research output: Contribution to journal › Article › peer-review

Abstract

In the present paper we combine the Winnow algorithm and an advanced scheme for feature generation into a tool for multiclass classification. The Winnow algorithm, specifically designed in the late 1980s to work well with high-dimensional data, by design ignores most of the irrelevant features for the scoring of each single training/test case. To augment the pool of available molecular features we use the Winnow algorithm in conjunction with a process that creates additional features from a set of given ones. We adapt a technique formerly employed in text classification termed “orthogonal sparse bigrams” and extend the use of that method to the domain of cheminformatics. Using circular molecular fingerprints as initial features, we create “molecular orthogonal sparse bigrams” (MOSBs) and report their successful application to the task of classification of bioactive molecules. Additionally, we introduce a memory-efficient way of bagging individual classifiers, avoiding the need to hold the complete training data set in memory. To compare the performance of our method with published results, we use the Hert data set of 8293 active molecules in 11 classes. We compare our method to Random Forest and find that our method not only is comparable or better in classification accuracy (up to 50% higher in MCC [Matthews correlation coefficient], 98% higher in fraction of correct predictions) but also is quicker to train (by a factor between 2 and 18, depending on the feature generation), more memory efficient, and able to cope more easily with large data sets when we seeded the actives into a pool of 94290 inactive molecules. It is shown that this method can be used with different fingerprints.
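The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It assumes the text-classification form of orthogonal sparse bigrams (each token paired with the preceding tokens inside a sliding window, keeping the gap size) and a textbook two-class Winnow with multiplicative promotion and demotion. The names `osb_features`, `Winnow`, `window`, `alpha`, and `threshold` are hypothetical, and comparing the threshold against the mean weight of the active features is an assumption made here for sparse inputs. The abstract does not specify how MOSBs are derived from circular fingerprints or how the multiclass variant is built, so this sketch should not be read as the authors' method.

```python
from collections import defaultdict


def osb_features(tokens, window=5):
    """Pair each token with each of the up to (window - 1) tokens before it.

    Each feature is (earlier_token, gap, later_token), where gap counts the
    skipped positions between the two tokens.
    """
    features = []
    for j, right in enumerate(tokens):
        for i in range(max(0, j - window + 1), j):
            features.append((tokens[i], j - i - 1, right))
    return features


class Winnow:
    """Two-class Winnow over sparse binary features (illustrative only)."""

    def __init__(self, alpha=2.0, threshold=1.0):
        self.alpha = alpha          # promotion factor; demotion is 1 / alpha
        self.threshold = threshold  # assumption: compared to the mean weight
        self.weights = defaultdict(lambda: 1.0)  # unseen features start at 1

    def score(self, features):
        active = set(features)
        if not active:
            return 0.0
        # Only features present in this example contribute to its score.
        return sum(self.weights[f] for f in active) / len(active)

    def predict(self, features):
        return self.score(features) >= self.threshold

    def update(self, features, label):
        # Mistake-driven: weights change only when the prediction is wrong.
        if self.predict(features) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for f in set(features):
            self.weights[f] *= factor


# Toy usage: two made-up token sequences standing in for feature lists.
training = [
    (osb_features(["a", "b", "c"]), True),
    (osb_features(["a", "d", "c"]), False),
]
model = Winnow()
for _ in range(3):  # a few passes over the toy data
    for feats, label in training:
        model.update(feats, label)
predictions = [model.predict(feats) for feats, _ in training]
```

Because the weights live in a dictionary keyed by feature, memory grows only with the features actually seen, which is one reason Winnow suits very high-dimensional sparse inputs.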
Original language: English
Pages (from-to): 306-318
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 48
Issue number: 2
Publication status: Published - Feb 2008

Keywords

  • PHYSICOCHEMICAL DESCRIPTORS
  • RANDOM FOREST
  • CLASSIFICATION
  • QSAR
  • PREDICTION
  • ENRICHMENT
  • ENSEMBLES
  • MODELS
  • KERNEL
  • QSPR
