Abstract
The application of Machine Learning to cheminformatics is a large and active field of research, but few papers discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, and neural network methods, as well as both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. The investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the ‘wisdom of crowds’ principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data pre-processing methodology was also found to be crucial to the performance of each method.
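The contrast between the two ensemble types can be illustrated with a small sketch. The abstract does not give the algorithms, so everything here is an assumption: the linear ensemble is taken to be an unweighted mean of the component predictions, and the greedy ensemble is approximated by greedy forward selection with replacement, where a model's weight is the fraction of rounds in which it was picked. The model names, the toy numbers, and the function names are hypothetical and are not taken from the paper.

```python
import math

def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def linear_ensemble(preds):
    """Assumed 'linear' ensemble: unweighted mean over all models."""
    names = list(preds)
    n = len(preds[names[0]])
    return [sum(preds[m][i] for m in names) / len(names) for i in range(n)]

def greedy_ensemble_weights(val_preds, y_val, rounds=10):
    """Assumed 'greedy' ensemble: in each round, add (with replacement) the
    model whose inclusion gives the lowest validation RMSE of the running
    average. A model's weight is the fraction of rounds it was chosen in."""
    names = list(val_preds)
    n = len(y_val)
    counts = {m: 0 for m in names}
    running_sum = [0.0] * n
    for k in range(1, rounds + 1):
        best_name, best_err = None, float("inf")
        for m in names:
            candidate = [(running_sum[i] + val_preds[m][i]) / k for i in range(n)]
            err = rmse(candidate, y_val)
            if err < best_err:
                best_name, best_err = m, err
        counts[best_name] += 1
        running_sum = [running_sum[i] + val_preds[best_name][i] for i in range(n)]
    return {m: counts[m] / rounds for m in names}

def weighted_prediction(preds, weights):
    """Combine per-model predictions using the given weights."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]

# Hypothetical validation targets and per-model predictions (toy values).
y_val = [1.0, 2.0, 3.0, 4.0]
val_preds = {
    "good": [1.1, 1.9, 3.1, 3.9],
    "ok":   [0.8, 2.3, 2.7, 4.3],
    "poor": [2.0, 3.0, 4.0, 5.0],
}

weights = greedy_ensemble_weights(val_preds, y_val, rounds=10)
greedy_rmse = rmse(weighted_prediction(val_preds, weights), y_val)
linear_rmse = rmse(linear_ensemble(val_preds), y_val)
best_single_rmse = min(rmse(p, y_val) for p in val_preds.values())

print(weights)
print(greedy_rmse, linear_rmse, best_single_rmse)
```

In this toy case the weakest model receives no weight, which is the mechanism the abstract credits for the greedy ensemble's advantage. In practice the weights would be chosen on held-out validation data and the resulting ensemble assessed on a separate test set; the sketch scores on the same data it used to choose the weights, so its numbers are optimistic.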
Original language | English |
---|---|
Pages (from-to) | 634-647 |
Journal | Molecular Informatics |
Volume | 34 |
Issue number | 9 |
Early online date | 25 Mar 2015 |
Publication status | Published - Sept 2015 |
Keywords
- Machine Learning
- Quantitative structure-property relationships
- Greedy ensembles
- Linear ensembles
Datasets
- Data underpinning: Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems
  Kew, W. (Creator) & Mitchell, J. B. O. (Creator), University of St Andrews, 2015