  • School of Chemistry, Biomolecular Sciences Building, North Haugh, St Andrews, KY16 9ST, UK

Accepting Postgraduate Research Students

Personal profile

Research overview

 

Our research covers everything that is broadly both chemical and computational. Some of the main themes are described below, but if you're just after a list of publications, here you are.

 

  • Machine Learning

 

A substantial part of computational chemistry involves building mathematical models to analyse data. The Machine Learning (ML) part of our work comprises everything that is not an attempt realistically to model the processes by which the real world actually works. In jargon, this is everything that is not physics-based. Such tasks might firstly be regression, that is predicting numerical values such as solubilities. Secondly they might be classification, assigning items such as molecules to classes like "toxic" or "non-toxic". Thirdly, they might be clustering, finding patterns in unlabelled data. In our group, we use such models to predict and calculate properties such as solubility, bioactivity and toxicity.
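
To make the first two task types concrete, here is a minimal sketch in plain Python. It is an illustration only, not the group's own code: the function names `fit_line` and `nearest_neighbour_label`, the single numerical descriptor, and all the toy numbers are assumptions made up for this example. Regression is shown as a one-descriptor least-squares fit, and classification as a one-nearest-neighbour lookup; clustering is not covered here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def nearest_neighbour_label(query, labelled_points):
    """Return the label of the training point closest to `query` (classification)."""
    closest_value, closest_label = min(
        labelled_points, key=lambda point: abs(point[0] - query)
    )
    return closest_label


# Toy, made-up data: one hypothetical descriptor per molecule.
descriptor = [0.0, 1.0, 2.0, 3.0]
toy_property = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(descriptor, toy_property)
predicted = slope * 1.5 + intercept  # predict the property for a new descriptor value

# Toy, made-up labelled set for classification.
labelled = [(0.1, "non-toxic"), (0.2, "non-toxic"), (0.8, "toxic"), (0.9, "toxic")]
label = nearest_neighbour_label(0.75, labelled)
```

Real models of solubility or toxicity use many descriptors and far more flexible learners, but the division of labour is the same: fit on known examples, then predict for unseen ones.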

 

Such modelling in fact has a long history in chemistry, dating back to the 19th century. However, for much of that time models were limited to simple linear regressions. In the latter part of the 20th century, the field developed through building QSAR (Quantitative Structure-Activity Relationship) and QSPR (ditto, but now it's Structure-Property) models with multi-linear regression, and then moved on to non-linear methods. The field was usually known as chemoinformatics (or cheminformatics, being unsure how to spell its own name). In the modern era, the sophistication of the models has increased to a point where it's more descriptive, and certainly more widely understood, to call these techniques Machine Learning.

What about Artificial Intelligence (AI) - can we draw a dividing line between ML and AI? Probably not a clear one. As Google DeepMind executive Mat Velloso said: “If it is written in Python, it's probably machine learning. If it is written in PowerPoint, it's probably AI.” While there's clearly a substantial overlap between the categories, we tend to refer to souped-up non-linear regression models as ML, but to large language models (LLMs) as AI. Nonetheless, under the lid, LLMs are just large neural networks doing neural network things like optimising weights.

  • Molecular Simulation

 

By way of contrast, molecular simulation is definitely a physics-based approach. We set up in the computer a mathematical representation of the molecules involved, one that typically includes the chemical nature and spatial co-ordinates of each constituent atom. The computer then produces a possible future of that molecular system, calculating its response to its physical and chemical environment at each timestep to create a trajectory, in a process known as Molecular Dynamics.
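
The timestep-by-timestep propagation described above can be sketched in a few lines. This is a hedged illustration, not the group's simulation code: it assumes the velocity Verlet integrator (a common choice, but not one named in the text), a single particle in one dimension, and a hypothetical harmonic force in arbitrary reduced units.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v for n_steps of length dt.

    Returns the trajectory as a list of (position, velocity) pairs,
    including the starting point.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Toy restoring force F = -k x, standing in for a real force field."""
    return -k * x


# Assumed toy parameters in reduced units.
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=1000)


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x
```

A useful sanity check on any such integrator is that the total energy stays close to its starting value along the trajectory; here `total_energy(*traj[-1])` should remain near the initial 0.5.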

 

If carried out intelligently, such methods can provide great scientific insight into the behaviour of the system, covering things such as structure, energy, interactions with other molecules, phase changes and much more. Typically, simulations are carried out with the molecules contained in 3-dimensional boxes that are stacked together without limit in all directions and fill space with no gaps, a scenario described as having "periodic boundary conditions." Our group use such methods for structural studies of the interactions between enzymes and their substrates, with applications like plastic-eating enzymes and new medicines.
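
As a small sketch of what periodic boundary conditions mean in practice, the snippet below wraps a coordinate back into the box and measures the distance between two atoms using the nearest periodic image. It assumes a cubic box of side `box` (real simulations may use other cell shapes), and the function names are hypothetical.

```python
import math


def wrap(coord, box):
    """Map a coordinate back into the primary box, in the range [0, box)."""
    return coord % box


def minimum_image_distance(a, b, box):
    """Distance between points a and b using the nearest periodic image of b."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # shift by whole box lengths to the nearest image
        d2 += d * d
    return math.sqrt(d2)


# Two atoms near opposite faces of a box of side 10 are really close neighbours.
separation = minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), box=10.0)
```

Without the minimum-image step, the two atoms in this example would appear to be 9 units apart rather than 1, and their interaction would be badly underestimated.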

 

The forces, or more explicitly the interaction energies, between molecules are defined by a "force field", which has little to do with science fiction but a lot to do with the fundamental physical processes governing the attractive and repulsive interactions amongst atoms and molecules. This forms a major part of the scientific input into simulations. Historically, force fields have been either fitted to experiment or parameterised via theoretical calculation, but increasingly they are now being generated through ML.
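
To give a flavour of what one term of a force field looks like, here is a sketch of the Lennard-Jones 12-6 pair potential, a textbook form for non-bonded attraction and repulsion. It is chosen purely for illustration and is not stated in the text to be the form used by the group; `epsilon` (well depth) and `sigma` (zero-crossing distance) are given in assumed reduced units.

```python
def lennard_jones_energy(r, epsilon=1.0, sigma=1.0):
    """Pair interaction energy U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lennard_jones_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr; positive values are repulsive, negative attractive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# The energy minimum lies at r = 2^(1/6) * sigma, where the force vanishes.
r_min = 2.0 ** (1.0 / 6.0)
well_depth = lennard_jones_energy(r_min)
```

At short range the repulsive twelfth-power term dominates, while at longer range the attractive sixth-power term takes over. Fitting parameters such as `epsilon` and `sigma`, whether to experiment, to theoretical calculation or by ML, is the kind of parameterisation described above.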

 

  • Quantum Chemistry

 

For all the usefulness of simulations, typically their force fields know nothing about covalent bond making or breaking, which means that they can't be used to study chemical reactions, molecular orbitals or even the vibrational motions of molecules. Instead, a more chemically intelligent approach is required, and this is provided by the electronic structure methods of quantum chemistry. Such approaches are known as "first principles," due to their sound basis in atomic and molecular quantum mechanics.

 

The most foundational such method historically has been Hartree-Fock self-consistent field theory (HF). However, in this century, Density Functional Theory (DFT) has become a much more widely known and used alternative, largely because it generally gives a more accurate result at a lesser cost.

 

We use quantum chemical methods such as HF and DFT for a variety of applications, including the energetics of chemical reactions, development of force fields, physics-based calculation of solubility and the prediction of crystal structures. While our group are very much users rather than developers of quantum chemical methods, we appreciate their central role in computational chemistry.

 

  • Bioinformatics

 

The sequential and alphabetical nature of both DNA and proteins makes them a rich source of computational research. Study of these essential and foundational biomolecules provides a window into the evolutionary history of life and its chemistry, as well as the impressive structural diversity of protein folds. Our own research frequently occupies the interface between chemistry and biology, the interactions between large biological polymers and smaller molecules being fundamental to processes of life and disease alike.

 

Much of our work in these areas has centred on enzymes, their chemical functions and their evolutionary histories. In this post-AlphaFold era, we continue to seek out new research questions that can shed light on the rich and diverse repertoire of biochemistry. In this endeavour, we frequently collaborate with colleagues in Biology as well as Chemistry.

Additional information about the current Mitchell Group can be found here: https://jbomgroup.wp.st-andrews.ac.uk/

Research interests

Enzyme Catalysis: Enzyme-catalysed reactions are ubiquitous and essential to the chemistry of life. Structures, gene sequences, mechanisms, metabolic pathways and kinetic data are currently spread between many different databases and throughout the literature. We have created MACiE, the world's first comprehensive electronic database of the chemical mechanisms of enzymatic reactions. We are now using MACiE to investigate fundamental questions about the chemistry of enzyme functions, their evolution, and their substrate specificity.
Computing Solubility and Bioavailability: Improving the prediction of solubility is essential to reduce the current unacceptable attrition rate in drug development. We have developed methods to predict solubility for drug-like molecules, with particular reference to dependence on pH, salt effects and crystal polymorphism. We tested a number of predictive methods, including Multi-Linear Regression, Random Forest and Support Vector Machines. This work spans traditional quantum chemistry, molecular simulation, QSAR and chemical informatics. The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of “toxicological” profiles, that is, the extent to which molecules of known toxicity are predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we have built a computational multiclass model using the Winnow algorithm based on a dataset of known protein targets.
Computational Toxicology: We are working on the development of in silico techniques for the prediction of toxicological properties and, more broadly, the elucidation of the mechanisms of action of toxic substances. It is hoped that a better understanding of the causal factors pertaining to toxicity will yield greater predictive insight as well. We currently have a project entitled "Machine Learning Methods for Predicting Phospholipidosis". We are using a variety of Machine Learning Methods, including Random Forest and a novel Genetic Algorithm. Phospholipidosis can be characterised by the accumulation of phospholipids in the lysosomes of many cell types. It may be induced by certain drugs of which the most common are cationic amphiphilic drugs.
Protein-ligand Binding Affinities: We have recently used the Random Forest machine learning method to generate a scoring function. Unlike other knowledge-based functions, it makes no assumptions about the mathematical form of the relationship between observed frequencies of contacts and their contributions to the binding free energy. Our method also allows known binding affinities to contribute to the learning process.
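
For readers unfamiliar with the Winnow algorithm mentioned under Computing Solubility and Bioavailability, the sketch below shows its core idea: a mistake-driven linear classifier over binary features whose weights are updated multiplicatively. It is a simplified binary version with division-based demotion (often called Winnow2), not the multiclass target-prediction model described above. The feature vectors, labels and parameter defaults are made-up toy assumptions.

```python
def predict_winnow(weights, threshold, features):
    """Predict 1 if the summed weights of the active features reach the threshold."""
    score = sum(w for w, x in zip(weights, features) if x)
    return 1 if score >= threshold else 0


def train_winnow(examples, n_features, alpha=2.0, epochs=10):
    """Train on (binary feature tuple, 0/1 label) pairs with multiplicative updates."""
    weights = [1.0] * n_features
    threshold = n_features / 2
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            if predict_winnow(weights, threshold, features) == label:
                continue
            mistakes += 1
            # Promote active features after a missed positive, demote after a false positive.
            factor = alpha if label == 1 else 1.0 / alpha
            for i, x in enumerate(features):
                if x:
                    weights[i] *= factor
        if mistakes == 0:
            break  # a full pass with no errors: stop early
    return weights, threshold


# Toy data: hypothetical binary fingerprint bits; the label simply equals bit 0.
TOY_EXAMPLES = [
    ((1, 0, 0, 0), 1),
    ((0, 1, 1, 0), 0),
    ((1, 0, 1, 0), 1),
    ((0, 0, 1, 1), 0),
    ((0, 1, 0, 1), 0),
    ((1, 1, 0, 0), 1),
]
weights, threshold = train_winnow(TOY_EXAMPLES, n_features=4)
```

Because only the weights of active features change, and only after a mistake, Winnow copes well with high-dimensional sparse inputs in which most features are irrelevant.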

Future research

* AI for computational chemistry.
* Synthetic data and Machine Learning exploits.
* Modelling the evolution of enzyme catalysis.
* Theoretical computation of solubility and improved understanding of hydrophobicity.
* Predictive computational models for bioactivity and toxicology.
* Machine learning approaches to predict enzyme function.

Industrial relevance

Much of our work on predicting bioavailability and solubility is of particular interest to the pharma industry, and is also relevant to the fields of food science and nutrition. In recent years, we have moved into computational toxicology, a field of wide-ranging industrial and commercial importance, especially with the advent of REACH legislation. Our work on enzyme reactions has applications in designing better, and indeed novel, enzyme-based systems, from laundry to biofuels to deodorants.
We are building some public domain toxicology and bioavailability models, designed to be openly available to SMEs and non-profit organisations, as well as academics.

Biography

John Mitchell has a PhD in Theoretical Chemistry from Cambridge. He returned there from University College London in 2000, taking up a lectureship in Chemistry. He was appointed to a readership at St Andrews in 2009. His recent research has used computational techniques in pharmaceutical chemistry and structural bioinformatics. His group have worked extensively on prediction of bioactivity, solubility, melting point and hydrophobicity from chemical structure, using both informatics and theoretical chemistry methodologies. Recently they have developed novel applications of machine learning in computational biochemistry, such as drug side effect prediction, and identifying athletic performance enhancers.

Profile Keywords

Machine Learning, Artificial Intelligence & informatics in Chemistry; Prediction of solubility and other molecular thermodynamic properties; Modelling the organic crystalline state; Classification and computer-based representation of enzyme reaction mechanisms; Bioinformatics studies of molecular evolution; Modelling protein-ligand interactions.

Teaching activity

Lecturer CH5714 Chemical Applications of Electronic Structure Calculations; Lecturer CH4431 Scientific Writing; Lecturer CH3717 Statistical Mechanics and Computational Chemistry; Convenor & Tutor, CH1202 Introductory Chemistry; Lecturer ID2005 Scientific Thinking; Tutor CH2701 Physical Chemistry 2; Tutor CH1401 Introductory Inorganic and Physical Chemistry; Tutor CH5461 Integrating Chemistry; Project Supervisor CH4442 & CH5441 Research Projects; Lecturer SUPACCH Computational Chemistry (Postgraduate course).

Education/Academic qualification

Doctor of Philosophy, Theoretical Studies of Hydrogen Bonding, University of Cambridge

1 Oct 1987 to 30 Sept 1990

Award Date: 2 Feb 1991

Keywords

  • QD Chemistry
  • solubility
  • computational chemistry
  • chemoinformatics
  • bioinformatics
  • Machine Learning
  • Artificial Intelligence
  • simulation

Expertise related to UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This person’s work contributes towards the following SDG(s):

  1. SDG 3 - Good Health and Well-being
  2. SDG 4 - Quality Education
  3. SDG 11 - Sustainable Cities and Communities
  4. SDG 13 - Climate Action
  5. SDG 14 - Life Below Water

Fingerprint

Dive into the research topics where John B. O. Mitchell is active. These topic labels come from the works of this person. Together they form a unique fingerprint.