The application of machine learning methods to aggregate geochemistry predicts quarry source location: an example from Ireland

Tadhg Dornan, Gary O'Sullivan, Neal O'Riain, Eva Stueeken, Robbie Goodhue

Research output: Contribution to journal › Article › peer-review



Attempts to use geochemical data to classify the quarry sources of reactive rock aggregate have not yet succeeded. This aggregate, composed of Carboniferous-aged pyritic mudrocks and limestones, has caused structural damage to over 12,500 homes across Ireland. In this paper, a possible solution is proposed: machine learning models, such as logistic regression and random forest, are applied to a geochemical dataset to predict quarry source location. The dataset was obtained through scanning electron microscope energy-dispersive X-ray spectroscopy (SEM-EDS) and laser ablation-quadrupole-inductively coupled plasma mass spectrometry (LA-Q-ICPMS) of pyrite, and isotope ratio mass spectrometry (IRMS) of bulk rock aggregate. Of the datasets compared, LA-Q-ICPMS achieved the highest average classification scores, 55.38% for random forest and 67.73% for logistic regression, based on 10-fold cross-validation testing. This dataset was therefore used to classify a set of known unknown samples, achieving average classification accuracies of 40.30% for random forest and 66.80% for logistic regression, based on a systematic train-test procedure.
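
The 10-fold cross-validation scoring mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic, the quarry names and two features are hypothetical, and a nearest-centroid rule stands in for logistic regression and random forest so that the example stays self-contained. Only the fold-splitting and score-averaging logic is the point here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping classes balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (assumption): mean feature vector per quarry."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the quarry whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)),
    )


def cross_val_accuracy(X, y, k=10, seed=0):
    """Return the list of per-fold accuracies from stratified k-fold CV."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Synthetic data: 3 hypothetical quarries, 30 samples each, 2 made-up features.
rng = random.Random(42)
centres = {"quarry_A": (0.0, 0.0), "quarry_B": (3.0, 0.0), "quarry_C": (0.0, 3.0)}
X, y = [], []
for name, (cx, cy) in centres.items():
    for _ in range(30):
        X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
        y.append(name)

fold_scores = cross_val_accuracy(X, y, k=10)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean 10-fold accuracy: {mean_accuracy:.2%}")
```

The reported figure is the mean of the ten per-fold accuracies; each sample is held out exactly once, so no sample is scored by a model that was trained on it.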

There is scope to enhance these classification scores to an accuracy of 100% by combining the geochemical datasets. However, because it is difficult to link pyrites analysed by SEM-EDS to those analysed by LA-Q-ICPMS, and to relate a bulk rock analytical technique (IRMS) to mineral geochemistry (SEM-EDS, LA-Q-ICPMS), median values have to be used when combining SEM-EDS (Fe, S) and IRMS (TS and δ34S) datasets with LA-Q-ICPMS data. Therefore, if these combined datasets were used as part of an applied quarry classification system, statistically meaningful mean values taken from a near-normally distributed dataset would have to be used to represent the quarry composition accurately.
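
The median-based combination described above can be sketched as follows. This is an illustration under stated assumptions rather than the published workflow: all values are invented, the quarry names are hypothetical, Co and Ni are placeholder trace elements for the LA-Q-ICPMS spots, and `d34S` is a plain-text stand-in for δ34S. Because individual analyses cannot be matched one to one across techniques, each secondary table is collapsed to one median per variable per quarry, and those medians are attached to every spot analysis from the same quarry.

```python
from collections import defaultdict
from statistics import median


def quarry_medians(records):
    """Collapse (quarry, {variable: value}) records to one median per variable per quarry."""
    grouped = defaultdict(lambda: defaultdict(list))
    for quarry, values in records:
        for name, value in values.items():
            grouped[quarry][name].append(value)
    return {q: {n: median(v) for n, v in cols.items()} for q, cols in grouped.items()}


def attach_medians(spot_records, *summaries):
    """Append quarry-level medians to every per-spot record from the same quarry."""
    combined = []
    for quarry, values in spot_records:
        row = dict(values)
        for summary in summaries:
            row.update(summary.get(quarry, {}))
        combined.append((quarry, row))
    return combined


# Invented values for illustration only.
fe_s_table = [
    ("quarry_A", {"Fe": 46.0, "S": 53.0}),
    ("quarry_A", {"Fe": 46.4, "S": 53.4}),
    ("quarry_A", {"Fe": 47.0, "S": 52.8}),
    ("quarry_B", {"Fe": 45.0, "S": 54.0}),
]
ts_d34s_table = [
    ("quarry_A", {"TS": 1.2, "d34S": -20.0}),
    ("quarry_A", {"TS": 1.6, "d34S": -18.0}),
    ("quarry_A", {"TS": 1.4, "d34S": -19.0}),
    ("quarry_B", {"TS": 0.8, "d34S": 5.0}),
]
la_icpms_spots = [
    ("quarry_A", {"Co": 120.0, "Ni": 310.0}),
    ("quarry_B", {"Co": 80.0, "Ni": 150.0}),
]

combined = attach_medians(
    la_icpms_spots,
    quarry_medians(fe_s_table),
    quarry_medians(ts_d34s_table),
)
for quarry, row in combined:
    print(quarry, row)
```

One consequence of this design is that every spot from a given quarry carries identical values for the attached variables, so those columns describe the quarry rather than the individual pyrite grain.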

Original language: English
Article number: 104495
Journal: Computers & Geosciences
Volume: In press
Early online date: 20 Apr 2020
Publication status: E-pub ahead of print - 20 Apr 2020

