Exploring techniques to distinguish between real images and those generated using stable diffusion XL

Benjamin Sanders*, David Morrison, David Harris-Birtill

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The recent development of text-to-image diffusion models has made it possible to quickly generate realistic images from textual prompts. While this enables innovation in particular domains, concerns have been raised over the prospect of malicious users passing off synthetic images as genuine. To assess whether it is possible to discern between real images and those generated using diffusion models, a novel convolutional neural network was built, trained and tested on a bespoke dataset formed of authentic images from the ImageNet dataset and corresponding synthetic images generated using Stable Diffusion XL, an open-source text-to-image diffusion model. This dataset has been publicly released and is currently the largest publicly accessible collection of images generated using Stable Diffusion XL, significantly contributing to future research in this area. The positive results from our experiment performing binary classification of synthetic and real images demonstrate the effectiveness of this approach in detecting synthetic images, with up to 98.38% accuracy using a ResNet-18 baseline and 97.24% with the proposed CNN.
Original language: English
Article number: e0339917
Pages (from-to): 1-15
Number of pages: 15
Journal: PLoS ONE
Volume: 21
Issue number: 1
DOIs
Publication status: Published - 27 Jan 2026
