Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

Rafael Ferreira da Silva, Rosa Filgueira, Ewa Deelman, Erola Pairo-Castineira, Ian Michael Overton, Malcolm Atkinson

Research output: Contribution to journal, Article, peer-review

Abstract

Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, a key challenge for running workflows in distributed systems is failure prediction, detection, and recovery. In this paper, we propose an approach that uses control theory, developed as part of autonomic computing, to predict failures before they happen and mitigate them when possible. The proposed approach applies the proportional-integral-derivative controller (PID controller) control loop mechanism, which is widely used in industrial control systems, to mitigate faults by adjusting the inputs of the controller. The PID controller aims at detecting the possibility of a fault far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of the Big Data era: data storage overload and memory overflow. We define, implement, and evaluate simple PID controllers to autonomously manage data and memory usage of a bioinformatics workflow that consumes/produces over 4.4 TB of data, and requires over 24 TB of memory to run all tasks concurrently. Experimental results indicate that workflow executions may significantly benefit from PID controllers, in particular under online and unknown conditions. Simulation results show that nearly optimal executions (slowdown of 1.01) can be attained when using our proposed method, and faults are detected and mitigated far in advance of their occurrence.
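
For readers unfamiliar with the control loop the abstract refers to, the following is a minimal sketch of a textbook discrete PID controller. It is not the authors' implementation. The class name, the gain values, the 0.8 setpoint, and the idea of measuring storage usage as a fraction of capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Textbook discrete PID loop (illustrative; gains are hypothetical)."""
    kp: float        # proportional gain
    ki: float        # integral gain
    kd: float        # derivative gain
    setpoint: float  # desired value of the monitored quantity
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float, dt: float = 1.0) -> float:
        """Return the control output for one new measurement."""
        error = self.setpoint - measured
        # Accumulate the error over time (integral term).
        self.integral += error * dt
        # Rate of change of the error; zero on the first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # Hypothetical scenario: keep storage usage near 80% of capacity.
    controller = PIDController(kp=1.0, ki=0.1, kd=0.5, setpoint=0.8)
    for usage in (0.50, 0.70, 0.85, 0.95):
        output = controller.update(usage)
        # A negative output means usage is above the setpoint, which a
        # workflow system could treat as a signal to slow down or clean up.
        print(f"usage={usage:.2f} output={output:+.3f}")
```

In this sketch a negative output signals that the monitored quantity has overshot its target, so a corrective action can be taken before capacity is actually exhausted. Which action to take, and how the gains are tuned, is left open here.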
Original language: English
Pages (from-to): 15-24
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 1800
Publication status: Published - 28 Feb 2017
