Scalable adaptive optimizations for stream-based workflows in multi-HPC-clusters and cloud infrastructures

Liang Liang, Rosa Filgueira*, Yan Yan, Thomas Heinis

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



This work presents three new adaptive optimization techniques that maximize the performance of dispel4py workflows. dispel4py is a parallel, Python-based, stream-oriented dataflow framework that acts as a bridge to existing parallel programming frameworks such as MPI and Python multiprocessing. When a user runs a dispel4py workflow, the original framework distributes the workload in a fixed way among the processes available for the run. This allocation ignores the characteristics of the workflow, which can cause scalability issues, especially for data-intensive scientific workflows. Our aim, therefore, is to improve the performance of dispel4py workflows by testing workload distribution strategies that adapt automatically to each workflow at runtime. To achieve this, we have implemented three new techniques: Naive Assignment, Staging and Dynamic Scheduling. We have evaluated them with several workflows from different domains and across different computing resources. The results show that our techniques significantly improve the performance of the original dispel4py framework, by up to 10X.
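To illustrate why a fixed workload distribution can scale poorly, the sketch below compares a static round-robin assignment with a generic pull-based dynamic scheduler on tasks of unequal cost. It is a minimal simulation under stated assumptions: task costs are known numbers, and the function names `static_makespan` and `dynamic_makespan` are hypothetical. It is not dispel4py's API and not the paper's Dynamic Scheduling implementation, only the general idea of letting idle workers pull the next task.

```python
import heapq
from typing import List


def static_makespan(costs: List[float], workers: int) -> float:
    """Fixed allocation (hypothetical): task i always goes to worker i % workers,
    regardless of how loaded that worker already is."""
    loads = [0.0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    return max(loads)


def dynamic_makespan(costs: List[float], workers: int) -> float:
    """Generic dynamic scheduling (hypothetical): each task, in arrival order,
    is taken by whichever worker becomes idle first."""
    # Min-heap holding the time at which each worker becomes idle.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)       # earliest-idle worker takes the task
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    # Illustrative costs only: expensive and cheap tasks interleaved.
    costs = [8, 1, 8, 1, 8, 1, 1, 1]
    print(static_makespan(costs, 2), dynamic_makespan(costs, 2))  # prints "25.0 17.0"
```

With this illustrative input, the fixed assignment sends every expensive task to the same worker, while the pull-based scheduler spreads them out, shortening the overall completion time.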
Original language: English
Pages (from-to): 102-116
Number of pages: 15
Journal: Future Generation Computer Systems
Early online date: 8 Oct 2021
Publication status: Published - Mar 2022


  • Scientific workflow
  • Stream-based workflow
  • Workflow optimization
  • dispel4py
  • Distributed systems


