TY - JOUR
T1 - Dispel4py
T2 - A Python framework for data-intensive scientific computing
AU - Filgueira, Rosa
AU - Krause, Amrey
AU - Atkinson, Malcolm
AU - Klampanos, Iraklis
AU - Moreno, Alexander
N1 - Funding Information:
This research was supported by the VERCE project (EU FP7 RI 283543), and by the TerraCorrelator project (funded by NERC NE/L012979/1), with major contributions from the Open Cloud Consortium, funded by grants from the Gordon and Betty Moore Foundation and the National Science Foundation, and led by the University of Chicago.
Publisher Copyright:
© The Author(s) 2015.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. The dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
AB - This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. The dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
KW - data streaming
KW - data-intensive computing
KW - e-infrastructures
KW - programming frameworks
KW - scientific workflows
UR - http://www.scopus.com/inward/record.url?scp=85021296575&partnerID=8YFLogxK
U2 - 10.1177/1094342016649766
DO - 10.1177/1094342016649766
M3 - Article
AN - SCOPUS:85021296575
SN - 1094-3420
VL - 31
SP - 316
EP - 334
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 4
ER -