Benchmarking and performance modelling of dataflow with cycles

Sheriffo Ceesay, Yuhui Lin, Adam Barker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Over the years, iterative data-intensive applications such as machine learning workloads have grown immensely in popularity. Unlike batch applications, iterative applications such as k-means, regression, or classification algorithms require multiple passes over the input data before the model converges. In the context of big data, these applications are executed on distributed computing frameworks such as Apache Spark. These frameworks are simple to deploy and use; under the hood, however, they are complex and highly configurable. An exhaustive study of the impact of their many parameters on application performance would be impractical because the number of parameter combinations grows exponentially.

In this paper, we group applications based on a common dataflow and communication pattern. We then present a multi-objective performance prediction framework to model the performance of these applications. The models predict the execution time of a given application with high accuracy. The framework can be used to infer optimal configuration parameters that meet application execution deadlines. Based on these optimal configuration values, we recommend the most cost-effective EC2 instances. The average prediction error is within ±14% of the measured value.
Original language: English
Title of host publication: IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies (BDCAT '21)
Publisher: ACM
Pages: 91–100
Number of pages: 10
ISBN (Print): 9781450391641
Publication status: Published - 6 Dec 2021
Event: ACM 8th International Conference on Big Data Computing, Applications and Technologies - Leicester, United Kingdom
Duration: 6 Oct 2021 to 9 Oct 2021
Conference number: 8
https://www.cs.le.ac.uk/events/BDCAT2021/

Conference

Conference: ACM 8th International Conference on Big Data Computing, Applications and Technologies
Abbreviated title: BDCAT '21
Country/Territory: United Kingdom
City: Leicester
Period: 6/10/21 to 9/10/21

Keywords

  • Communication Patterns
  • Big Data
  • Dataflow With Cycles
  • Machine Learning
  • Modelling
