Abstract
Over the years, the popularity of iterative data-intensive applications, such as machine learning applications, has grown immensely. Unlike batch applications, iterative applications such as k-means, regression, or classification algorithms require multiple passes over the input data to train a model until it converges. In the context of big data, these applications are executed on distributed computing frameworks such as Apache Spark. These frameworks are simple to deploy and use; under the hood, however, they are complex and highly configurable. An exhaustive study of the impact of their configuration parameters on application performance would be cumbersome because the number of parameter combinations grows exponentially.
In this paper, we group applications based on a common dataflow and communication pattern. We then present a multi-objective performance prediction framework to model the performance of these applications. The models can predict the execution time of a given application with high accuracy. The framework can be used to infer optimal configuration parameters to meet application execution deadlines. Based on these optimal configuration values, we recommend the most cost-effective EC2 instances. The average prediction error is ±14% of the measured value.
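The final step described in the abstract, choosing the cheapest instance whose predicted execution time still meets a deadline, can be illustrated with a minimal sketch. This is not the paper's framework: the `Candidate` type, the function names, the instance labels, and every price and predicted time below are hypothetical placeholders, and the predicted times are assumed to come from some already-trained performance model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One instance option (all fields are hypothetical placeholders)."""
    instance_type: str        # label only, not a real EC2 instance name
    hourly_price_usd: float   # placeholder price, not a real EC2 quote
    predicted_seconds: float  # assumed output of a performance model


def run_cost(c: Candidate) -> float:
    """Cost of one run: hourly price times predicted hours."""
    return c.hourly_price_usd * c.predicted_seconds / 3600.0


def cheapest_within_deadline(
    candidates: List[Candidate], deadline_seconds: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate predicted to meet the deadline."""
    feasible = [c for c in candidates if c.predicted_seconds <= deadline_seconds]
    if not feasible:
        return None  # no option is predicted to finish in time
    return min(feasible, key=run_cost)


# Placeholder data for illustration only.
options = [
    Candidate("type-small", 0.05, 7200.0),
    Candidate("type-medium", 0.40, 1500.0),
    Candidate("type-large", 0.80, 900.0),
]

choice = cheapest_within_deadline(options, deadline_seconds=1800.0)
print(choice.instance_type if choice else "no feasible instance")
```

With a 30-minute deadline, the slowest option is filtered out first and the remaining options are compared on per-run cost rather than hourly price, which is why a more expensive instance per hour can still be the cheaper choice overall.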
Original language | English |
---|---|
Title of host publication | IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies (BDCAT '21) |
Publisher | ACM |
Pages | 91–100 |
Number of pages | 10 |
ISBN (Print) | 9781450391641 |
Publication status | Published - 6 Dec 2021 |
Event | ACM 8th International Conference on Big Data Computing, Applications and Technologies, Leicester, United Kingdom; Duration: 6 Oct 2021 → 9 Oct 2021; Conference number: 8; https://www.cs.le.ac.uk/events/BDCAT2021/ |
Conference
Conference | ACM 8th International Conference on Big Data Computing, Applications and Technologies |
---|---|
Abbreviated title | BDCAT '21 |
Country/Territory | United Kingdom |
City | Leicester |
Period | 6/10/21 → 9/10/21 |
Keywords
- Communication Patterns
- Big Data
- Dataflow With Cycles
- Machine Learning
- Modelling