Research on the Accurate and Fast Determination Method of Spark Configuration Parameter Range
Affiliation: 1. Southern University of Science and Technology; 2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

CLC Number: TP274

Abstract:

With the exponential growth of data on the internet, big data processing systems have become dramatically more complex. To adapt to changes in cluster resources, datasets, and applications, these systems expose adjustable configuration parameters suited to different application scenarios. Spark, one of the most popular of these systems, has over 200 configuration parameters that control parallelism, I/O behavior, memory settings, and compression. Misconfiguring these parameters often causes severe performance degradation and stability problems. Yet both ordinary users and expert administrators find it hard to understand and tune these settings for optimal performance, and tuning consumes substantial human effort and time. During tuning, an unreasonable choice of parameter ranges can increase the time cost fivefold or, worse, cause failures that bring cluster operation to a halt, an incalculable loss for large-scale clusters that serve customers.

History
  • Received: November 29, 2024
  • Revised: February 27, 2025
  • Adopted: February 28, 2025
  • Online: March 3, 2025