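Research on the Accurate and Fast Determination Method of Spark Configuration Parameter Range

Authors: LI RUI, LI LELE, YU ZHIBIN

Affiliation: 1. Southern University of Science and Technology; 2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

CLC number: TP274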

    Abstract:

    The volume of data processed on the internet is growing rapidly, and users' processing needs are increasingly varied; the complexity of big data processing systems has grown accordingly. To adapt to different cluster resources, datasets, and applications, popular big data processing systems expose a growing number of configurable parameters. Spark, one of the most popular of these systems, provides more than 200 configuration parameters that control parallelism, I/O behavior, memory usage, and data compression. Setting these parameters incorrectly often causes severe performance degradation and stability problems. Yet both ordinary users and expert administrators find it difficult to understand and tune these settings for optimal performance, so tuning consumes substantial human effort and time. During tuning, choosing unreasonable parameter ranges can increase the time cost fivefold or, worse, cause failures that bring the cluster down, an incalculable loss for large-scale clusters serving customers.

Cite this article

LI RUI, LI LELE, YU ZHIBIN. Research on the Accurate and Fast Determination Method of Spark Configuration Parameter Range[J]. Journal of Integration Technology.

History
  • Received: 2024-11-29
  • Last revised: 2025-02-27
  • Accepted: 2025-02-28
  • Published online: 2025-03-03