Performance Optimization of Offline Batch Jobs in Erasure-Coded Storage Systems



    With the explosive growth of Internet data, many distributed storage systems have adopted erasure coding to ensure data reliability while also reducing storage overhead. However, erasure coding changes the data placement scheme, which affects how other services in the cluster access data. This paper proposes a data placement scheme and a task scheduling strategy for heterogeneous Hadoop clusters. Both are designed for the “one-to-many” data access pattern of a typical offline batch job, namely MapReduce applications. The scheme analyzes the hardware parameters and historical load of each node in a heterogeneous cluster and, wherever possible, places the data blocks of the same erasure-coded stripe on nodes with similar performance. This keeps the data access pressure on the cluster's nodes relatively balanced while a MapReduce job runs. In addition, the scheduler sets each node's task concurrency according to that node's current load and computing power. This avoids straggler tasks caused by heavy load on some nodes and speeds up the progress of the MapReduce job. Experimental results show that the proposed data placement strategy, HDPA (Heterogeneous-aware Data Placement Algorithm), and task allocation strategy, DTAA (Dynamic Task Allocation Algorithm), outperform Hadoop's default random data placement and task allocation strategies. They effectively reduce the long-tail effect of tasks across different types of MapReduce applications and cut running time by 10.5% to 42%.
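
    The abstract does not give the formulas behind HDPA or DTAA. The Python sketch below illustrates the two ideas it describes under illustrative assumptions only. Every name in it is hypothetical and not the authors' algorithm. This includes the normalized `cpu`/`disk` node attributes, the equal CPU/disk weighting in `capability`, the load-discounted `perf_score`, the greedy group choice in `place_stripes`, and the slot formula in `task_slots` with its `base_slots` parameter. Placement ranks nodes by score and cuts the ranking into groups of `width` (= k + m) nodes. Each stripe then goes to one group, so all of a stripe's blocks sit on distinct nodes of similar performance. Task concurrency shrinks on weak or currently busy nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: float        # hypothetical: CPU capacity normalized to the strongest node, (0, 1]
    disk: float       # hypothetical: disk bandwidth normalized the same way, (0, 1]
    hist_load: float  # hypothetical: average historical utilization in [0, 1)


def capability(n: Node) -> float:
    # Assumed equal weighting of CPU and disk capacity (not from the paper).
    return 0.5 * n.cpu + 0.5 * n.disk


def perf_score(n: Node) -> float:
    # Hardware capability discounted by how busy the node has historically been.
    return capability(n) * (1.0 - n.hist_load)


def place_stripes(nodes: list[Node], num_stripes: int, width: int) -> list[list[str]]:
    """Place each stripe's `width` (= k + m) blocks on distinct nodes of similar score."""
    ranked = sorted(nodes, key=perf_score, reverse=True)
    # Consecutive nodes in score order form groups of similar performance;
    # leftover nodes (fewer than `width`) receive no stripes in this sketch.
    groups = [ranked[i:i + width] for i in range(0, len(ranked) - width + 1, width)]
    if not groups:
        raise ValueError("need at least `width` nodes")
    capacity = [max(sum(perf_score(n) for n in g), 1e-9) for g in groups]
    assigned = [0] * len(groups)
    placement = []
    for _ in range(num_stripes):
        # Greedy: choose the group whose stripes-per-unit-capacity stays lowest,
        # so stronger groups hold proportionally more stripes.
        gi = min(range(len(groups)), key=lambda i: (assigned[i] + 1) / capacity[i])
        assigned[gi] += 1
        placement.append([n.name for n in groups[gi]])
    return placement


def task_slots(n: Node, current_load: float, base_slots: int) -> int:
    """Concurrent task slots: more for strong, currently idle nodes; at least one."""
    return max(1, math.floor(base_slots * capability(n) * (1.0 - current_load)))


nodes = [Node("a", 1.0, 1.0, 0.1), Node("b", 0.9, 1.0, 0.2),
         Node("c", 0.4, 0.5, 0.1), Node("d", 0.5, 0.4, 0.2)]
print(place_stripes(nodes, num_stripes=3, width=2))  # [['a', 'b'], ['a', 'b'], ['c', 'd']]
print(task_slots(nodes[0], current_load=0.5, base_slots=8))  # 4
```

    In this toy cluster the two strong nodes form one group and receive two of the three stripes, while the weaker pair receives one. A heavily loaded weak node still keeps one slot, so it is throttled rather than excluded.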


YANG Zhenyu, LV Min, LI Yongkun. Performance Optimization of Offline Batch Jobs in Erasure-Coded Storage Systems[J]. Journal of Integration Technology, 2022, 11(3): 85-97

Online: May 18, 2022