基于机器学习的动态分区并行文件系统性能优化

Research on Machine Learning-Based Performance Optimization of Dynamic Partitioned Parallel File System

    Abstract: In recent years, with the development of big data and cloud computing, application systems have become increasingly centralized and ever larger in scale, which has made the performance problems of storage systems increasingly prominent. Parallel file systems have been widely adopted to meet these performance requirements. However, most existing optimization methods for parallel file systems consider only the application system or only the parallel file system itself, and seldom consider the coordination between the two. Based on the observation that an application system's access pattern on a parallel file system has a significant impact on storage system performance, this paper proposes a parallel file system optimization method based on dynamic partitioning. First, machine learning techniques are used to analyze and mine the relationships between the factors that influence performance and the performance metrics, producing an optimization model. Second, the optimization model is used to assist parameter tuning of the parallel file system. Finally, a prototype is implemented on the Ceph storage system, and a three-tier application system is designed for performance testing, with the goal of optimizing the access performance of the parallel file system. Experimental results show that the proposed method achieves a prediction and optimization accuracy of 85%, and that with the assistance of the proposed model the throughput of the parallel file system improves by about 3.6 times.
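    The two modelling steps described in the abstract (learn how access-pattern factors and tunable parameters relate to a performance metric, then use that model to assist parameter tuning) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the learning algorithm, the features, or the tuned parameters, so a plain nearest-neighbour regressor stands in for the model, `stripe_kb` is a hypothetical tunable rather than a real Ceph option, and the sample values are synthetic, not measurements from the paper.

```python
from math import dist, log2

# Synthetic, made-up samples (NOT results from the paper):
# (request_size_kb, read_ratio, stripe_kb) -> throughput in MB/s.
# "stripe_kb" is a hypothetical tunable, not an actual Ceph option name.
SAMPLES = [
    ((4, 0.9, 64), 120.0),
    ((4, 0.9, 1024), 60.0),
    ((1024, 0.9, 64), 200.0),
    ((1024, 0.9, 1024), 480.0),
    ((4, 0.1, 64), 90.0),
    ((4, 0.1, 1024), 40.0),
    ((1024, 0.1, 64), 150.0),
    ((1024, 0.1, 1024), 390.0),
]


def _scale(features):
    """Put the three features on comparable scales before measuring distance."""
    size_kb, read_ratio, stripe_kb = features
    return (log2(size_kb) / 10, read_ratio, log2(stripe_kb) / 10)


def predict_throughput(features, k=1):
    """Stand-in model: average throughput of the k nearest recorded samples."""
    target = _scale(features)
    nearest = sorted(SAMPLES, key=lambda s: dist(_scale(s[0]), target))[:k]
    return sum(throughput for _, throughput in nearest) / len(nearest)


def recommend_stripe(size_kb, read_ratio, candidates=(64, 1024)):
    """Tuning step: pick the candidate setting with the best predicted throughput."""
    return max(
        candidates,
        key=lambda c: predict_throughput((size_kb, read_ratio, c)),
    )


if __name__ == "__main__":
    # Small, read-heavy requests versus large, write-heavy requests.
    print(recommend_stripe(8, 0.8))
    print(recommend_stripe(512, 0.2))
```

    The sketch keeps the model and the tuning step separate, so the predictor can be swapped for any other learned model without changing how a recommendation is derived from it.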

     
