Abstract: With the rapid evolution of technologies such as big data and cloud computing, application systems are becoming increasingly centralized and larger in scale, which makes the performance problems of storage systems more prominent. Parallel file systems are widely used to meet the performance requirements of large-scale applications running on such storage systems. However, most existing optimization methods for parallel file systems consider only the application system or only the parallel file system itself, and seldom consider the collaboration between the two. Because the access pattern of an application system has a significant impact on the performance of the parallel file system it accesses, this study proposes a parallel file system optimization approach based on dynamic partitioning. The key idea is to first use machine learning techniques to reveal the relationships among the factors that influence system performance and to build an optimization model from them. The optimization model then guides the parameter tuning of the parallel file system. Finally, the model is tested on a Ceph-based storage system prototype with a three-layer application system. The experimental results show that the proposed model improves the access performance of the parallel file system: it achieves an optimization prediction accuracy of 85%, and with its assistance the system throughput is improved by 3.6 times.