Abstract: Cloud computing has grown explosively in recent years, greatly increasing the importance of storage in such systems. A wide range of applications already run in the cloud, and increasingly diverse applications continue to move to this platform. Different applications may have different storage requirements, such as file size, number of files, and I/O performance. This suggests that a single unified file system may leave overall system performance suboptimal, or may even fail to satisfy the needs of all applications in a cloud. However, it is unclear whether employing different file systems in a single cloud computing platform would improve overall I/O performance. In this paper, we address this question by characterizing four popular distributed file systems used in cloud computing: Ceph, MooseFS, GlusterFS, and HDFS. The characterization shows that the performance of the same operation, such as read or write, can differ dramatically across file systems. When the file size is less than 256 MB, MooseFS has the best write performance, outperforming the others by 22.3% on average. For reads, GlusterFS performs best when the file size is larger than 256 KB; its read performance is 21.0% higher than that of the other file systems. These findings lead us to design a hybrid file system for cloud computing platforms that aims to significantly improve overall performance.
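To make the hybrid idea concrete, the two findings quoted above can be read as a simple per-request selection rule. The following is a minimal sketch only, not the design proposed in this paper: the function name `preferred_backend`, the constant names, and the choice to return `None` where the abstract states no winner are all assumptions made for illustration. Only the two thresholds and the two file system names come from the text.

```python
from typing import Optional

KB = 1024
MB = 1024 * KB

# Thresholds quoted in the abstract; the routing rule itself is hypothetical.
WRITE_THRESHOLD = 256 * MB  # moosefs reported best for writes below this size
READ_THRESHOLD = 256 * KB   # glusterfs reported best for reads above this size


def preferred_backend(operation: str, file_size: int) -> Optional[str]:
    """Return the file system the abstract reports as fastest for this
    operation and file size (in bytes), or None when it makes no claim."""
    if operation == "write" and file_size < WRITE_THRESHOLD:
        return "moosefs"
    if operation == "read" and file_size > READ_THRESHOLD:
        return "glusterfs"
    # The abstract names no best file system for the remaining cases.
    return None


if __name__ == "__main__":
    for op, size in [("write", 4 * MB), ("read", 4 * MB), ("read", 4 * KB)]:
        print(op, size, preferred_backend(op, size))
```

The sketch deliberately leaves large writes and small reads undecided, since the abstract reports no best-performing file system for those ranges.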