Abstract:Cloud computing grows rapidly nowadays, which brings virtualization technology to traditional datacenters in order to implement service-on-demand of computing resources, such as Amazon’s Elastic Cloud Computing (EC2) Services. Hadoop is an open-source implementation of Google’s MapReduce, which is a distributed parallel computing model for large-scale dataset. Hadoop is gaining more and more focuses both in academy and industry. It is an open question that how to combine cloud computing infrastructures with Hadoop efficiently, i.e., making full use of the former’s elastic resources and the latter’s advantages of scalability, fault-tolerance and running on commodity hardware. In this paper, we carry out a series of experiments to evaluate and analyze the performance of Hadoop on our heterogeneous clouding computing testbed. We demonstrate that the performance of Hadoop is degraded under the scenario with high I/O overheads, compared with the traditional scenario where each node in a cluster is a physical machine. Our work can act as a basis for improving the performance of Hadoop under the cloud computing environments.