Abstract: Volume, variety and velocity are the three challenges that big data computing must face, and they cannot be handled by traditional IT technologies. Benefiting from the practical deployments and continuous code contributions of numerous Internet companies at home and abroad, Apache Hadoop has become a mature software stack and the de facto standard for petabyte-scale data processing. Furthermore, distinct software ecosystems have been established around different types of data processing requirements. This paper introduces three research works in the big data systems field: RCFile for data placement, CCIndex for index construction, and SwiftFS for hardware-accelerated compression and decompression, which effectively address storage-space and query-performance issues. These research achievements have been integrated as key technologies into the Golaxy big data engine software stack and directly support multiple production applications at Taobao Inc. and Tencent Inc.