A New Distributed Machine Learning Library Developed Based on LOGO Computing Framework
Affiliation:

1. Shenzhen University, Shenzhen; 2. Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen

CLC number:

T Industrial Technology

Fund Project:

Key Basic Research Foundation of Shenzhen (JCYJ20220818100205012), Natural Science Foundation of Guangdong Province (2023A1515011667), Basic Research Foundations of Shenzhen (JCYJ20210324093609026)

    Abstract:

    LOGO is a new distributed computing framework that follows a non-MapReduce computing paradigm. Under the LOGO framework, big data distributed computing is completed in two steps. The LO operation runs a serial algorithm inside each node or virtual machine to process one random sample data block independently, producing a local result. The GO operation uploads all local results to the master node and integrates them there to obtain an approximate result for the whole big data set. When executing iterative algorithms, the LOGO framework eliminates data communication between nodes across iterations, which greatly improves computing efficiency, reduces memory requirements, and enhances data scalability. This article presents RSP-LOGOML, a new distributed machine learning algorithm library independently developed on the LOGO computing framework. A distributed computation in this library consists of two parts: the serial algorithm executed by the LO operation and the ensemble algorithm executed by the GO operation. The LO operation directly executes existing serial machine learning algorithms without rewriting them according to the MapReduce programming model, and the GO operation integrates the serial results with different kinds of ensemble algorithms, depending on the ensemble task. This article first introduces the principle of LOGO distributed computing, then describes the library architecture, the method for packaging existing serial algorithms, and the GO ensemble strategy. Finally, it demonstrates the Spark implementation, App development, and performance test results for various algorithms.
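The two-step LO/GO flow described in the abstract can be illustrated with a small single-machine simulation. This is a minimal sketch under stated assumptions, not the RSP-LOGOML API or its Spark implementation: the function names (make_random_sample_blocks, fit_line, lo_operation, go_average, go_majority_vote) are hypothetical, threads only simulate independent workers in place of cluster nodes or virtual machines, closed-form least-squares line fitting stands in for an existing serial learner, and averaging or majority voting are assumed example GO ensemble rules rather than the article's actual strategies.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def make_random_sample_blocks(records, num_blocks, seed=0):
    """Hypothetical partitioner: shuffle once, then deal records round-robin
    into disjoint blocks, so each block approximates a random sample of the data."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def fit_line(block):
    """Stand-in for an existing serial algorithm: least squares for y = a*x + b.
    It contains no distribution logic and is reused unchanged by the LO step."""
    xs = [x for x, _ in block]
    ys = [y for _, y in block]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def lo_operation(blocks, serial_algorithm, workers=4):
    """LO (simulated): run the serial algorithm on every block independently.
    Workers never exchange data; each one only returns its local result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(serial_algorithm, blocks))


def go_average(local_models):
    """GO example for regression-style tasks: average local parameter vectors."""
    return tuple(fmean(params) for params in zip(*local_models))


def go_majority_vote(local_predictions):
    """GO example for classification-style tasks: majority vote over local predictions."""
    return Counter(local_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data around y = 2x + 1 with Gaussian noise.
    data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5))
            for x in (rng.uniform(0, 10) for _ in range(10_000))]
    blocks = make_random_sample_blocks(data, num_blocks=8)
    local_models = lo_operation(blocks, fit_line)
    slope, intercept = go_average(local_models)
    print(f"approximate slope ~ {slope:.3f}, intercept ~ {intercept:.3f}")
    print(go_majority_vote(["spam", "ham", "spam"]))
```

Because fit_line is an ordinary serial function, it is passed to the LO step as-is, which mirrors the abstract's point that existing serial algorithms need not be rewritten in the MapReduce style. The only communication in the sketch is each worker returning a small local result to the caller, which plays the role of the master node in the GO step.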

Cite this article:
Liang Zhanxiong, Sun Xudong, Cai Yonda, et al. A New Distributed Machine Learning Library Developed Based on LOGO Computing Framework[J]. Journal of Integration Technology.

History
  • Received: 2024-02-24
  • Last revised: 2024-02-24
  • Published online: 2024-05-20