Technical Challenges and Countermeasures in Large Model Inference Infrastructure

Abstract: Large language models (LLMs) have crossed the threshold for deployment at scale and are showing application potential across many scenarios. As the AI agent ecosystem develops, LLM inference workloads have overtaken training workloads and become the core engine driving growth in demand for computing power. This shift in emphasis from model "training" to "inference", however, puts inference infrastructure under severe strain. This article analyzes the evolutionary trends of LLM inference infrastructure, summarizes the technical challenges it faces across four dimensions (computing, transmission, storage, and scheduling), and, drawing on Lenovo Group's innovative practices, proposes targeted hardware-software co-design solutions.