Design and Implementation of an LLM-Driven Intelligent Risk Identification System for Railway Cybersecurity

Ma Tong (马童); Du Yonglin (杜永林); Sun Jialu (孙佳露); Wang Nan (王楠); Niu Wenjia (牛温佳)

doi:10.12146/j.issn.2095-3135.20251117001

Abstract

With the development of information technology, railway information systems face a growing number of risks and hidden hazards. Analyzing the many risk reports and extracting their content is still largely manual, which is time-consuming, labor-intensive, and error-prone. This paper therefore presents an end-to-end, large language model (LLM) based method for intelligent risk identification in railway cybersecurity. Built upon Qwen1.5-14B, the method uses prefix tuning and instruction templates for parameter-efficient domain adaptation without modifying the base model parameters. To mitigate hallucinations, we construct an expert-calibrated canonical risk list and map model outputs to its standardized entries via set-based text similarity, ensuring consistency and verifiability. A knowledge base is built through multi-source text acquisition and structured conversion. Experimental results show that the information-extraction agent outperforms representative general-purpose LLMs overall on key elements such as threat groups, IP addresses, hashes, email addresses, URLs, and YARA rules; for end-to-end risk and hazard identification, the method achieves F1-scores of 82.2% and 83.6%, respectively, demonstrating good practicality and robustness.
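
The abstract states that domain adaptation uses prefix tuning on Qwen1.5-14B while leaving the base parameters untouched. The NumPy sketch below illustrates that mechanism for a single attention layer only; it is not the authors' implementation. The matrices `W_q`, `W_k`, `W_v` stand in for frozen base weights, and `P_k`, `P_v` are trainable prefix vectors prepended as extra key/value positions that every token may attend to. Multi-head splitting, rotary position embeddings, and the reparameterization network often used while training prefixes are omitted, and all function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_prefix(h, W_q, W_k, W_v, P_k, P_v):
    """Single-head causal self-attention with trainable prefix keys/values.

    h: (T, d) hidden states; W_q, W_k, W_v: (d, d) frozen projections;
    P_k, P_v: (L, d) prefix vectors, the only tensors prefix tuning updates.
    """
    T, L = h.shape[0], P_k.shape[0]
    q = h @ W_q                                   # (T, d)
    k = np.concatenate([P_k, h @ W_k], axis=0)    # (L + T, d)
    v = np.concatenate([P_v, h @ W_v], axis=0)    # (L + T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (T, L + T)
    # Prefix positions are visible to every token; real tokens see only
    # themselves and earlier tokens (decoder-style causal mask).
    mask = np.zeros((T, L + T))
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    mask[:, L:][future] = -np.inf
    return softmax(scores + mask) @ v             # (T, d)


def prefix_param_count(n_layers: int, prefix_len: int, d_model: int) -> int:
    """Trainable parameters if each layer has one key prefix and one value prefix."""
    return 2 * n_layers * prefix_len * d_model
```

Because only `P_k` and `P_v` receive gradient updates, the trainable parameter count grows linearly with the number of layers, the prefix length, and the hidden size, while the base weights stay frozen; `prefix_param_count` spells out that arithmetic under the stated one-pair-per-layer assumption.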
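
The instruction templates themselves are not reproduced in the abstract. As a hypothetical illustration, the sketch below fills one prompt per report and normalizes the model's answer into a fixed schema covering the six element types the abstract lists (threat groups, IPs, hashes, emails, URLs, YARA rules). The key names, the wording of `TEMPLATE`, and the JSON answer format are all assumptions, and the model call is left out.

```python
import json

# Hypothetical output keys for the six element types named in the abstract.
ELEMENT_TYPES = ["threat_group", "ip", "hash", "email", "url", "yara"]

TEMPLATE = (
    "You are a railway cybersecurity analyst. Read the report below and "
    "extract these elements: {keys}. Answer with a single JSON object that "
    "uses exactly these keys; each value is a list of strings, [] if absent.\n\n"
    "Report:\n{report}"
)


def build_prompt(report: str) -> str:
    """Fill the (hypothetical) instruction template with one report."""
    return TEMPLATE.format(keys=", ".join(ELEMENT_TYPES), report=report)


def parse_response(text: str) -> dict:
    """Parse the outermost {...} span of the model output into a fixed schema.

    Unknown keys are dropped and missing or malformed values become [], so
    downstream steps always receive the same keys.
    """
    empty = {key: [] for key in ELEMENT_TYPES}
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    result = {}
    for key in ELEMENT_TYPES:
        values = data.get(key, [])
        if not isinstance(values, list):
            values = []
        result[key] = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    return result
```

Forcing a fixed schema keeps the extraction output machine-checkable, which matters when the next step compares it against a canonical list.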
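
The abstract maps model outputs onto an expert-calibrated canonical risk list using set-based text similarity but does not name the exact measure. A minimal sketch, assuming Jaccard similarity over character bigram sets (which works on Chinese text without word segmentation) and an illustrative threshold of 0.5, could look like this; the function names and threshold are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams, ignoring case and whitespace."""
    s = "".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_to_canonical(output: str, canonical: list, threshold: float = 0.5):
    """Map a free-text model output to the most similar canonical entry.

    Returns (entry, score); entry is None when no entry reaches the threshold,
    so outputs unsupported by the list are rejected instead of reported.
    """
    grams = char_ngrams(output)
    best, best_score = None, 0.0
    for entry in canonical:
        score = jaccard(grams, char_ngrams(entry))
        if score > best_score:  # ties keep the earlier entry
            best, best_score = entry, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```

For example, with a hypothetical list containing `"weak password"`, the output `"weak passwords"` maps to that entry, while an output sharing no bigrams with any entry returns `None` and is discarded, which is how snapping to a closed list limits hallucinated risk labels.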
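
The abstract reports F1-scores of 82.2% (risks) and 83.6% (hazards) without stating how they are averaged. One common choice, assumed in this sketch, is micro-averaging over per-report sets of canonical entries; `micro_prf` is a hypothetical helper, not the paper's evaluation code.

```python
def micro_prf(pairs):
    """Micro-averaged precision, recall, and F1 over per-report label sets.

    pairs: iterable of (predicted_set, gold_set), one pair per report, where
    both sets contain canonical risk (or hazard) entries.
    """
    tp = fp = fn = 0
    for predicted, gold in pairs:
        tp += len(predicted & gold)   # correctly identified entries
        fp += len(predicted - gold)   # reported but absent from the gold set
        fn += len(gold - predicted)   # gold entries the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because predictions and gold labels are both drawn from the same canonical list after mapping, exact set intersection is a reasonable matching criterion here.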