A Survey of Machine Learning-Based Encrypted Traffic Analysis Methods
Authors: TONG Xin, YANG Ying, SUO Qiwei, et al.
CLC number: TP393.08
Funding:

This work is supported by the General Project for Research in Humanities and Social Sciences in Universities of Henan Province (2024-ZZJH-290), the Basic Research Program for Science and Technology Strengthening Police Force of the Ministry of Public Security (2023JC21), and the Research Project of Henan Police College (HNJY-2023-42).

    Abstract:

    With the rapid development of Internet technology, network security issues have become increasingly prominent, and the identification and classification of encrypted traffic has become an important research direction. This paper presents a comprehensive survey of current machine learning-based techniques for encrypted traffic classification. First, it briefly introduces common encryption protocols and their characteristics from a layered perspective. Second, it gives an overview of the datasets and evaluation metrics used in encrypted traffic analysis. Third, it discusses encrypted traffic analysis methods based on traditional machine learning and on deep learning, analyzing key techniques such as feature engineering and classifier models. Finally, it summarizes the challenges the field currently faces, including insufficient interpretability and the risk of adversarial examples, and looks ahead to future research directions such as strengthening interpretability, automating feature extraction, and optimizing model structures.

Cite this article

Citing format
TONG Xin, YANG Ying, SUO Qiwei, et al. A Survey of Machine Learning-Based Encrypted Traffic Analysis Methods[J]. Journal of Integration Technology, 2024, 13(5): 74-92.
History
  • Received: 2024-01-30
  • Last revised: 2024-02-02
  • Published online: 2024-07-16