Abstract: Data provenance technology records and tracks the origins of sensitive documents to help prevent their leakage. Traditional network path tracing methods are ineffective for offline documents, and key tracing for encrypted files does not ensure reliable provenance once files are shared. Existing techniques such as annotation, reverse querying, and data watermarking often require user involvement and are implemented at the application layer, which results in inadequate security, a lack of transparency and flexibility, and limited system scalability. This paper introduces a script-based dynamic fingerprint provenance architecture that modifies the Linux kernel to provide kernel-level provenance, enhancing the security and transparency of document tracing. The fingerprint tracking algorithm is implemented through user scripts, which improves the flexibility and effectiveness of document provenance. In addition, a fingerprint-driven algorithm is designed to meet the demands of multi-load sharing, ensuring efficient and scalable document sharing. Verification shows that the architecture has minimal impact on the operating system and scales well. In both single-load and multi-load sharing scenarios, the fingerprint-driven algorithm demonstrates transparency, real-time performance, and efficiency.