WU Fuxiang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
CHENG Jun
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
CLC number: TP 391.7
This work is supported by the National Natural Science Foundation of China (U21A20487, 62372440).
In recent years, the rapid development of generative AI has made text-driven video prediction large models a hot topic in both academia and industry. Video prediction and generation must address temporal dynamics and consistency, requiring precise control of scene structures, subject behaviors, camera movements, and semantic expressions. A major challenge is accurately controlling scene dynamics during video prediction to achieve high-quality, semantically consistent outputs. To this end, researchers have proposed several key control methods, including camera control enhancement, reference video control, semantic consistency enhancement, and subject feature control improvement. These methods aim to improve generation quality, ensuring that outputs remain consistent with historical context while meeting user requirements. This paper systematically reviews the core concepts, advantages, limitations, and future directions of these four control approaches.
WU Fuxiang, CHENG Jun. A Review of Scene Dynamic Control in Text-Guided Video Prediction Large Models[J]. Journal of Integration Technology, 2025, 14(1): 9-24.