Abstract: In recent years, the rapid development of generative AI has made text-driven video prediction a hot topic in both academia and industry. Video prediction must model temporal dynamics and maintain consistency, which requires precise control over scene structure, subject behavior, camera movement, and semantic expression. A major challenge is accurately controlling scene dynamics so that predicted videos are both high quality and semantically consistent. To address this, researchers have proposed four key control methods: camera control, reference-video control, semantic enhancement, and subject-feature control. These methods aim to improve generation quality, ensuring that outputs remain consistent with the historical context while meeting user requirements. This paper systematically reviews the core concepts, advantages, limitations, and future directions of these four control approaches.