I am currently a second-year Master’s student at the College of Computer Science and Artificial Intelligence, Fudan University.
My research interests lie in Multi-modal large models(MLLM) and Controllable image/video generation.
I am currently an Applied Research Intern at ByteDance, where I work on multimodal embedding models and post-training of multimodal large models.
News
- 2026.02: I joined ByteDance
as an Applied Research Intern, working on MLLMs.
- 2025.10: I joined Meituan
Intelligent Creation Team as a Research Intern.
Publications
Under Review

Seamless Sketch-guided Image Inpainting via Flow-based Background Trajectory Alignment
Lintao Zhang*, Runfeng Bao*, Quan Zhou, Xichen Ye, Xiangcheng Du, Yingbin Zheng, WEIZHONG ZHANG, Peizhu Gong, Cheng Jin
- Tackles the tension between structural controllability and visual consistency in sketch-guided image inpainting, where complex scenes often suffer from background drift and boundary discontinuities.
- Builds a unified text-sketch-mask inpainting framework with Region-Aware Sketch Control (RASC) for decoupled foreground/background guidance, and introduces FBTA/FBTA-fast to explicitly align background latent trajectories during diffusion sampling.
Under Review

SyncPainter: Caption-guided Video Inpainting with Multimodal Semantic Alignment
- Studies the logical distortion problem of current state-of-the-art video inpainting models under complex occlusion, and explores an automated restoration paradigm guided by MLLMs.
- Develops Mask-Aware Dual-Path Attention (MAAF), which decouples generation and reference streams for precise semantic-background alignment, together with latent-space temporal propagation for flow-free feature transfer and flicker reduction.
Educations
- 2024.09 - 2027.06, M.S. in Artificial Intelligence, Fudan University, Shanghai, China.
- 2020.10 - 2024.06, B.E. in Data Science and Big Data Technology, Zhejiang University of Technology, Hangzhou, China.
Internships
- ByteDance
2026.02 - Present, Applied Research Intern, focusing on multimodal embedding models and post-training of multimodal large models. - Meituan

2025.10 - 2026.01, Research Intern, Intelligent Creation Team, focusing on multimodal algorithms.