I am currently a second-year Master’s student at the College of Computer Science and Artificial Intelligence, Fudan University.

My research interests lie in Multi-modal large models(MLLM) and Controllable image/video generation.

I am currently an Applied Research Intern at ByteDance, where I work on multimodal embedding models and post-training of multimodal large models.

News

Publications

Under Review

Seamless Sketch-guided Image Inpainting via Flow-based Background Trajectory Alignment

Lintao Zhang^*, Runfeng Bao^*, Quan Zhou, Xichen Ye, Xiangcheng Du, Yingbin Zheng, WEIZHONG ZHANG, Peizhu Gong, Cheng Jin

Tackles the tension between structural controllability and visual consistency in sketch-guided image inpainting, where complex scenes often suffer from background drift and boundary discontinuities.
Builds a unified text-sketch-mask inpainting framework with Region-Aware Sketch Control (RASC) for decoupled foreground/background guidance, and introduces FBTA/FBTA-fast to explicitly align background latent trajectories during diffusion sampling.

Under Review

SyncPainter: Caption-guided Video Inpainting with Multimodal Semantic Alignment

Studies the logical distortion problem of current state-of-the-art video inpainting models under complex occlusion, and explores an automated restoration paradigm guided by MLLMs.
Develops Mask-Aware Dual-Path Attention (MAAF), which decouples generation and reference streams for precise semantic-background alignment, together with latent-space temporal propagation for flow-free feature transfer and flicker reduction.

2024.09 - 2027.06, M.S. in Artificial Intelligence, Fudan University, Shanghai, China.
2020.10 - 2024.06, B.E. in Data Science and Big Data Technology, Zhejiang University of Technology, Hangzhou, China.

ByteDance
2026.02 - Present, Applied Research Intern, focusing on multimodal embedding models and post-training of multimodal large models.
Meituan
2025.10 - 2026.01, Research Intern, Intelligent Creation Team, focusing on multimodal algorithms.