I am currently a second-year Master’s student at the College of Computer Science and Artificial Intelligence, Fudan University.

My research interests lie in Multi-modal large models(MLLM) and Controllable image/video generation.

I am currently an Applied Research Intern at ByteDance, where I work on multimodal embedding models and post-training of multimodal large models.

News

  • 2026.02: I joined ByteDance as an Applied Research Intern, working on MLLMs.
  • 2025.10: I joined Meituan Intelligent Creation Team as a Research Intern.

Publications

Under Review
Unified multimodal image inpainting

Seamless Sketch-guided Image Inpainting via Flow-based Background Trajectory Alignment

Lintao Zhang*, Runfeng Bao*, Quan Zhou, Xichen Ye, Xiangcheng Du, Yingbin Zheng, WEIZHONG ZHANG, Peizhu Gong, Cheng Jin

  • Tackles the tension between structural controllability and visual consistency in sketch-guided image inpainting, where complex scenes often suffer from background drift and boundary discontinuities.
  • Builds a unified text-sketch-mask inpainting framework with Region-Aware Sketch Control (RASC) for decoupled foreground/background guidance, and introduces FBTA/FBTA-fast to explicitly align background latent trajectories during diffusion sampling.
Under Review
MLLM-guided video inpainting

SyncPainter: Caption-guided Video Inpainting with Multimodal Semantic Alignment

  • Studies the logical distortion problem of current state-of-the-art video inpainting models under complex occlusion, and explores an automated restoration paradigm guided by MLLMs.
  • Develops Mask-Aware Dual-Path Attention (MAAF), which decouples generation and reference streams for precise semantic-background alignment, together with latent-space temporal propagation for flow-free feature transfer and flicker reduction.

Educations

  • 2024.09 - 2027.06, M.S. in Artificial Intelligence, Fudan University, Shanghai, China.
  • 2020.10 - 2024.06, B.E. in Data Science and Big Data Technology, Zhejiang University of Technology, Hangzhou, China.

Internships

  • ByteDance
    2026.02 - Present, Applied Research Intern, focusing on multimodal embedding models and post-training of multimodal large models.
  • Meituan
    2025.10 - 2026.01, Research Intern, Intelligent Creation Team, focusing on multimodal algorithms.