About Me

I am actively looking for industry positions in MLLM pretraining, VLA, world models, and related areas. Feel free to reach out via email if you are interested!

Hi 👋, I am a final-year Ph.D. candidate at the University of Adelaide, supervised by A/Prof. Bohan Zhuang and A/Prof. Qi Wu.

My research focuses on developing efficient and scalable AI algorithms for multimodal learning and image generation. Previously, I have worked on the following topics:

  • Long-Context Efficiency of MLLMs across inference, training, and post-training, including ZipVL, OmniSparse, and Sparsity Forcing.

  • Efficient Autoregressive Image Generation with speculative decoding, parallel pipeline design, and post-training distillation, including ZipAR, NAR, and FlashAR (upcoming).

Currently, my research focuses on embodied learning, with two main directions:

  • Efficient VLA: improving policy efficiency by enabling VLA models to complete tasks with fewer action steps and over longer horizons, reducing redundant inference while maintaining robust task performance.

  • Physics-aware World Models: building world models with a 3D Gaussian-aware tokenizer for spatially grounded scene representation, and designing benchmarks at the level of physical formulas to evaluate whether models truly understand physical laws, as a step towards large-scale world simulation.

News

  • [2026.04] 🎉 1 paper accepted to IEEE TAFFC.
  • [2026.04] 🎉 2 papers accepted to ACL 2026.
  • [2026.03] 🎉 4 papers accepted to CVPR 2026.

Work Experience

  • Research Intern, TikTok, Sydney (Oct 2024 – Apr 2025)
  • Research Intern, Ant Group, Hangzhou (Sep 2022 – Apr 2024)

Selected Publications

Efficient MLLM

Beyond Accuracy: An Empirical Study of Perception Stability in Multimodal Large Language Models
Feng Chen, Chenhui Gou, Yefei He, Yang Yang, Bohan Zhuang, Qi Wu
CVPR Findings, 2026
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Feng Chen, Yefei He, Lequan Lin, Chenhui Gou, Jing Liu, Bohan Zhuang, Qi Wu
ICLR, 2026
[Paper]
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
Feng Chen, Yefei He, Shaoxuan He, Yuanyu He, Jing Liu, Lequan Lin, Akide Liu, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
AAAI, 2026
[Paper]
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICCV, 2025
[Paper]
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention
Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Ying-Cong Chen
ACL, 2026
[Paper]
ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking
Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Li, Zhenbang Sun, Junbin Gao
NeurIPS, 2025
[Paper]

Efficient and Controllable World Model

LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
Zicheng Duan, Jiatong Xia, Zeyu Zhang, Wenbo Zhang, Gengze Zhou, Chenhui Gou, Yefei He, Feng Chen (Project Lead), Xinyu Zhang, Lingqiao Liu
arXiv preprint, 2026
[Paper]
Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
Zixuan Wang*, Yixin Hu*, Haolan Wang, Feng Chen*, Yan Liu, Wen Li, Yinjie Lei
CVPR, 2026
[Paper]
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He*, Yuanyu He*, Shaoxuan He*, Feng Chen*, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICCV, 2025
[Paper]
ZipAR: Accelerating Auto-Regressive Image Generation through Spatial Locality
Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICML, 2025
[Paper]
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang, Duo Peng, Feng Chen, Yuwei Yang, Yinjie Lei
CVPR, 2025
[Paper]
Training-free Motion Factorization for Compositional Video Generation
Zixuan Wang, Ziqin Zhou, Feng Chen, Duo Peng, Yixin Hu, Changsheng Li, Yinjie Lei
CVPR, 2026
[Paper]

Others

Uncertainty-guided Learning for Improving Image Manipulation Detection
Kaixiang Ji, Feng Chen, Xin Guo, Yadong Xu, Jian Wang, Jingdong Chen
ICCV, 2023
[Paper]
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER
Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang
ACM MM, 2023
[Paper]

Professional Activities

  • Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, IJCV

Awards

  • Jiangsu Province Outstanding Graduate, 2021
  • Jiangsu Province Outstanding Master Thesis Award, 2021