About Me

I am actively looking for industry positions in MLLM pretraining, VLA, world models, and related areas. Feel free to reach out via email if you are interested!

Hi 👋, I am a final-year Ph.D. candidate at the University of Adelaide, supervised by A/Prof. Bohan Zhuang and A/Prof. Qi Wu.

My research focuses on developing efficient and scalable AI algorithms for multimodal understanding and image generation. Previously, I have worked on the following topics:

  • Long Context Efficiency of MLLMs across inference, training, and post-training, including ZipVL, OmniSparse, and Sparsity Forcing.

  • Efficient Autoregressive Image Generation with speculative decoding, parallel pretraining, and post-training distillation, including ZipAR, NAR, and FlashAR.

Currently, my research focuses on embodied learning, with two main directions:

  • Efficient Long-horizon VLA/WAM: improving the policy efficiency and computational efficiency of VLA/WAM models so that they complete tasks in fewer action steps and over longer horizons, reducing redundant inference cost while maintaining robust task performance.

  • 3D-aware Physical World Models: building world models with a 3D Gaussian-aware tokenizer for spatially grounded scene representation, and designing benchmarks at the level of physical formulas to evaluate whether models truly understand physical laws, towards large-scale world simulation.

We are looking for highly self-motivated Master's students, PhD students, and interns. If you are interested in joining our ZipLab team, please feel free to contact us with your CV!

News

  • [2026.06] Our self-evolving multi-agent framework for 3D understanding is now released.
  • [2026.06] Our PolicyTrim, a policy acceleration framework for VLA, is now released.
  • [2026.06] 🎉 4 papers accepted to ECCV 2026.
  • [2026.06] Our latent 3D memory for video world models is now released.
  • [2026.05] Our A3, a dynamic execution horizon method for VLA, is now released.
  • [2026.05] Our FlashAR, a post-training acceleration method for AR image generation, is now released.
  • [2026.04] 🎉 1 paper accepted to IEEE TAFFC.
  • [2026.04] 🎉 2 papers accepted to ACL 2026.
  • [2026.03] 🎉 4 papers accepted to CVPR 2026.

Work Experience

  • Research Intern, TikTok, Sydney (Oct 2024 – Apr 2025)
  • Research Intern, Ant Group, Hangzhou (Sep 2022 – Apr 2024)

Preprint

arXiv
Latent Spatial Memory for Video World Models
Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, Yefei He, Zicheng Duan, Donny Y. Chen, Yuqing Yang, Bohan Zhuang
arXiv preprint, 2026
[Paper] [Project] [Code] [Hugging Face]
arXiv
FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation
Junkang Zhou*, Yefei He*, Feng Chen*, Weijie Wang, Bohan Zhuang  
arXiv preprint, 2026
[Paper] [Project] [Code]
arXiv
Dynamic Execution Commitment of Vision-Language-Action Models
Feng Chen, Xianghui Wang, Yuxuan Chen, Boying Li, Yefei He, Zeyu Zhang, Yicheng Wu  
arXiv preprint, 2026
[Paper] [Project] [Code]

Selected Publications

* Equal contribution. Project lead.

Efficient MLLM/VLA/Agent

ECCV
PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models
Xianghui Wang*, Feng Chen*, Wenbo Zhang, Hua Yan, Zixuan Wang, Changsheng Li, Yinjie Lei
ECCV, 2026
[Paper] [Project] [Code]
ECCV
Agentic Collaborative Cognition for Zero-Shot 3D Understanding
Wenxin Wang*, Bo Zhang*, Feng Chen, Zixuan Wang, Wen Li, Changsheng Li, Yinjie Lei
ECCV, 2026
CVPR
Beyond Accuracy: An Empirical Study of Perception Stability in Multimodal Large Language Models
Feng Chen, Chenhui Gou, Yefei He, Yang Yang, Bohan Zhuang, Qi Wu
CVPR Findings, 2026
[Paper] [Code]
ICLR
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Feng Chen, Yefei He, Lequan Lin, Chenhui Gou, Jing Liu, Bohan Zhuang, Qi Wu
ICLR, 2026
[Paper]
AAAI
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
Feng Chen, Yefei He, Shaoxuan He, Yuanyu He, Jing Liu, Lequan Lin, Akide Liu, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
AAAI, 2026
[Paper]
ICCV
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICCV, 2025
[Paper]
ACL
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention
Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Ying-Cong Chen
ACL, 2026
[Paper] [Code]
NeurIPS
ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking
Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Li, Zhenbang Sun, Junbin Gao
NeurIPS, 2025
[Paper]

Efficient and Controllable World Model

ECCV
LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
Zicheng Duan, Jiatong Xia, Zeyu Zhang, Wenbo Zhang, Gengze Zhou, Chenhui Gou, Yefei He, Feng Chen, Xinyu Zhang, Lingqiao Liu
ECCV, 2026
[Paper] [Project] [Code]
CVPR
Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
Zixuan Wang*, Yixin Hu*, Haolan Wang, Feng Chen*, Yan Liu, Wen Li, Yinjie Lei  
CVPR, 2026
[Paper] [Code]
ICCV
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He*, Yuanyu He*, Shaoxuan He*, Feng Chen*, Hong Zhou, Kaipeng Zhang, Bohan Zhuang  
ICCV, 2025
[Paper] [Project] [Code]
ICML
ZipAR: Accelerating Auto-Regressive Image Generation through Spatial Locality
Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
ICML, 2025
[Paper] [Code]
CVPR
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang, Duo Peng, Feng Chen, Yuwei Yang, Yinjie Lei
CVPR, 2025
[Paper] [Code]
CVPR
Training-free Motion Factorization for Compositional Video Generation
Zixuan Wang, Ziqin Zhou, Feng Chen, Duo Peng, Yixin Hu, Changsheng Li, Yinjie Lei
CVPR, 2026
[Paper]

Others

Information Fusion
Balanced Cross-modal Prompt Learning and Fusion Network for Multi-modal Fake News Detection
Fei Wu, Hao Jin, Feng Chen, Yimu Ji, Xiao-Yuan Jing, Guo-Ping Jiang
Information Fusion, 2025
[Paper]
Pattern Recognition
Learning Multi-granularity Representation with Transformer for Visible-Infrared Person Re-identification
Yujian Feng*, Feng Chen*, Guozi Sun, Fei Wu, Yimu Ji, Tianliang Liu, Shangdong Liu, Xiao-Yuan Jing, Jiebo Luo
Pattern Recognition, 2025
[Paper]
Pattern Recognition
Homogeneous and Heterogeneous Relational Graph for Visible-Infrared Person Re-Identification
Yujian Feng*, Feng Chen*, Jian Yu, Yimu Ji, Fei Wu, Shangdong Liu, Xiao-Yuan Jing
Pattern Recognition, 2025
[Paper] [Code]
IEEE TMM
Cross-Modality Spatial-Temporal Transformer for Video-Based Visible-Infrared Person Re-Identification
Yujian Feng*, Feng Chen*, Jian Yu, Yimu Ji, Fei Wu, Tianliang Liu, Shangdong Liu, Xiao-Yuan Jing, Jiebo Luo
IEEE Transactions on Multimedia, 2024
[Paper]
IEEE TMM
Visible-Infrared Person Re-Identification via Cross-Modality Interaction Transformer
Yujian Feng, Jian Yu, Feng Chen, Yimu Ji, Fei Wu, Shangdong Liu, Xiao-Yuan Jing
IEEE Transactions on Multimedia, 2023
[Paper]
SDM
Coarse-to-Fine Open Information Extraction via Relation Oriented Reading Comprehension
Tingxin Li, Rui Meng, Feng Chen, Jianming Wu
SDM, 2023
[Paper]
ICCV
Uncertainty-guided Learning for Improving Image Manipulation Detection
Kaixiang Ji, Feng Chen, Xin Guo, Yadong Xu, Jian Wang, Jingdong Chen
ICCV, 2023
[Paper]
ACM MM
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER
Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang
ACM MM, 2023
[Paper]
Pattern Recognition
JSPNet: Learning Joint Semantic & Instance Segmentation of Point Clouds via Feature Self-Similarity and Cross-Task Probability
Feng Chen, Fei Wu, Guangwei Gao, Yimu Ji, Jing Xu, Guo-Ping Jiang, Xiao-Yuan Jing
Pattern Recognition, 2022
[Paper] [Code]

Professional Activities

  • Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, IJCV

Awards

  • Jiangsu Province Outstanding Graduate, 2021
  • Jiangsu Province Outstanding Master Thesis Award, 2021