Visual Spatial Intelligence
G2TAM: Geometry Grounded Track Anything Model
arXiv 2026
Chenming Zhu, Peizhou Cao, Jingli Lin, Wenbo Hu, Yunlong Ran, Tai Wang,
Jiangmiao Pang, Xihui Liu
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
arXiv 2026
Jingli Lin*, Runsen Xu*, Shaohao Zhu, Sihan Yang, Peizhou Cao, Yunlong Ran, Miao Hu,
Chenming Zhu, Yiman Xie, Yilin Long, Wenbo Hu, Dahua Lin, Tai Wang, Jiangmiao Pang
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
CVPR 2026
Wenbo Hu, Jingli Lin, Yilin Long, Yunlong Ran, Lihan Jiang, Yifan Wang, Chenming Zhu,
Runsen Xu, Tai Wang, Jiangmiao Pang
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
ICLR 2026
Sihan Yang*, Runsen Xu*, Yiman Xie, Sizhe Yang, Mo Li, Jingli Lin, Chenming Zhu,
Xiaochen Chen, Haodong Duan, Xiangyu Yue, Dahua Lin, Tai Wang, Jiangmiao Pang
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
NeurIPS 2025
Jingli Lin*, Chenming Zhu*, Runsen Xu, Xiaohan Mao, Xihui Liu, Tai Wang, Jiangmiao Pang
Project Lead
Embodied 3D Perception
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
ICCV 2025
Chenming Zhu, Tai Wang, Wenwei Zhang, Jiangmiao Pang, Xihui Liu
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
ECCV 2024
Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
NeurIPS 2024
Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang,
Chenming Zhu, Dahua Lin, Jiangmiao Pang
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
Tai Wang*, Xiaohan Mao*, Chenming Zhu*, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen,
Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang
Vision-Language Navigation (VLN)
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
ICRA 2026
Meng Wei, Chenyang Wan, Xiqian Yu, Tai Wang, Yuqiang Yang, Xiaohan Mao, Chenming Zhu,
Wenzhe Cai, Hanqing Wang, Yilun Chen, Xihui Liu, Jiangmiao Pang
InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans
Technical report 2026
Core Contributor