Wentong Li 李文通
College of Artificial Intelligence
Nanjing University of Aeronautics and Astronautics (NUAA)
No. 29 Jiangjun Road, Nanjing, China
Office: 1205, No.1 Building
About Me
I am an Associate Professor in the College of Artificial Intelligence at Nanjing University of Aeronautics and Astronautics.
In August 2025, I was a visiting researcher at the Department of Computing, The Hong Kong Polytechnic University, where I collaborated with my Ph.D. advisor, Prof. Lei Zhang (IEEE Fellow).
Previously, I completed my Ph.D. at the College of Computer Science and Technology, Zhejiang University, in June 2024, supervised by Prof. Jianke Zhu and Prof. Lei Zhang.
My recent research interests are Visual/Scene Understanding, Embodied AI and Vision-Language Models, particularly in:
-
Fine-grained object-level spatial-temporal understanding:
PixelRefer(arXiv2025),
VideoRefer(CVPR2025),
Osprey(CVPR2024)
-
Embodied understanding, reasoning, planning and action:
Inst3D-LMM(CVPR2025),
EOC-Bench(NeurIPS2025)
-
Efficient and effective VLMs/MLLMs:
TokenPacker(IJCV2025),
VisionTrim(ICLR2026)
-
Visual detection & segmentation:
Point2RBox-v3 (ICLR2026),
Box2Mask(T-PAMI2024),
Point2Mask(ICCV2023),
APro(NeurIPS2023),
H2RBox(ICLR2023),
Oriented RepPoints(CVPR2022)
I am looking for self-motivated Master's students, research interns/assistants, and Ph.D. students (co-supervised). Please email me if you are interested.
News
-
[2026.01]: Two papers are accepted by ICLR 2026.
-
[2025.12]: We released a survey, Forging Spatial Intelligence, on multi-modal data pre-training for autonomous systems.
-
[2025.11]: One paper on object-level generation for camouflage images is accepted by AAAI 2026.
-
[2025.11]: Our PixelRefer was covered by PaperWeekly and 机器之心.
-
[2025.10]: We released PixelRefer, a new unified pixel-level MLLM framework for fine-grained regional understanding.
-
[2025.10]: Gave a talk at PRCV 2025. [Slides]
-
[2025.9]: Two papers are accepted by NeurIPS 2025.
-
[2025.8]: Received funding from NSFC 🎉.
-
[2025.8]: Invited to serve as an Area Chair for ICLR 2026.
-
[2025.8]: Visited The Hong Kong Polytechnic University and gave a talk. [Slides]
-
[2025.6]: We released EOC-Bench, an object-centric embodied cognition benchmark for dynamic egocentric scenarios.
-
[2025.5]: One paper is accepted by IJCV (TokenPacker, 57 citations at the time of acceptance).
-
[2025.4]: Our VideoRefer and VideoRefer-Bench have been discussed and adopted by NVIDIA & UC Berkeley in their DAM work.
-
[2025.2]: Five papers are accepted by CVPR 2025 (One Highlight).
-
[2025.2]: We released the VideoRefer-700K dataset on HuggingFace. Please see the VideoRefer Suite for the details.
-
[2024.12]: Received the Outstanding Doctoral Dissertation Award of ZJU (浙江大学优秀博士学位论文).
-
[2024.6]: Obtained my Ph.D. degree from ZJU.
Publications
(*:equal contribution, #:corresponding author, +:project leader)
Preprints
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Song Wang, Lingdong Kong, Xiaolu Liu, Hao Shi, Wentong Li, Jianke Zhu, Steven C. H. Hoi
arXiv, 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan, Wenqiao Zhang, Xin Li, Shihao Wang, Kehan Li, Wentong Li#, Jun Xiao, Lei Zhang, Beng Chin Ooi
arXiv, 2025
Selected Publications
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Hanxun Yu*, Wentong Li*, Xuan Qu*, Song Wang, Junbo Chen, Jianke Zhu
ICLR, 2026
MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence
Yue Feng, Jinwei Hu, Qijia Lu, Jiawei Niu, Li Tan, Shuo Yuan, Ziyi Yan, Yizhen Jia, Qingzhi He, Shiping Ge, Ethan Q. Chen, Wentong Li#, Limin Wang, Jie Qin
NeurIPS (DB Track), 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan*, Ronghao Dang*, Long Li*, Wentong Li*, Diao Jiao, Xin Li, Deli Zhao, Fan Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang
NeurIPS (DB Track), 2025
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li*, Yuqian Yuan*, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang
IJCV, 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
CVPR, 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu
CVPR, 2025 (Highlight, 2.9%)
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan*, Wentong Li*+, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu
CVPR, 2024 (Project Leader)
Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution
Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang
T-PAMI, 2024
Research Experiences
Honors
-
Outstanding Doctoral Dissertation Award of Zhejiang University, 2024
-
Excellent Doctoral Graduates of Zhejiang Province, China (Top 1%), 2024
-
Excellent Doctoral Graduates of Zhejiang University, 2024
-
Tencent Scholarship, 2023
-
Five-A Postgraduate Student, 2023
-
Outstanding Postgraduate Student, 2020-2023
-
Longhu Scholarship, 2022
-
First-class Academic Scholarship, 2018-2023
-
National Scholarship, 2016
Academic Services
-
Area Chair:
ICLR2026
-
Conference Reviewer:
AAAI2025, ICLR2025, CVPR2025, ICML2025, ICCV2025, NeurIPS2025, ACM MM2025
CVPR2024, ICLR2024, ICML2024, ECCV2024, ACM MM2024, NeurIPS2024
CVPR2023, ICCV2023, NeurIPS2023, ACM MM2023
-
Journal Reviewer:
Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
International Journal of Computer Vision (IJCV)
Transactions on Image Processing (TIP)
Transactions on Circuits and Systems for Video Technology (TCSVT)
Transactions on Multimedia (TMM)
Transactions on Geoscience and Remote Sensing (TGRS)
Pattern Recognition (PR)
ACM Computing Surveys
ISPRS Journal of Photogrammetry and Remote Sensing (P&RS)
Neurocomputing
Tech. Talks
-
Efficient Visual Understanding and Interaction with VLMs, PolyU, Hong Kong, slides, 2025/08.
-
Fine-grained Image Understanding with VLMs, ECNU, Visual Perception+X(VPX) Group, 2024/09.
-
Osprey: Pixel Understanding with Visual Instruction Tuning, Video, slides, AI TIME, 2024/01.
-
Point-supervised Image Segmentation, AntGroup, Machine Intelligence Group, 2023/09.
Teaching
-
Intro. to AI: A Foundational Course, NUAA, Fall 2025.
-
Foundations and Frontiers of Multimodal Large Models, NUAA, Spring 2025.
-
Image Processing and Analysis, Police Brain of Zhejiang Province, Teaching Assistant, Fall 2022.
-
FDS2021: Foundation of Data Structure, Zhejiang University, Teaching Assistant, Fall 2021.
© Wentong Li | Last update: Jan. 2026