Wentong Li 李文通

College of Artificial Intelligence

Nanjing University of Aeronautics and Astronautics (NUAA)

No. 29 Jiangjun Road, Nanjing, China

Office: 1205, No.1 Building

     

About Me

I am an Associate Professor in the College of Artificial Intelligence at Nanjing University of Aeronautics and Astronautics, working closely with Prof. Jie Qin. I completed my Ph.D. in June 2024 at the College of Computer Science and Technology, Zhejiang University, where I was fortunate to be supervised by Prof. Jianke Zhu and Prof. Lei Zhang (PolyU HK, IEEE Fellow). My recent research interests are visual/scene understanding, embodied AI, and multimodal large language models (MLLMs), particularly:
1. Common vision-language tasks with MLLMs/VLMs, including visual referring and grounding for images, videos, and 3D scenes.
2. Embodied scene understanding and interaction, including egocentric image/video analysis, reasoning, and interaction.
3. Efficient and effective MLLMs, including token reduction, lightweight MLLMs, and efficient high-resolution understanding.
Previously, I focused mainly on object detection, image segmentation, and their weakly supervised/label-efficient variants. I am also interested in autonomous driving tasks (HD maps, 3D occupancy, etc.) and 3D reconstruction.

News

Preprints

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan*, Ronghao Dang*, Long Li*, Wentong Li*, Diao Jiao, Xin Li, Deli Zhao, Fan Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang
arXiv:2506.05287, 2025

TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition
Bin Wang, Wentong Li, Wenqian Wang, Mingliang Gao, Runmin Cong and Wei Zhang
arXiv, 2025

Paper | Code

Selected Publications

TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li*, Yuqian Yuan*, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang
IJCV (Accepted), 2025

Paper | Code | HuggingFace Model | Chinese Explainer | Daily Papers | Citations: 70+

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning 🔥
Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu
CVPR, 2025 (Highlight, 2.9%)

Paper | Code

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM 🔥
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
CVPR, 2025

Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan*, Wentong Li*, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu
CVPR, 2024 (Project Leader)

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution
Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang
T-PAMI, 2024

Full Publications

TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li*, Yuqian Yuan*, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang
IJCV, 2025.

Reliable and Calibrated Semantic Occupancy Prediction by Hybrid Uncertainty Learning
Song Wang, Zhongdao Wang, Jiawei Yu, Wentong Li, Bailan Feng, Junbo Chen, Jianke Zhu
IJCAI, 2025.

Large Models are Good Annotators for Zero-Shot Learning
Qingzhi He, Yizhen Jia, Wentong Li, Shengcai Liao, Rong Quan, Tong Cui, Jie Qin
SIGIR, 2025.

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu
CVPR, 2025.

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
CVPR, 2025.

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, Xinchao Wang
CVPR, 2025.

Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu
CVPR, 2025.

Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu
CVPR, 2025.

Label-efficient Semantic Scene Completion with Scribble Annotations
Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu
IJCAI, 2025.

Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan*, Wentong Li*, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu
CVPR, 2024.

Not All Voxels Are Equal: Hardness-aware Semantic Scene Completion with Self-distillation
Song Wang, Jiawei Yu, Wentong Li, Wenyu Liu, Xiaolu Liu, Junbo Chen, Jianke Zhu
CVPR, 2024.

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li, Ruizi Yang, Junbo Chen, Jianke Zhu
CVPR, 2024.

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution
Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang
T-PAMI, 2024.

Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
Qijun Gan, Wentong Li, Jinwei Ren, Jianke Zhu
AAAI, 2024.

Label-efficient Segmentation via Affinity Propagation
Wentong Li*, Yuqian Yuan*, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang
NeurIPS, 2023.

Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport
Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang
ICCV, 2023.

Improving Nighttime Driving-scene Segmentation via Dual Image-adaptive Learnable Filters
Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang
T-CSVT, 2023.

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation
Song Wang, Wentong Li, Wenyu Liu, Xiaolu Liu, Jianke Zhu
CVPR, 2023.

H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
Xue Yang, Gefan Zhang, Wentong Li, Xuehui Wang, Yue Zhou, Junchi Yan
ICLR, 2023.

Box-supervised Instance Segmentation with Level Set Evolution
Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xian-Sheng Hua, Lei Zhang
ECCV, 2022.

Translational symmetry-aware facade parsing for 3-D building reconstruction
Hantang Liu, Wentong Li, Jianke Zhu
IEEE MultiMedia, 2022.

Oriented RepPoints for Aerial Object Detection
Wentong Li, Yijie Chen, Kaixuan Hu, Jianke Zhu
CVPR, 2022.

Research Experiences

Honors

Academic Services

Tech. Talks

Teaching

People

© Wentong Li | Last updated: June 2025