Publications

A collection of my research work.

RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids

X Yuan, Zhe Li, B Lyu, K Zuo, Y Lu, G Li, J Yang

arXiv preprint arXiv:2603.17927 2026

Mosaic: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation

Z Sun, BS Huang, Y Peng, X Li, J Ma, Y Sun, Zhe Li, H Jiang, B Gao, Z Bing

arXiv preprint arXiv:2602.08594 2026

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models

S Bai, J Lyu, W Zhou, Zhe Li, D Wang, L Xing, X Zhao, P Wang, Z Wang

arXiv preprint arXiv:2602.01166 2026

RoboBrain 2.5: Depth in Sight, Time in Mind

H Tan, E Zhou, Zhe Li, Y Xu, Y Ji, X Chen, C Chi, P Wang, H Jia, Y Ao

arXiv preprint arXiv:2601.14352 2026

Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control

Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Tao Huang, Zhenguo Sun, Yibo Peng, Pengwei Wang, Zhongyuan Wang, Fangzhou Liu, Chang Xu, Shanghang Zhang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Highlight

From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

Zhe Li, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Cheng Chi, Shanghang Zhang, Chang Xu

International Conference on Learning Representations (ICLR) 2026

RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion

Zhe Li, C Chi, B Zhu, Y Wei, S Bai, Y Ji, Y Peng, T Huang, P Wang, Z Wang

arXiv preprint arXiv:2512.23649 2025

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

S Bai, W Song, J Chen, Y Ji, Z Zhong, J Yang, H Zhao, W Zhou, Zhe Li

arXiv preprint arXiv:2512.22983 2025

OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression

Zhe Li, Weihao Yuan, Weichao Shen, Siyu Zhu, Zilong Dong, Chang Xu

arXiv preprint arXiv:2510.14954 2025

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

L Qiu, S Zhu, Q Zuo, X Gu, Y Dong, J Zhang, C Xu, Zhe Li, W Yuan, L Bo

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, X Gu, Weichao Shen, Y Dong, Zilong Dong

International Conference on Learning Representations (ICLR) 2025

Interpretable Multimodal Tucker Fusion Model with Information Filtering for Multimodal Sentiment Analysis

X Nie, Laurence T. Yang, Zhe Li, X Deng, F Fan, Z Yang

IEEE Transactions on Computational Social Systems 2024

MCMat: Multiview-consistent and Physically Accurate PBR Material Generation

S Zhu, L Qiu, X Gu, Z Zhao, C Xu, Y He, Zhe Li, X Han, Y Yao, X Cao, S Zhu

arXiv preprint arXiv:2412.14148 2024

MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow

Zhe Li, Y He, L Zhong, W Shen, Q Zuo, L Qiu, Z Dong, Laurence T. Yang, W Yuan

arXiv preprint arXiv:2412.09901 2024

Capturing Detail Variations for Lightweight Neural Radiance Fields

Z Wang, Laurence T. Yang, B Ren, J Zhao, Zhe Li, G Zeng

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

General Point Model Pretraining with Autoencoding and Autoregressive

Zhe Li, Zhangyang Gao, Cheng Tan, Bocheng Ren, Laurence T. Yang, Stan Z. Li

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

Top Ten Outstanding Paper Award, HUST

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training

Zhe Li, Laurence T. Yang, Xin Nie, Bocheng Ren, Xianjun Deng

ACM International Conference on Multimedia (ACM MM) 2023