Publications
A collection of my research work.
RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids
X Yuan, Zhe Li, B Lyu, K Zuo, Y Lu, G Li, J Yang
arXiv preprint arXiv:2603.17927 2026
Mosaic: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation
Z Sun, BS Huang, Y Peng, X Li, J Ma, Y Sun, Zhe Li, H Jiang, B Gao, Z Bing
arXiv preprint arXiv:2602.08594 2026
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
S Bai, J Lyu, W Zhou, Zhe Li, D Wang, L Xing, X Zhao, P Wang, Z Wang
arXiv preprint arXiv:2602.01166 2026
RoboBrain 2.5: Depth in Sight, Time in Mind
H Tan, E Zhou, Zhe Li, Y Xu, Y Ji, X Chen, C Chi, P Wang, H Jia, Y Ao
arXiv preprint arXiv:2601.14352 2026
Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control
Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Tao Huang, Zhenguo Sun, Yibo Peng, Pengwei Wang, Zhongyuan Wang, Fangzhou Liu, Chang Xu, Shanghang Zhang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
Highlight
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Zhe Li, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Cheng Chi, Shanghang Zhang, Chang Xu
International Conference on Learning Representations (ICLR) 2026
RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion
Zhe Li, C Chi, B Zhu, Y Wei, S Bai, Y Ji, Y Peng, T Huang, P Wang, Z Wang
arXiv preprint arXiv:2512.23649 2025
Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
S Bai, W Song, J Chen, Y Ji, Z Zhong, J Yang, H Zhao, W Zhou, Zhe Li
arXiv preprint arXiv:2512.22983 2025
OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression
Zhe Li, Weihao Yuan, Weichao Shen, Siyu Zhu, Zilong Dong, Chang Xu
arXiv preprint arXiv:2510.14954 2025
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
L Qiu, S Zhu, Q Zuo, X Gu, Y Dong, J Zhang, C Xu, Zhe Li, W Yuan, L Bo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, X Gu, Weichao Shen, Y Dong, Zilong Dong
International Conference on Learning Representations (ICLR) 2025
Interpretable Multimodal Tucker Fusion Model with Information Filtering for Multimodal Sentiment Analysis
X Nie, Laurence T. Yang, Zhe Li, X Deng, F Fan, Z Yang
IEEE Transactions on Computational Social Systems 2024
MCMat: Multiview-consistent and Physically Accurate PBR Material Generation
S Zhu, L Qiu, X Gu, Z Zhao, C Xu, Y He, Zhe Li, X Han, Y Yao, X Cao, S Zhu
arXiv preprint arXiv:2412.14148 2024
MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
Zhe Li, Y He, L Zhong, W Shen, Q Zuo, L Qiu, Z Dong, Laurence T. Yang, W Yuan
arXiv preprint arXiv:2412.09901 2024
Capturing Detail Variations for Lightweight Neural Radiance Fields
Z Wang, Laurence T. Yang, B Ren, J Zhao, Zhe Li, G Zeng
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan, Bocheng Ren, Laurence T. Yang, Stan Z. Li
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Top Ten Outstanding Paper Award, HUST
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training
Zhe Li, Laurence T. Yang, Xin Nie, Bocheng Ren, Xianjun Deng
ACM International Conference on Multimedia (ACM MM) 2023