Hi! I'm a PhD student at AIR, Tsinghua University, studying embodied AI. I am fortunate to be advised by Prof. Xianyuan Zhan and Prof. Ya-Qin Zhang. I'm driven by the challenge of building truly embodied intelligence and am committed to pushing the boundaries of this field. I'm still on the road!
I am open to collaboration — feel free to reach out to me!
GitHub / Twitter / Google Scholar / zhengjl23@mails.tsinghua.edu.cn
News
- Our X-VLA won 1st place in the AGIBOT World Challenge (Manipulation track) @ IROS 2025.
- One paper (UniAct) on cross-embodiment universal actions is accepted to CVPR 2025.
- Diffusion-Planner is selected as an oral presentation at ICLR 2025.
- One paper on autonomous driving (Diffusion-Planner) is accepted to ICLR 2025.
- One paper (Robo-MUTUAL) on embodied representations is accepted to ICRA 2025.
- IVM and DecisionNCE are selected as Outstanding Papers at the MFM-EAI workshop @ ICML 2024.
- One paper (IVM) on embodied foundation multimodal models is accepted to NeurIPS 2024.
- One paper (DecisionNCE) on embodied multimodal representations is accepted to ICML 2024.
- One paper (GLID) on unified vision pretraining is accepted to CVPR 2024.
Publications (* marks equal contribution)
- X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model. 1st place @ AGIBOT World Challenge (Manipulation track), IROS 2025. Paper | Code | Page
- Universal Actions for Enhanced Embodied Foundation Models. CVPR 2025. Paper | Code | Page
- Instruction Guided Visual Masking. NeurIPS 2024 (Outstanding Paper @ ICML 2024 MFM-EAI Workshop). Paper | Code | Page | Dataset | Model
- DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning. ICML 2024 (Outstanding Paper @ ICML 2024 MFM-EAI Workshop). Paper | Code | Page
- GLID: Pre-training a Generalist Encoder-Decoder Vision Model. CVPR 2024. Paper
- Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning. ICRA 2025. Paper | Code | Page
- Diffusion-Based Planning for Autonomous Driving with Flexible Guidance. ICLR 2025 (Oral, Top 2%). Paper | Code | Page
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment. Under Review, 2024.
- Efficient Robotic Policy Learning via Latent Space Backward Planning. Under Review, 2025.
- GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation. ICLR 2023.
- MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers. CVPR 2023.
Professional Services
Reviewer for ICLR 2025, ICML 2025, NeurIPS 2024-2025, CVPR 2025, ICCV 2025.