Yuchen Fan*, Kaiyan Zhang*, Heng Zhou*, Yuxin Zuo, Yanxu Chen, Yu Fu, Xinwei Long, Xuekai Zhu, Che Jiang, Yuchen Zhang, Li Kang, Gang Chen, Cheng Huang, Zhizhou He, Bingning Wang, Lei Bai#, Ning Ding#, Bowen Zhou# (* equal contribution, # corresponding author)
Under review.
TL;DR: LLMs can act as efficient internal search simulators, and with Self-Search RL (SSRL) they reduce reliance on external search engines while enabling more scalable and robust RL agent training.
Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin#, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai#, Zhenfei Yin# (* equal contribution, # corresponding author)
Under review.
TL;DR: Boost VLM’s visual reasoning with GRPO to orchestrate collaboration among heterogeneous embodied robots.
Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin#, Xiaohong Liu, Xihui Liu, Ruimao Zhang#, Lei Bai# (* equal contribution, # corresponding author)
ICCV 2025 & Best Paper Award at CVPR 2025 MEIS Workshop
TL;DR: Using compositional constraints to coordinate multiple robotic arms for complex—and surprisingly fun—manipulation tasks!
Heng Zhou*, Hejia Geng*, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin#, Lei Bai# (* equal contribution, # corresponding author)
EMNLP 2025 Main Conference
TL;DR: ReSo, a reward-driven, self-organizing multi-agent system that enables efficient collaboration through dynamic optimization and agent selection.