$\tau_0$-WM: A Unified Video-Action World Model for Robotic Manipulation
Published in arXiv, 2026
Recommended citation: Zhou, P., Chen, S., Chen, D., Wang, J., Jin, R., Zhu, B., Pan, Y., Gu, S., Wang, K., Nan, S., Qiu, X., Qiu, C., Yang, P., Cai, Y., Gao, J., Li, Y., Fu, Y., Yue, X., Chen, Z., & Luo, J. (2026). $\tau_0$-WM: A Unified Video-Action World Model for Robotic Manipulation. arXiv preprint arXiv:2606.01027. https://arxiv.org/abs/2606.01027
Robotic manipulation requires models that generate executable actions while anticipating and evaluating the future consequences of those actions. This work presents $\tau_0$-WM, a unified video-action world model for robotic manipulation that combines policy learning, video prediction, and action evaluation within a single future-predictive framework trained on large-scale real-robot and human-interaction data.
