Doe-1 is the first closed-loop autonomous driving model that unifies perception, prediction, and planning.
Doe-1 is a single unified model that performs visual question answering, future prediction, and motion planning.
We formulate autonomous driving as a unified next-token generation problem and use observation, description, and action tokens to represent each scene. Without additional fine-tuning, Doe-1 accomplishes various tasks simply by changing the input prompt, including visual question answering, controlled image generation, and end-to-end motion planning.
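To make the next-token formulation concrete, here is a minimal, purely illustrative sketch. All names (`rollout`, the modality tags, the task strings) are assumptions for exposition, not Doe-1's actual API; the point is that one autoregressive model serves different tasks depending on which token modality the prompt asks it to produce next.

```python
# Illustrative sketch of a unified next-token formulation (hypothetical names).
OBS, DESC, ACT = "obs", "desc", "act"  # token modalities representing a scene

def rollout(model, scene_obs, task):
    """Autoregressively extend a token sequence; the task prompt decides
    which modality the model is asked to generate next."""
    sequence = [(OBS, scene_obs)]
    if task == "qa":            # question answering -> description tokens
        sequence.append((DESC, model(sequence)))
    elif task == "generation":  # action-conditioned generation -> observation tokens
        sequence.append((ACT, "turn_left"))  # illustrative action condition
        sequence.append((OBS, model(sequence)))
    elif task == "planning":    # end-to-end planning -> action tokens
        sequence.append((ACT, model(sequence)))
    return sequence

# Toy stand-in "model": emits a token naming how much context it has seen.
toy_model = lambda seq: f"token_{len(seq)}"
```

For example, `rollout(toy_model, "frame0", "planning")` ends with an action token, while the same model under `task="qa"` ends with a description token, mirroring how one generator covers all three tasks.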
We explore a new closed-loop autonomous driving paradigm that combines an end-to-end model with a world model to construct the closed loop.
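The closed loop above can be sketched as follows. This is a hedged, schematic outline under the assumption that the planner maps observations to actions and the world model maps (observation, action) pairs to the next observation; function names are illustrative, not from the paper.

```python
# Schematic closed loop (illustrative names): the world model supplies the
# next observation, so the planner is evaluated without an external simulator.
def closed_loop(policy, world_model, obs, steps):
    trajectory = []
    for _ in range(steps):
        action = policy(obs)            # end-to-end planner: obs -> action
        obs = world_model(obs, action)  # world model: (obs, action) -> next obs
        trajectory.append((action, obs))
    return trajectory

# Toy stand-ins to show the interaction pattern.
toy_policy = lambda o: o + 1
toy_world = lambda o, a: o + a
```

Here the planner's actions feed the world model, whose predicted observations feed the planner in turn, which is what closes the loop.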
Doe-1 achieves closed-loop end-to-end autonomous driving for the first time.
Doe-1 produces accurate language descriptions and answers questions about the scene.
Doe-1 generates high-quality videos consistent with the 3D structures and action conditions.
Doe-1 demonstrates planning performance competitive with existing methods, using only question-answering pairs as auxiliary supervision.
@article{doe,
  title={Doe-1: Closed-Loop Autonomous Driving with Large World Model},
  author={Zheng, Wenzhao and Xia, Zetian and Huang, Yuanhui and Zuo, Sicheng and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2412.09627},
  year={2024}
}