Doe-1: Closed-Loop Autonomous Driving with Large World Model

Tsinghua University
* Equal contributions. † Project leader.

Doe-1 is the first closed-loop autonomous driving model for unified perception, prediction, and planning.

Demo: Doe-1 for Closed-Loop Autonomous Driving

Doe-1 is a unified model that accomplishes visual question answering, future prediction, and motion planning.

Overview

We formulate autonomous driving as a unified next-token generation problem and use observation, description, and action tokens to represent each scene. Without additional fine-tuning, Doe-1 accomplishes various tasks by using different input prompts, including visual question-answering, controlled image generation, and end-to-end motion planning.
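The unified formulation above can be sketched as follows. This is a minimal illustration, assuming a flattened per-scene token stream and prompt-based task selection; the mode markers, tokenizers, and helper names are hypothetical and not the paper's actual vocabulary or implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical special tokens marking each modality (illustrative names only).
OBS, DESC, ACT = "<obs>", "<desc>", "<act>"

@dataclass
class Scene:
    observation: List[int]   # image tokens (e.g., from a visual tokenizer)
    description: List[int]   # text tokens describing the scene
    action: List[int]        # discretized motion/action tokens

def to_sequence(scene: Scene) -> List[str]:
    """Flatten one scene into the unified stream
    observation -> description -> action, each preceded by a mode marker."""
    seq = [OBS] + [f"o{t}" for t in scene.observation]
    seq += [DESC] + [f"d{t}" for t in scene.description]
    seq += [ACT] + [f"a{t}" for t in scene.action]
    return seq

def make_prompt(scene: Scene, task: str) -> List[str]:
    """Select a task at inference time purely via the prompt: the model is
    expected to continue the sequence with the requested modality."""
    base = [OBS] + [f"o{t}" for t in scene.observation]
    if task == "vqa":          # visual question-answering: describe next
        return base + [DESC]
    if task == "generation":   # condition on an action, generate observation
        return base + [ACT] + [f"a{t}" for t in scene.action] + [OBS]
    if task == "planning":     # end-to-end planning: generate actions
        return base + [ACT]
    raise ValueError(f"unknown task: {task}")
```

Because every task shares one sequence format, switching tasks requires only a different prompt suffix, not additional fine-tuning.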

Closed-Loop Autonomous Driving

We explore a new closed-loop autonomous driving paradigm that combines an end-to-end driving model with a world model to form a closed loop.
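The closed loop described above can be sketched as a simple rollout: the planner proposes an action from the current observation, and the world model predicts the next observation conditioned on that action, which in turn feeds the next planning step. The `plan` and `predict` callables are stand-ins for the policy and world-model heads, assumed here for illustration.

```python
from typing import Callable, List

def closed_loop_rollout(
    obs_tokens: List[int],
    plan: Callable[[List[int]], int],
    predict: Callable[[List[int], int], List[int]],
    steps: int = 3,
) -> List[int]:
    """Roll out a closed loop between a planning head and a world model.

    plan:    maps the current observation tokens to an action (policy head).
    predict: maps (observation, action) to the predicted next observation
             tokens (world-model head).
    Returns the sequence of planned actions."""
    trajectory = []
    for _ in range(steps):
        action = plan(obs_tokens)                 # end-to-end planning
        trajectory.append(action)
        obs_tokens = predict(obs_tokens, action)  # world-model prediction
    return trajectory
```

Unlike open-loop evaluation, each planned action here changes the observation the next decision is made from, so planning errors propagate and can be measured.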

Results

Closed-Loop Autonomous Driving

Doe-1 achieves closed-loop end-to-end autonomous driving for the first time.

Visual Question-Answering

Doe-1 produces accurate language descriptions and answers questions about the scene.

Action-Conditioned Video Generation

Doe-1 generates high-quality videos consistent with the 3D structures and action conditions.

End-to-End Motion Planning

Doe-1 achieves planning performance competitive with existing methods, using only question-answering pairs as auxiliary supervision.


Citation

Bibtex

@article{doe,
  title={Doe-1: Closed-Loop Autonomous Driving with Large World Model},
  author={Zheng, Wenzhao and Xia, Zetian and Huang, Yuanhui and Zuo, Sicheng and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2412.09627},
  year={2024}
}