
Cosmos-Drive-Dreams:
Scalable Synthetic Driving Data Generation with World Foundation Models

Cosmos-Drive-Dreams is a synthetic data generation (SDG) pipeline designed to produce challenging scenarios that enhance downstream tasks for autonomous vehicles.

NVIDIA
* denotes equal contribution
Only the core contributors are listed here. See the full list of contributors in the Contributors section below.

Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in both training and testing of an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that generates challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos-1 world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity, challenging scenarios. Experimentally, we demonstrate that our generated data helps mitigate long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection, and driving policy learning. We open-source our pipeline toolkit, dataset, and model weights through NVIDIA's Cosmos platform.




Explore Generated Videos by Cosmos-Drive


Diverse Generation with Precise Map Control

Click "HDMap Condition Input" to view the HDMap input. Click the other options to view the corresponding rendered videos.

Note: The map condition input can range from simple lane boundaries and traffic signs to richer HDMap representations.
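
To make the map control concrete, the sketch below shows the standard geometry behind a pixel-aligned condition frame: 3D map polylines such as lane boundaries are projected into the camera and rasterized onto an empty frame. It assumes a pinhole camera and is a minimal illustration, not the toolkit's actual renderer.

    import numpy as np
    import cv2  # used only to rasterize the projected polylines

    def render_condition_frame(polylines, world_to_cam, K, hw=(720, 1280)):
        """polylines: list of (N, 3) world-space arrays; world_to_cam: 4x4; K: 3x3."""
        frame = np.zeros((*hw, 3), dtype=np.uint8)
        for line in polylines:
            pts_h = np.c_[line, np.ones(len(line))]           # homogeneous coordinates
            cam = (world_to_cam @ pts_h.T).T[:, :3]
            cam = cam[cam[:, 2] > 0.1]                        # naive frustum culling
            if len(cam) < 2:
                continue
            uv = (K @ cam.T).T
            uv = (uv[:, :2] / uv[:, 2:3]).astype(np.int32)    # perspective divide
            cv2.polylines(frame, [uv.reshape(-1, 1, 2)], False, (255, 255, 255), 2)
        return frame

Repeating this along the ego trajectory, one pose per frame, yields the HDMap condition video.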

Multi-View Expansion

The middle video in the first row is the originally generated single-view output. The surrounding videos are produced by expanding this view into the other camera perspectives.
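
Structurally, multi-view expansion conditions each additional camera on the already-generated front view together with that camera's own pixel-aligned HDMap condition. The sketch below shows that flow with hypothetical placeholder names, not the released model API.

    # Hypothetical interface, for illustration only.
    CAMERAS = ["front_left", "front_right", "rear_left", "rear_right", "rear"]

    def expand_to_multiview(front_view, hdmap_conditions, expander):
        views = {"front": front_view}
        for cam in CAMERAS:
            # Condition on the generated front view plus this camera's HDMap video.
            views[cam] = expander.generate(reference=front_view,
                                           condition=hdmap_conditions[cam],
                                           camera=cam)
        return views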

In-the-wild Video Annotation

Our annotation model automatically predicts HDMap and depth information from in-the-wild driving videos.

Note: We take dashcam videos from the public Nexar dataset to show that we can also generate variations of in-the-wild videos.
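
Predicted depth matters here because, together with the camera intrinsics, it lifts every pixel into 3D, which is what lets map elements annotated on in-the-wild video be placed consistently in space. The function below is standard pinhole backprojection, shown only as a sketch of that step.

    import numpy as np

    def backproject_depth(depth, K):
        """Lift a dense depth map (H, W) to a point cloud in the camera frame."""
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
        rays = pix @ np.linalg.inv(K).T                       # rays at unit depth
        return rays * depth.reshape(-1, 1)                    # (H*W, 3) 3D points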

LiDAR Generation

Left: Conditional Map Input Visualization
Right: Generated LiDAR point cloud

The generated LiDAR point cloud is rendered by overlaying it onto the RGB input frames for visualization.
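
The overlay itself is a straightforward projection: each LiDAR point is transformed into the camera frame, projected with the intrinsics, and drawn over the RGB frame, shaded by range. A minimal sketch, assuming a pinhole camera:

    import numpy as np

    def overlay_lidar(image, points_world, world_to_cam, K, max_range=75.0):
        """Draw LiDAR points (N, 3) onto an RGB frame (H, W, 3), shaded by depth."""
        pts_h = np.c_[points_world, np.ones(len(points_world))]
        cam = (world_to_cam @ pts_h.T).T[:, :3]
        cam = cam[cam[:, 2] > 0.5]                            # keep points in front
        uv = (K @ cam.T).T
        uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
        H, W = image.shape[:2]
        keep = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        uv, depth = uv[keep], cam[keep, 2]
        shade = np.clip(255 * (1 - depth / max_range), 0, 255)  # near = bright
        out = image.copy()
        out[uv[:, 1], uv[:, 0]] = np.stack([shade] * 3, axis=-1).astype(np.uint8)
        return out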

Cosmos-Drive-Dreams: Synthetic Data Generation (SDG) Pipeline for Autonomous Vehicle Tasks


Our synthetic dataset enhances performance on various downstream autonomous driving tasks by providing diverse and challenging scenarios.

Pipeline diagram

Overview of the Cosmos-Drive-Dreams pipeline. Starting from either structured labels or in-the-wild video, we generate a pixel-aligned HDMap condition video (Step ①). We then leverage a prompt rewriter to generate diverse prompts and synthesize single-view videos (Step ②). Each single-view video is then expanded into multiple views (Step ③). Finally, a Vision-Language Model (VLM) filter performs rejection sampling to automatically discard low-quality samples, yielding a high-quality, diverse SDG dataset (Step ④).
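
The four steps compose into a simple generate-and-filter loop. The sketch below captures only that control flow; every name in it is a hypothetical placeholder for the corresponding pipeline component, not the released API.

    # Hypothetical sketch of the four-step loop; all names are placeholders.
    def generate_sdg_clips(source, renderer, prompt_rewriter, wfm, mv_expander, vlm_filter):
        hdmap_video = renderer.render_condition(source)               # Step 1: pixel-aligned HDMap video
        accepted = []
        for prompt in prompt_rewriter.diversify(source):              # Step 2: diverse prompts
            front_view = wfm.generate(hdmap_video, prompt)            # Step 2: single-view synthesis
            multi_view = mv_expander.expand(front_view, hdmap_video)  # Step 3: multi-view expansion
            if vlm_filter.accept(multi_view):                         # Step 4: rejection sampling
                accepted.append(multi_view)
        return accepted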

           Waymo Open Dataset                     RDS-HQ (2k)
           All          Ext. Weather Night        All          Rainy        Foggy        Night
w/o SDG    0.428/0.847  0.378/0.858  0.402/0.842  0.532/0.852  0.458/0.821  0.524/0.844  0.547/0.867
w/ SDG     0.451/0.855  0.404/0.875  0.455/0.878  0.566/0.871  0.506/0.851  0.572/0.867  0.581/0.885

Each cell reports F1 / Cate. Acc.

3D lane detection results with Cosmos-Drive-Dreams on the Waymo Open Dataset and our internal RDS-HQ (2k) dataset. Our pipeline significantly improves 3D lane detection over the baseline. "Cate. Acc." denotes category accuracy.

Figure panels: Road All Results (left) and Road Corner Results (right).

Cosmos-Drive-Dreams improves the F-score of 3D lane detection across varying amounts of real-world training data. SDG clips are mixed with real clips at a synthetic-to-real ratio of R_s2r = 0.5. Left: results on the full test set. Across all weather conditions, SDG consistently improves detection performance regardless of the amount of real-world training data, with the largest gain (+6.0%) observed in the low-data regime (2k clips). Right: results on the extreme-weather subset of the test set. In the more challenging rainy and foggy settings, the benefits of SDG are even more pronounced, with gains of up to +9.4% under foggy conditions using only 2k real clips. This highlights SDG's effectiveness in enhancing model robustness, particularly under adverse or underrepresented conditions.
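
For concreteness, the mixing recipe can be written in a few lines: with R_s2r = 0.5, one SDG clip is added for every two real clips before shuffling. A minimal sketch of that recipe, not the exact training code:

    import random

    def mix_clips(real_clips, sdg_clips, r_s2r=0.5, seed=0):
        """Mix SDG clips into the real training set at synthetic-to-real ratio r_s2r."""
        rng = random.Random(seed)
        n_sdg = min(int(len(real_clips) * r_s2r), len(sdg_clips))
        mixed = list(real_clips) + rng.sample(sdg_clips, n_sdg)
        rng.shuffle(mixed)
        return mixed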

           RDS-HQ (2k)                 RDS-HQ (20k)
           All    Rainy  Foggy  Night  All    Rainy  Foggy  Night
w/o SDG    0.299  0.280  0.265  0.285  0.459  0.432  0.418  0.448
w/ SDG     0.337  0.323  0.308  0.321  0.489  0.468  0.442  0.478

3D object detection performance with Cosmos-Drive-Dreams. When used to augment the training set, Cosmos-Drive-Dreams improves detection performance under both general and extreme weather conditions.

           RDS-HQ (1k)                 RDS-HQ (2k)
           mAP    Car    Bus    Truck  mAP    Car    Bus    Truck
w/o SDG    0.240  0.371  0.155  0.195  0.289  0.402  0.225  0.240
w/ SDG     0.250  0.366  0.181  0.203  0.297  0.399  0.248  0.246

LiDAR-based 3D object detection results with Cosmos-Drive-Dreams. Cosmos-Drive-Dreams improves the overall detection performance across different vehicle categories and dataset sizes.

Figure panels: Policy Learning Curves (left), Policy Learning Ratio (center), and Policy Learning Corner Cases (right).

Policy learning results with Cosmos-Drive-Dreams. Left: for a given amount of real-world clips, adding SDG data improves trajectory prediction accuracy (minADE on RDS-Bench[Policy]; lower is better). Center: less real-world data is needed to reach a target minADE. Right: adding a small amount of targeted SDG data improves performance on specific corner cases (RDS-Bench[VRU/left]) without hurting overall driving performance.
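
minADE, the metric reported above, is the minimum over the K predicted trajectories of the mean displacement error against the ground truth. A minimal reference implementation:

    import numpy as np

    def min_ade(pred_trajs, gt_traj):
        """pred_trajs: (K, T, 2) candidate trajectories; gt_traj: (T, 2).
        Returns the smallest mean L2 distance over the K candidates."""
        dists = np.linalg.norm(pred_trajs - gt_traj[None], axis=-1)  # (K, T)
        return dists.mean(axis=1).min()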

Cosmos-Drive-Dreams: Toolkits


Explore our comprehensive toolkits for working with Cosmos-Drive models, including dataset conversion tools, rendering scripts, and sample utilities.

Demonstration of the Cosmos-Drive-Dreams toolkit for interactive 3D trajectory editing.

View Toolkits on GitHub

Rendering Scripts

Generate HD map visualizations and LiDAR point cloud renderings with configurable camera models and intrinsics.
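
As an illustration of what a configurable camera model involves, the sketch below projects a camera-frame point with either an ideal pinhole model or an ideal equidistant fisheye (f-theta) model. It is a geometric sketch only; the toolkit's actual camera classes and parameterizations may differ.

    import numpy as np

    def project_point(p, fx, fy, cx, cy, model="pinhole"):
        """Project one 3D point (camera frame, z forward) to pixel coordinates."""
        x, y, z = p
        if model == "pinhole":
            return np.array([fx * x / z + cx, fy * y / z + cy])
        r = np.hypot(x, y)
        theta = np.arctan2(r, z)                 # angle from the optical axis
        scale = theta / r if r > 1e-9 else 0.0   # equidistant model: radius = f * theta
        return np.array([fx * x * scale + cx, fy * y * scale + cy])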

Sample Utilities

Access tools for prompt modification, trajectory generation, and environment transformation for diverse scenario creation.
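
As a toy example of prompt modification for environment transformation, the snippet below rewrites the weather and lighting of a base scene description to spawn variants of the same drive; the strings are illustrative, not taken from the dataset or toolkit.

    BASE = "A suburban street with parked cars on both sides, clear afternoon."
    WEATHERS = ["heavy rain at dusk", "dense fog in the morning",
                "night with wet, reflective roads", "light snowfall at noon"]
    # Each variant conditions the world model on the same HDMap, new environment.
    VARIANTS = [BASE.replace("clear afternoon", w) for w in WEATHERS]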

WebUI-based 3D Trajectory Editing Tool

Web-based interactive interface for editing 3D trajectories with intuitive controls for scenario customization.

Contributors

*: Equal Contribution, †: Corresponding Authors

Core Contributors

Model Post-training: Xuanchi Ren*, Tianshi Cao*, Amirmojtaba Sabour*, Tianchang Shen*, Jun Gao

Pipeline Development: Xuanchi Ren, Yifan Lu, Tianshi Cao, Jay Zhangjie Wu

Downstream Evaluation: Yifan Lu*, Ruiyuan Gao*, Tobias Pfaff*, Seung Wook Kim

Toolkit Development: Yifan Lu, Xuanchi Ren, Tianshi Cao

LiDARGen Post-Training: Shengyu Huang, Laura Leal-Taixe

LiDARGen Evaluation: Runjian Chen, Shengyu Huang

Data Curation: Yifan Lu, Xuanchi Ren, Tianchang Shen, Mike Chen

Architectural Design: Sanja Fidler†, Huan Ling†

Additional Contributors

Data Pipeline Support: Yuchong Ye, Zhuohao (Chris) Zhang

Engineering Support: Lyne Tchapmi, Mohammad Harrim, Pooya Jannaty

Solution Architect Support: John Shao, Yu Chen, Summer Xiao

Product Manager: Aditya Mahajan, Matt Cragun