
Cosmos-Drive-Dreams:
Scalable Synthetic Driving Data Generation with World Foundation Models

Cosmos-Drive-Dreams is a synthetic data generation (SDG) pipeline designed to produce challenging scenarios that enhance downstream tasks for autonomous vehicles.

NVIDIA
* denotes equal contribution
Only the core contributors are listed here. See the full list of contributors in the Contributors section below.

Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in both training and testing of an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that generates challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos-1 world foundation model for the driving domain, capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity, challenging scenarios. Experimentally, we demonstrate that our generated data helps mitigate long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection, and driving policy learning. We open-source our pipeline toolkit, dataset, and model weights through NVIDIA's Cosmos platform.




Explore Generated Videos by Cosmos-Drive


Diverse Generation with Precise Map Control

Click "HDMap Condition Input" to view the HDMap input. Click the other options to view the corresponding rendered videos.

Note: The map condition input can range from simple lane boundaries and traffic signs to richer HDMap representations.
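
To make the map control concrete, the sketch below shows the standard geometry behind a pixel-aligned condition frame: 3D map polylines such as lane boundaries are projected into the camera and rasterized onto an empty frame. It assumes a pinhole camera and is a minimal illustration, not the toolkit's actual renderer.

    import numpy as np
    import cv2  # used only to rasterize the projected polylines

    def render_condition_frame(polylines, world_to_cam, K, hw=(720, 1280)):
        """polylines: list of (N, 3) world-space arrays; world_to_cam: 4x4; K: 3x3."""
        frame = np.zeros((*hw, 3), dtype=np.uint8)
        for line in polylines:
            pts_h = np.c_[line, np.ones(len(line))]           # homogeneous coordinates
            cam = (world_to_cam @ pts_h.T).T[:, :3]
            cam = cam[cam[:, 2] > 0.1]                        # naive frustum culling
            if len(cam) < 2:
                continue
            uv = (K @ cam.T).T
            uv = (uv[:, :2] / uv[:, 2:3]).astype(np.int32)    # perspective divide
            cv2.polylines(frame, [uv.reshape(-1, 1, 2)], False, (255, 255, 255), 2)
        return frame

Repeating this along the ego trajectory, one pose per frame, yields the HDMap condition video.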

Multi-View Expansion

The middle video in the first row is the originally generated single-view output. The surrounding videos are produced by expanding this view into the other camera perspectives.
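
Structurally, multi-view expansion conditions each additional camera on the already-generated front view together with that camera's own pixel-aligned HDMap condition. The sketch below shows that flow with hypothetical placeholder names, not the released model API.

    # Hypothetical interface, for illustration only.
    CAMERAS = ["front_left", "front_right", "rear_left", "rear_right", "rear"]

    def expand_to_multiview(front_view, hdmap_conditions, expander):
        views = {"front": front_view}
        for cam in CAMERAS:
            # Condition on the generated front view plus this camera's HDMap video.
            views[cam] = expander.generate(reference=front_view,
                                           condition=hdmap_conditions[cam],
                                           camera=cam)
        return views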

In-the-wild Video Annotation

Our annotation model automatically predicts HDMap and depth information from in-the-wild driving videos.

Note: We take dashcam videos from the public Nexar dataset to show that we can also generate variations of in-the-wild videos.
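
Predicted depth matters here because, together with the camera intrinsics, it lifts every pixel into 3D, which is what lets map elements annotated on in-the-wild video be placed consistently in space. The function below is standard pinhole backprojection, shown only as a sketch of that step.

    import numpy as np

    def backproject_depth(depth, K):
        """Lift a dense depth map (H, W) to a point cloud in the camera frame."""
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
        rays = pix @ np.linalg.inv(K).T                       # rays at unit depth
        return rays * depth.reshape(-1, 1)                    # (H*W, 3) 3D points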

LiDAR Generation

Left: Conditional Map Input Visualization
Right: Generated LiDAR point cloud

The generated LiDAR point cloud is rendered by overlaying it onto the RGB input frames for visualization.
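
The overlay itself is a straightforward projection: each LiDAR point is transformed into the camera frame, projected with the intrinsics, and drawn over the RGB frame, shaded by range. A minimal sketch, assuming a pinhole camera:

    import numpy as np

    def overlay_lidar(image, points_world, world_to_cam, K, max_range=75.0):
        """Draw LiDAR points (N, 3) onto an RGB frame (H, W, 3), shaded by depth."""
        pts_h = np.c_[points_world, np.ones(len(points_world))]
        cam = (world_to_cam @ pts_h.T).T[:, :3]
        cam = cam[cam[:, 2] > 0.5]                            # keep points in front
        uv = (K @ cam.T).T
        uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
        H, W = image.shape[:2]
        keep = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        uv, depth = uv[keep], cam[keep, 2]
        shade = np.clip(255 * (1 - depth / max_range), 0, 255)  # near = bright
        out = image.copy()
        out[uv[:, 1], uv[:, 0]] = np.stack([shade] * 3, axis=-1).astype(np.uint8)
        return out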

Cosmos-Drive-Dreams: Synthetic Data Generation (SDG) Pipeline for Autonomous Vehicle Tasks


Our synthetic dataset enhances performance on various downstream autonomous driving tasks by providing diverse and challenging scenarios.

Pipeline diagram

Overview of the Cosmos-Drive-Dreams pipeline. Starting from either structured labels or in-the-wild video, we generate a pixel-aligned HDMap condition video (Step ①). We then leverage a prompt rewriter to generate diverse prompts and synthesize single-view videos (Step ②). Each single-view video is then expanded into multiple views (Step ③). Finally, a Vision-Language Model (VLM) filter performs rejection sampling to automatically discard low-quality samples, yielding a high-quality, diverse SDG dataset (Step ④).
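
The four steps compose into a simple generate-and-filter loop. The sketch below captures only that control flow; every name in it is a hypothetical placeholder for the corresponding pipeline component, not the released API.

    # Hypothetical sketch of the four-step loop; all names are placeholders.
    def generate_sdg_clips(source, renderer, prompt_rewriter, wfm, mv_expander, vlm_filter):
        hdmap_video = renderer.render_condition(source)               # Step 1: pixel-aligned HDMap video
        accepted = []
        for prompt in prompt_rewriter.diversify(source):              # Step 2: diverse prompts
            front_view = wfm.generate(hdmap_video, prompt)            # Step 2: single-view synthesis
            multi_view = mv_expander.expand(front_view, hdmap_video)  # Step 3: multi-view expansion
            if vlm_filter.accept(multi_view):                         # Step 4: rejection sampling
                accepted.append(multi_view)
        return accepted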

           Waymo Open Dataset                     RDS-HQ (2k)
           All          Ext. Weather Night        All          Rainy        Foggy        Night
w/o SDG    0.428/0.847  0.378/0.858  0.402/0.842  0.532/0.852  0.458/0.821  0.524/0.844  0.547/0.867
w/ SDG     0.451/0.855  0.404/0.875  0.455/0.878  0.566/0.871  0.506/0.851  0.572/0.867  0.581/0.885

Each cell reports F1 / Cate. Acc.

3D lane detection results with Cosmos-Drive-Dreams on the Waymo Open Dataset and our internal RDS-HQ (2k) dataset. Our pipeline significantly improves 3D lane detection over the baseline. "Cate. Acc." denotes category accuracy.

Figure panels: Road All Results (left) and Road Corner Results (right).

Cosmos-Drive-Dreams improves the F-score of 3D lane detection across varying amounts of real-world training data. SDG clips are mixed with real clips at a synthetic-to-real ratio of R_s2r = 0.5. Left: results on the full test set. Across all weather conditions, SDG consistently improves detection performance regardless of the amount of real-world training data, with the largest gain (+6.0%) observed in the low-data regime (2k clips). Right: results on the extreme-weather subset of the test set. In the more challenging rainy and foggy settings, the benefits of SDG are even more pronounced, with gains of up to +9.4% under foggy conditions using only 2k real clips. This highlights SDG's effectiveness in enhancing model robustness, particularly under adverse or underrepresented conditions.
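
For concreteness, the mixing recipe can be written in a few lines: with R_s2r = 0.5, one SDG clip is added for every two real clips before shuffling. A minimal sketch of that recipe, not the exact training code:

    import random

    def mix_clips(real_clips, sdg_clips, r_s2r=0.5, seed=0):
        """Mix SDG clips into the real training set at synthetic-to-real ratio r_s2r."""
        rng = random.Random(seed)
        n_sdg = min(int(len(real_clips) * r_s2r), len(sdg_clips))
        mixed = list(real_clips) + rng.sample(sdg_clips, n_sdg)
        rng.shuffle(mixed)
        return mixed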

           RDS-HQ (2k)                 RDS-HQ (20k)
           All    Rainy  Foggy  Night  All    Rainy  Foggy  Night
w/o SDG    0.299  0.280  0.265  0.285  0.459  0.432  0.418  0.448
w/ SDG     0.337  0.323  0.308  0.321  0.489  0.468  0.442  0.478

3D object detection performance with Cosmos-Drive-Dreams. When used to augment the training set, Cosmos-Drive-Dreams improves detection performance under both general and extreme weather conditions.

           RDS-HQ (1k)                 RDS-HQ (2k)
           mAP    Car    Bus    Truck  mAP    Car    Bus    Truck
w/o SDG    0.240  0.371  0.155  0.195  0.289  0.402  0.225  0.240
w/ SDG     0.250  0.366  0.181  0.203  0.297  0.399  0.248  0.246

LiDAR-based 3D object detection results with Cosmos-Drive-Dreams. Cosmos-Drive-Dreams improves the overall detection performance across different vehicle categories and dataset sizes.

Figure panels: Policy Learning Curves (left), Policy Learning Ratio (center), and Policy Learning Corner Cases (right).

Policy learning results with Cosmos-Drive-Dreams. Left: for a given amount of real-world clips, adding SDG data improves trajectory prediction accuracy (minADE on RDS-Bench[Policy]; lower is better). Center: less real-world data is needed to reach a target minADE. Right: adding a small amount of targeted SDG data improves performance on specific corner cases (RDS-Bench[VRU/left]) without hurting overall driving performance.
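
minADE, the metric reported above, is the minimum over the K predicted trajectories of the mean displacement error against the ground truth. A minimal reference implementation:

    import numpy as np

    def min_ade(pred_trajs, gt_traj):
        """pred_trajs: (K, T, 2) candidate trajectories; gt_traj: (T, 2).
        Returns the smallest mean L2 distance over the K candidates."""
        dists = np.linalg.norm(pred_trajs - gt_traj[None], axis=-1)  # (K, T)
        return dists.mean(axis=1).min()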

Cosmos-Drive-Dreams: Toolkits


Explore our comprehensive toolkits for working with Cosmos-Drive models, including dataset conversion tools, rendering scripts, and sample utilities.

Demonstration of the Cosmos-Drive-Dreams toolkit for interactive 3D trajectory editing.

View Toolkits on GitHub

Rendering Scripts

Generate HD map visualizations and LiDAR point cloud renderings with configurable camera models and intrinsics.
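
As an illustration of what a configurable camera model involves, the sketch below projects a camera-frame point with either an ideal pinhole model or an ideal equidistant fisheye (f-theta) model. It is a geometric sketch only; the toolkit's actual camera classes and parameterizations may differ.

    import numpy as np

    def project_point(p, fx, fy, cx, cy, model="pinhole"):
        """Project one 3D point (camera frame, z forward) to pixel coordinates."""
        x, y, z = p
        if model == "pinhole":
            return np.array([fx * x / z + cx, fy * y / z + cy])
        r = np.hypot(x, y)
        theta = np.arctan2(r, z)                 # angle from the optical axis
        scale = theta / r if r > 1e-9 else 0.0   # equidistant model: radius = f * theta
        return np.array([fx * x * scale + cx, fy * y * scale + cy])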

Sample Utilities

Access tools for prompt modification, trajectory generation, and environment transformation for diverse scenario creation.
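
As a toy example of prompt modification for environment transformation, the snippet below rewrites the weather and lighting of a base scene description to spawn variants of the same drive; the strings are illustrative, not taken from the dataset or toolkit.

    BASE = "A suburban street with parked cars on both sides, clear afternoon."
    WEATHERS = ["heavy rain at dusk", "dense fog in the morning",
                "night with wet, reflective roads", "light snowfall at noon"]
    # Each variant conditions the world model on the same HDMap, new environment.
    VARIANTS = [BASE.replace("clear afternoon", w) for w in WEATHERS]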

WebUI-based 3D Trajectory Editing Tool

Web-based interactive interface for editing 3D trajectories with intuitive controls for scenario customization.

Contributors

*: Equal Contribution, †: Corresponding Authors

Core Contributors

Model Post-training: Xuanchi Ren*, Tianshi Cao*, Amirmojtaba Sabour*, Tianchang Shen*, Jun Gao

Pipeline Development: Xuanchi Ren, Yifan Lu, Tianshi Cao, Jay Zhangjie Wu

Downstream Evaluation: Yifan Lu*, Ruiyuan Gao*, Tobias Pfaff*, Seung Wook Kim

Toolkit Development: Yifan Lu, Xuanchi Ren, Tianshi Cao

LiDARGen Post-Training: Shengyu Huang, Laura Leal-Taixe

LiDARGen Evaluation: Runjian Chen, Shengyu Huang

Data Curation: Yifan Lu, Xuanchi Ren, Tianchang Shen, Mike Chen

Architectural Design: Sanja Fidler†, Huan Ling†

Additional Contributors

Data Pipeline Support: Yuchong Ye, Zhuohao (Chris) Zhang

Engineering Support: Lyne Tchapmi, Mohammad Harrim, Pooya Jannaty

Solution Architect Support: John Shao, Yu Chen, Summer Xiao

Product Manager: Aditya Mahajan, Matt Cragun