RadarGen: Automotive Radar Point Cloud Generation from Cameras

Technion · MIT · NVIDIA · University of Toronto · Vector Institute
Teaser

TL;DR: RadarGen generates sparse radar point clouds from multi-view camera images.

Given multi-view camera images, RadarGen generates radar point clouds that align with real-world statistics and can be consumed by downstream perception models. The generated point clouds preserve scene geometry and handle occlusions. For example, modifying the input scene with an off-the-shelf image editing tool (e.g., replacing a distant car with a closer truck) updates the radar response, removing returns from newly occluded regions and reflecting the new object geometry.

Abstract

We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird’s-eye-view form that encodes spatial structure together with radar cross section (RCS) and Doppler attributes. A lightweight recovery step reconstructs point clouds from the generated maps. To better align generation with the visual scene, RadarGen incorporates BEV-aligned depth, semantic, and motion cues extracted from pretrained foundation models, which guide the stochastic generation process toward physically plausible radar patterns. Conditioning on images makes the approach broadly compatible, in principle, with existing visual datasets and simulation frameworks, offering a scalable direction for multimodal generative simulation. Evaluations on large-scale driving data show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data, marking a step toward unified generative simulation across sensing modalities.

Method

1. Representing Radar as Images

View Transformation

We begin by transforming radar point clouds into a grid-based representation suitable for training image-based diffusion models. We rasterize points into bird’s-eye-view (BEV) maps: a Point Density Map created via Gaussian kernels, and RCS/Doppler maps formed using Voronoi tessellation.
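
As a rough illustration of this step, the sketch below rasterizes a radar point cloud into the three BEV maps. The grid size, extent, kernel width, and all function names are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of radar-to-BEV rasterization (hypothetical names, grid extents,
# and resolution; not the authors' exact implementation).
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def radar_to_bev_maps(points, rcs, doppler, grid=(256, 256), extent=50.0, sigma=1.5):
    """points: (N, 2) x/y in meters; rcs, doppler: (N,) per-point attributes."""
    H, W = grid
    density = np.zeros((H, W), dtype=np.float32)

    # Map metric coordinates to pixel indices (ego vehicle at the grid center).
    ij = np.clip(((points + extent) / (2 * extent) * [H, W]).astype(int), 0, [H - 1, W - 1])
    for i, j in ij:
        density[i, j] += 1.0

    # Point Density Map: splat points, then blur with a Gaussian kernel.
    density = gaussian_filter(density, sigma=sigma)

    # RCS / Doppler maps: nearest-point (Voronoi) assignment over the grid.
    tree = cKDTree(ij)
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    _, nearest = tree.query(np.stack([yy.ravel(), xx.ravel()], axis=1))
    rcs_map = rcs[nearest].reshape(H, W).astype(np.float32)
    doppler_map = doppler[nearest].reshape(H, W).astype(np.float32)
    return density, rcs_map, doppler_map
```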

2. BEV Scene Conditioning

BEV Scene Conditioning

To bridge the gap between cameras and radar, we convert multi-view images into an aligned BEV representation. By leveraging foundation models for metric depth, semantic segmentation, and optical flow, we project visual cues into BEV, providing the generative model with semantic and geometric context.
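
A minimal sketch of this lifting step is shown below, assuming per-pixel metric depth and semantic labels have already been predicted by foundation models. The camera convention, grid parameters, and helper names are hypothetical.

```python
# Minimal sketch of lifting per-pixel cues (metric depth, semantics) into a BEV grid
# (hypothetical names and camera convention; foundation-model inference omitted).
import numpy as np

def lift_to_bev(depth, semantics, K, cam_to_ego, grid=(256, 256), extent=50.0):
    """depth: (H, W) metric depth; semantics: (H, W) class ids;
    K: (3, 3) intrinsics; cam_to_ego: (4, 4) extrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Unproject pixels to 3D camera coordinates using metric depth.
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    pts_cam = rays * depth.reshape(1, -1)

    # Transform into the ego frame and keep points inside the BEV extent.
    pts = (cam_to_ego @ np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))]))[:3]
    x, y = pts[0], pts[1]
    keep = (np.abs(x) < extent) & (np.abs(y) < extent)

    # Splat semantic labels into BEV cells (last write wins in this sketch).
    gh, gw = grid
    ii = ((x[keep] + extent) / (2 * extent) * gh).astype(int).clip(0, gh - 1)
    jj = ((y[keep] + extent) / (2 * extent) * gw).astype(int).clip(0, gw - 1)
    bev_sem = np.zeros(grid, dtype=np.int64)
    bev_sem[ii, jj] = semantics.reshape(-1)[keep]
    return bev_sem
```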

3. Conditional Radar Maps Denoising

Diffusion Process

At the core of RadarGen is an efficient latent diffusion model (based on SANA) that learns to synthesize these radar maps. Guided by the aligned visual cues, it iteratively denoises randomly initialized latent maps into a realistic distribution of radar returns.
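
The sketch below illustrates the general shape of such a conditional denoising loop. The model and scheduler interfaces are hypothetical and greatly simplified compared to the SANA-based architecture used in the paper.

```python
# Highly simplified sketch of conditional latent denoising (hypothetical model and
# scheduler interfaces; the paper's SANA-based architecture and training differ).
import torch

@torch.no_grad()
def generate_radar_maps(denoiser, vae, bev_cond, steps=20, latent_shape=(1, 4, 32, 32)):
    """denoiser(x_t, t, cond) -> predicted noise; vae.decode(z) -> radar maps."""
    z = torch.randn(latent_shape)                    # start from pure noise
    timesteps = torch.linspace(1.0, 0.0, steps + 1)  # simple linear schedule

    for i in range(steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        eps = denoiser(z, t.expand(latent_shape[0]), bev_cond)  # conditioned noise estimate
        # One deterministic update step toward the data manifold.
        z0_hat = z - t * eps
        z = z0_hat + t_next * eps

    return vae.decode(z)  # dense density / RCS / Doppler maps in image space
```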

4. Recovering Radar PCL

Point Recovery

Finally, we reconstruct the sparse point cloud by converting the generated dense maps back into a discrete format. An IRL1 (iteratively reweighted ℓ1) solver deconvolves the density map to recover sparse point locations, while RCS and Doppler attributes are sampled from their respective maps at those coordinates.
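
The sketch below illustrates the recovery step. As a simpler stand-in for the IRL1 deconvolution, it picks local maxima of the density map as point locations and then reads RCS/Doppler at those cells; the threshold and grid convention are assumptions.

```python
# Minimal sketch of point recovery: local-maximum peak picking stands in for the
# paper's IRL1 deconvolution (hypothetical threshold and grid convention).
import numpy as np
from scipy.ndimage import maximum_filter

def recover_points(density, rcs_map, doppler_map, extent=50.0, thresh=0.1):
    H, W = density.shape
    # Local maxima above a threshold serve as the sparse point locations.
    peaks = (density == maximum_filter(density, size=3)) & (density > thresh)
    ii, jj = np.nonzero(peaks)

    # Convert grid indices back to metric x/y coordinates (ego at the center).
    x = ii / H * (2 * extent) - extent
    y = jj / W * (2 * extent) - extent

    # Attributes are read off the generated RCS / Doppler maps at these cells.
    return np.stack([x, y], axis=1), rcs_map[ii, jj], doppler_map[ii, jj]
```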

Videos

Sound On Recommended

This narrated video showcases various scenarios generated by RadarGen, including occlusion handling, dynamic objects, and challenging weather conditions such as rain.

Quantitative Evaluation
Entire Area

| Method | CD Loc. (↓) | CD Full (↓) | IoU@1m (↑) | DA Rec. (↑) | DA Prec. (↑) | DA F1 (↑) | MMD Loc. (↓) | MMD RCS (↓) | MMD Dopp. (↓) |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | 1.84 ± 0.48 | 0.038 ± 0.009 | 0.23 ± 0.10 | 0.15 ± 0.10 | 0.14 ± 0.10 | 0.14 ± 0.09 | 0.368 ± 0.151 | 0.36 ± 0.25 | 0.65 ± 0.64 |
| RadarGen | 1.68 ± 0.39 | 0.040 ± 0.008 | 0.31 ± 0.11 | 0.23 ± 0.12 | 0.26 ± 0.12 | 0.24 ± 0.12 | 0.056 ± 0.062 | 0.09 ± 0.15 | 0.31 ± 0.74 |

Foreground

| Method | CD Loc. (↓) | CD Full (↓) | Dens. Sim. (↑) | Hit Rate (↑) | MMD Car Loc. / RCS / Dopp. (↓) | MMD Truck Loc. / RCS / Dopp. (↓) | MMD Trailer Loc. / RCS / Dopp. (↓) |
|---|---|---|---|---|---|---|---|
| Baseline | 1.32 ± 0.79 | 0.075 ± 0.049 | 0.35 ± 0.43 | 0.37 | 0.035 / 0.753 / 0.549 | 0.167 / 0.202 / 0.485 | 0.0459 / 0.064 / 0.607 |
| RadarGen | 0.95 ± 0.65 | 0.069 ± 0.049 | 0.51 ± 0.41 | 0.66 | 0.037 / 0.006 / 0.014 | 0.024 / 0.031 / 0.060 | 0.0069 / 0.022 / 0.046 |

RadarGen broadly outperforms the baseline on geometric fidelity (CD, IoU, Density Similarity, Hit Rate), radar attribute fidelity (DA Recall, Precision, F1), and distribution similarity (MMD).
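
For reference, a minimal version of the Chamfer distance (CD) over point locations, the geometric-fidelity metric reported above; the exact matching and normalization used in the paper's evaluation may differ.

```python
# Minimal Chamfer distance between two radar point clouds (hypothetical helper;
# the paper's exact normalization may differ).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred, real):
    """pred: (N, 2), real: (M, 2) point locations in meters."""
    d_pred, _ = cKDTree(real).query(pred)  # nearest real point per predicted point
    d_real, _ = cKDTree(pred).query(real)  # nearest predicted point per real point
    return d_pred.mean() + d_real.mean()
```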

Qualitative Comparison
Qualitative Comparison: Input vs Baseline vs Ours vs Real

RadarGen's generated point clouds closely match the ground truth in shape, distribution, and point count, a clear advantage over the baseline. RadarGen uses input frames at times t and t + ∆t, whereas the baseline uses only the frame at t. Ground-truth bounding boxes are highlighted in color.

Scene Editing
Controllable Generation: Edited Input to New Radar

Modifying the input images using an off-the-shelf image editing tool updates the radar response, demonstrating object removal (left) and insertion (right).

BibTeX
@article{borreda2025radargen,
      title={RadarGen: Automotive Radar Point Cloud Generation from Cameras}, 
      author={Borreda, Tomer and Ding, Fangqiang and Fidler, Sanja and Huang, Shengyu and Litany, Or},
      journal={arXiv preprint arXiv:2512.17897},
      year={2025}
}