Mixed-View Panorama Synthesis using Geospatially Guided Diffusion

1Washington University in St. Louis, 2University of Nebraska Omaha, 3DZYNE Technologies
Transactions on Machine Learning Research (TMLR), 2025
Mixed-view panorama synthesis framework

We propose a new task, mixed-view panorama synthesis, in which a satellite image and a set of nearby panoramas (blue, yellow, and green) are used to render a panorama at a novel location (red). Our approach uses diffusion-based modeling and attention to enable flexible, multimodal control.

Abstract

We introduce the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area. This contrasts with previous work, which uses either only input panoramas (same-view synthesis) or only an input satellite image (cross-view synthesis). We argue that the mixed-view setting is the most natural way to support panorama synthesis for arbitrary locations worldwide. A critical challenge is that the spatial coverage of panoramas is uneven, with few panoramas available in many regions of the world. We introduce an approach that uses diffusion-based modeling and an attention-based architecture to extract information from all available input imagery. Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios in which the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize.

Method

Our mixed-view panorama synthesis approach consists of three main components: (1) A satellite image encoder that extracts global spatial features, (2) A panorama encoder that processes nearby ground-level views, and (3) A diffusion-based synthesis module that generates novel panoramas by combining both sources of information.
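The three components above can be sketched as a conditional diffusion pipeline: encode both modalities into conditioning tokens, then run a reverse-diffusion loop that attends to them. This is an illustrative NumPy sketch, not the paper's architecture; the encoder functions, dimensions, and the attention-based noise predictor are stand-in assumptions (the real model uses trained networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions; the paper's encoders are learned networks.
D = 16          # feature dimension
H, W = 8, 32    # target panorama feature-map size (equirectangular)

def encode_satellite(sat_img):
    """Stand-in for the satellite encoder: global spatial feature tokens."""
    return rng.standard_normal((H * W, D))

def encode_panoramas(panos):
    """Stand-in for the panorama encoder: one token set per nearby view."""
    return np.concatenate([rng.standard_normal((H * W, D)) for _ in panos])

def cross_attention(queries, keys_values):
    """Scaled dot-product attention pulling conditioning info into the target."""
    scores = queries @ keys_values.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys_values

def denoise_step(x_t, cond, t, betas):
    """One toy DDPM reverse step; eps_hat would come from the trained network."""
    eps_hat = cross_attention(x_t, cond)          # stand-in noise prediction
    alpha = 1.0 - betas[t]
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)
    return mean if t == 0 else mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Conditioning tokens from both modalities, concatenated for attention.
cond = np.concatenate([encode_satellite(None), encode_panoramas([None] * 3)])
betas = np.linspace(1e-4, 0.02, 10)
x = rng.standard_normal((H * W, D))               # start from pure noise
for t in reversed(range(len(betas))):
    x = denoise_step(x, cond, t, betas)
```

The key design point the sketch captures is that a single attention interface lets the denoiser draw on a variable number of nearby panoramas plus one satellite image, so the same model handles sparse and dense input coverage.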

Overall framework

Overview of our mixed-view panorama synthesis framework. The model takes as input a satellite image and a set of nearby panoramas, and generates a novel panorama at the target location using diffusion-based synthesis.

Local attention details

Visualization of local geospatial attention. The target location is represented by a green square in the satellite image. The nearby street-level panoramas (color-coded borders) are represented by same-colored circles in the satellite image.

Global attention mechanism

Visualization of global geospatial attention. The color-coded attention maps for two target locations are shown, corresponding to the same-colored dots in the satellite image. Darker colors represent more salient regions.
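The local and global attention visualized above differ mainly in the support of the attention weights over the satellite feature grid. A minimal sketch of that distinction, under assumed toy dimensions (the radius-masked "local" variant and unmasked "global" variant are illustrative, not the paper's exact layers):

```python
import numpy as np

# Toy satellite feature grid; coordinates are cell indices (assumed units).
G = 16                    # satellite grid is G x G cells
D = 8                     # feature dimension
rng = np.random.default_rng(1)
sat_feats = rng.standard_normal((G * G, D))
ys, xs = np.meshgrid(np.arange(G), np.arange(G), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

def geospatial_attention(query, target_xy, radius=None):
    """Attend over satellite cells from one target location's query.

    A finite radius gives the local variant (weights are zero outside the
    neighborhood of the target); radius=None gives the global variant.
    """
    scores = sat_feats @ query / np.sqrt(D)
    if radius is not None:
        dist = np.linalg.norm(coords - np.asarray(target_xy, float), axis=1)
        scores = np.where(dist <= radius, scores, -np.inf)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax attention map
    return w @ sat_feats, w

query = rng.standard_normal(D)
local_out, local_w = geospatial_attention(query, target_xy=(8, 8), radius=3.0)
global_out, global_w = geospatial_attention(query, target_xy=(8, 8))
```

Here `local_w` is exactly the kind of map the local-attention figure shows (nonzero only near the target cell), while `global_w` spreads over the whole satellite image with darker, more salient regions getting larger weights.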

Experimental Results

We evaluate our approach on diverse locations worldwide, demonstrating its ability to synthesize realistic panoramas even when input panoramas are sparse or distant from the target location. Our method significantly outperforms baseline approaches that use only satellite imagery or only ground-level panoramas.

Comparison with Baselines

Satellite Input

Pix2Pix

PanoGAN

Sat2Density

Ours

Ground Truth

Comparison with baseline methods. The cross-view synthesis methods we compare against are trained on our collected center-aligned satellite images. Our approach, which integrates nearby street-level panoramas, not only generates more realistic results than the baselines but also produces results that are more semantically and geometrically accurate with respect to the ground truth.

Ablation Study

Ablation study results

Ablation study showing the effectiveness of different components in our mixed-view panorama synthesis approach. Our method effectively combines information from both satellite imagery and ground-level panoramas to generate realistic novel views.

BibTeX

@article{xiong2024mixed,
  title={Mixed-View Panorama Synthesis using Geospatially Guided Diffusion},
  author={Xiong, Zhexiao and Xing, Xin and Workman, Scott and Khanal, Subash and Jacobs, Nathan},
  journal={arXiv preprint arXiv:2407.09672},
  year={2024}
}