VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics

Washington University in St. Louis
WACV, 2026
*Equal Contribution
[Figure: VectorSynth teaser]

Abstract

We introduce VectorSynth, a satellite image synthesis model conditioned on polygonal geographic annotations with semantic attributes. Unlike prior text- or layout-conditioned models, VectorSynth learns dense cross-modal correspondences that align imagery and semantic vector geometry, enabling fine-grained, spatially grounded edits. VectorSynth supports interactive workflows that mix language prompts with geometry-aware conditioning, allowing rapid what-if simulations, spatial edits, and map-informed content generation.

💾 Data

[Figures: data coverage map and pixel tag lists]

OSM-Satellite Dataset. We assemble a collection of satellite scenes paired with OpenStreetMap (OSM) and Building Footprint polygon annotations, covering diverse urban areas with both built and natural features.
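A paired sample couples one satellite image with a set of attributed polygons. The structure below is a minimal illustrative sketch, not the dataset's actual on-disk format; all field names and tag keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedPolygon:
    # pixel-space (x, y) vertices of one OSM / footprint polygon
    vertices: list
    # semantic attributes, e.g. {"building": "residential"} (illustrative keys)
    tags: dict = field(default_factory=dict)


@dataclass
class SatelliteSample:
    image_path: str
    polygons: list  # list[AnnotatedPolygon]


# Hypothetical sample: one scene with a single tagged building footprint.
sample = SatelliteSample(
    image_path="scene_0001.png",
    polygons=[
        AnnotatedPolygon(
            vertices=[(10, 10), (40, 10), (40, 30), (10, 30)],
            tags={"building": "residential"},
        )
    ],
)
```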

🚀 Contrastive OSM-Satellite Alignment (COSA)

[Figure: COSA example]

Semantic distinction. Similarity heatmaps for different text queries highlight the fine-grained understanding learned through polygonal contrastive training.
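Contrastive alignment of this kind is typically trained with a symmetric InfoNCE objective that pulls matching image and geometry embeddings together and pushes mismatched pairs apart. The sketch below is a generic NumPy illustration of that objective, not the authors' implementation; the function name, temperature value, and embedding shapes are assumptions.

```python
import numpy as np


def info_nce(img_emb, geo_emb, temperature=0.07):
    """Symmetric InfoNCE loss between matched rows of two embedding sets.

    img_emb, geo_emb: (N, D) arrays; row i of each is a positive pair.
    """
    # L2-normalize so similarities are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    geo = geo_emb / np.linalg.norm(geo_emb, axis=1, keepdims=True)
    logits = img @ geo.T / temperature  # (N, N): positives on the diagonal

    def xent(l):
        # cross-entropy of the softmax over each row, target = diagonal entry
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-geometry and geometry-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = info_nce(emb, emb)                       # perfect pairing
loss_random = info_nce(emb, rng.normal(size=(8, 16)))   # unrelated pairing
```

With identical embeddings the diagonal dominates each softmax and the loss is near zero, while unrelated embeddings give a loss near log N.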

🎯 Fine-Grained Semantic Edits

[Figure: editing comparison]

Each set shows the local caption used for editing, with results from (a) GeoSynth, (b) GeoSynth with inpainting, and (c) VectorSynth.

[Figure: fine-grained semantic edits]

Examples of fine-grained semantic edits.

BibTeX

@inproceedings{cher2025vectorsynth,
  title={VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics},
  author={Cher, Daniel and Wei, Brian and Sastry, Srikumar and Jacobs, Nathan},
  booktitle={Winter Conference on Applications of Computer Vision},
  year={2026},
  organization={IEEE/CVF}
}