Tessellating the Earth (TTE)

Learnable Spherical Voronoi Partitions for Location Encoding

Daniel Cher · Hamza Iqbal · Eric Xing · Brian Wei · Nathan Jacobs
Washington University in St. Louis · MVRL
ECCV 2026
Paper · arXiv · Code · Models: coming soon

Explore the learned tessellation

TTE places learnable sites on the sphere that migrate during training toward visually discriminative regions, and the partition forms as they move.

Panels: site trajectories; location field (ICA).

Abstract

TL;DR: TTE is a location encoder that uses a learnable Spherical Voronoi partition to concentrate representational capacity where it is needed, and global semantic tokens to bridge local spatial structure and global semantic understanding, setting a new state of the art for location encoders.

Geolocation encoders map geographic coordinates to learned representations that capture the visual and non-visual characteristics of a place from its latitude–longitude pair alone. Existing approaches project coordinates onto fixed bases (e.g., spherical harmonics) that allocate representational capacity uniformly across the globe, devoting as many resources to the open ocean as to a developing city.

We introduce Tessellating the Earth (TTE), a location encoder built from learnable Spherical Voronoi partitions that concentrates representational capacity where it is needed, in a fully differentiable, end-to-end manner. Each Voronoi site carries its own embedding and migrates during training toward discriminative regions. To bridge local spatial structure and global semantic understanding, we introduce global semantic tokens: shared, learnable concept tokens that distill semantic knowledge from satellite imagery into a compact vocabulary the location encoder can reference at inference. This lets geographically distant sites covering similar environments share semantics.
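To make the idea of a differentiable Voronoi partition concrete, here is a minimal NumPy sketch of one possible soft assignment. It assumes, for illustration only, that coordinates become unit vectors on S², that soft assignment is a temperature-scaled softmax over cosine similarity to each site, and that the location embedding is the weighted average of the site embeddings. The function names, the temperature, and these modeling choices are hypothetical and are not taken from the TTE implementation.

```python
import numpy as np

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to unit vectors on S^2 (shape [..., 3])."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def soft_voronoi_encode(points, sites, site_emb, temperature=0.05):
    """Hypothetical soft Voronoi assignment on the sphere.

    points:   [N, 3] unit vectors (query locations)
    sites:    [K, 3] site positions (renormalized onto the sphere here)
    site_emb: [K, D] per-site embeddings
    Returns (weights [N, K], embeddings [N, D]).
    """
    sites = sites / np.linalg.norm(sites, axis=-1, keepdims=True)
    # Cosine similarity equals cos(geodesic distance): larger means closer.
    logits = points @ sites.T / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each location embedding is a convex combination of site embeddings.
    return weights, weights @ site_emb

# Toy usage: random sites and embeddings, two query points (St. Louis, Sydney).
rng = np.random.default_rng(0)
K, D = 64, 16
sites = rng.normal(size=(K, 3))
site_emb = rng.normal(size=(K, D))
pts = latlon_to_xyz(np.array([38.63, -33.87]), np.array([-90.20, 151.21]))
w, z = soft_voronoi_encode(pts, sites, site_emb)
print(w.shape, z.shape)  # (2, 64) (2, 16)
```

Because cosine similarity decreases monotonically with geodesic distance, the largest weight always falls on the site whose Voronoi cell contains the point, and lowering the temperature pushes the weights toward a hard one-hot cell assignment. Keeping the assignment soft makes the output differentiable with respect to the site positions, which is what allows sites to move under gradient descent.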

Results

TTE sets a new state of the art among location encoders on a variety of geospatial benchmarks.

Geospatial benchmark results.

iNaturalist-2018 geographic prior results.

Method

A coordinate is mapped onto the sphere and soft-assigned to the learnable Voronoi sites. The resulting embedding attends over a shared set of global semantic tokens. A frozen ViT encodes the co-located satellite image and supervises the tokens during training.
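The semantic-token attention step can be sketched as single-head scaled dot-product cross-attention, where each location embedding is the query and the shared tokens supply keys and values. This is an assumption for illustration: the projection matrices `Wq`, `Wk`, `Wv`, the residual connection, and all function names are hypothetical, and TTE's actual attention block may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_attention(loc_emb, tokens, Wq, Wk, Wv):
    """Hypothetical cross-attention of location embeddings over shared tokens.

    loc_emb:    [N, D] embeddings from the soft Voronoi stage
    tokens:     [T, D] shared global semantic tokens
    Wq, Wk, Wv: [D, D] projection matrices (illustrative parameterization)
    Returns (attn [N, T], out [N, D]).
    """
    q = loc_emb @ Wq
    k = tokens @ Wk
    v = tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection keeps the site-specific signal alongside shared semantics.
    return attn, loc_emb + attn @ v

# Toy usage with random parameters.
rng = np.random.default_rng(0)
N, T, D = 4, 8, 16
loc_emb = rng.normal(size=(N, D))
tokens = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
attn, out = token_attention(loc_emb, tokens, Wq, Wk, Wv)
print(attn.shape, out.shape)  # (4, 8) (4, 16)
```

Because the token set is shared, two locations with similar embeddings receive similar attention weights wherever they lie on the globe; in this sketch, identical embeddings produce identical weights.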

Architecture overview. Location pathway: coordinate → S² → soft Voronoi assignment → semantic-token attention → location embedding. Image pathway: a frozen ViT encodes the co-located image; its embedding enters the contrastive objective and supervises the tokens (dashed).
Global semantic tokens. A shared vocabulary of learnable concept tokens that every site attends over. This factors local spatial structure apart from global semantic alignment, so distant sites covering similar environments share semantics.
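The architecture caption names a contrastive objective between the location and image pathways without specifying its form. A common choice for this kind of pairing is a CLIP-style symmetric InfoNCE loss, sketched below under that assumption. The temperature and function names are hypothetical, and the sketch omits how the image embedding supervises the tokens.

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_info_nce(loc, img, temperature=0.07):
    """Assumed CLIP-style loss: row i of `loc` should match row i of `img`.

    loc, img: [B, D] location and image embeddings from the same B places.
    """
    loc = loc / np.linalg.norm(loc, axis=1, keepdims=True)
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    logits = loc @ img.T / temperature  # [B, B]; positives on the diagonal
    idx = np.arange(len(loc))
    loss_l2i = -log_softmax(logits, axis=1)[idx, idx].mean()  # location -> image
    loss_i2l = -log_softmax(logits, axis=0)[idx, idx].mean()  # image -> location
    return 0.5 * (loss_l2i + loss_i2l)

# Toy usage: near-copies of the image embeddings vs. a misaligned batch.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 32))
aligned = symmetric_info_nce(img + 0.01 * rng.normal(size=img.shape), img)
shuffled = symmetric_info_nce(np.roll(img, 1, axis=0), img)  # no row keeps its pair
print(aligned, shuffled)
```

Well-aligned pairs drive the loss toward zero, while a batch in which every location is paired with the wrong image is penalized heavily.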

Learned semantic concepts

Shared, learnable concept tokens distill the pretrained image encoder's semantic knowledge into a compact vocabulary the location encoder can reference at inference, so distant sites covering similar environments share representational capacity.

Global semantic token attention maps. Global semantic tokens learn coherent visual concepts. For each token: the Sentinel-2 imagery it attends to most (top) and its attention map across the globe (bottom).
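One way to produce per-token maps like these is to evaluate an encoder on a regular latitude–longitude grid and read off each token's attention weight at every cell. The sketch below assumes an `encode` callable that maps unit vectors to location embeddings, and it uses plain dot-product attention without learned projections. Both choices, and the random toy encoder in the usage lines, are hypothetical stand-ins for a trained TTE model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latlon_grid(n_lat=90, n_lon=180):
    """Cell-centred equirectangular grid as unit vectors, shape [n_lat, n_lon, 3]."""
    lat = np.radians(np.linspace(-90, 90, n_lat, endpoint=False) + 90 / n_lat)
    lon = np.radians(np.linspace(-180, 180, n_lon, endpoint=False) + 180 / n_lon)
    lat, lon = np.meshgrid(lat, lon, indexing="ij")
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def token_attention_maps(encode, tokens, n_lat=90, n_lon=180):
    """Attention of every grid cell over the T tokens, returned as [T, n_lat, n_lon].

    encode: callable mapping unit vectors [N, 3] -> embeddings [N, D] (assumed)
    tokens: [T, D] global semantic tokens
    """
    pts = latlon_grid(n_lat, n_lon).reshape(-1, 3)
    z = encode(pts)
    attn = softmax(z @ tokens.T / np.sqrt(z.shape[-1]), axis=-1)  # [N, T]
    return attn.T.reshape(len(tokens), n_lat, n_lon)

# Toy usage: a random soft-Voronoi encoder stands in for a trained model.
rng = np.random.default_rng(0)
K, D, T = 32, 16, 4
sites = rng.normal(size=(K, 3))
sites /= np.linalg.norm(sites, axis=1, keepdims=True)
site_emb = rng.normal(size=(K, D))
tokens = rng.normal(size=(T, D))
encode = lambda p: softmax(p @ sites.T / 0.05) @ site_emb
maps = token_attention_maps(encode, tokens)
print(maps.shape)  # (4, 90, 180)
```

Each `maps[t]` can then be displayed as an equirectangular image, for example with Matplotlib's `imshow`.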

BibTeX

@inproceedings{cher2026tte,
  title     = {Tessellating the Earth: Learnable Spherical Voronoi
               Partitions for Location Encoding},
  author    = {Cher, Daniel and Iqbal, Hamza and Xing, Eric and Wei, Brian and Jacobs, Nathan},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}