GeoBrix is a high-performance spatial library for Databricks that delivers the next generation of product-augmenting capabilities — raster, discrete global grids, and vector format I/O — and is built to drive you deeper into Databricks-native GEOMETRY/GEOGRAPHY and ST/H3 functions, not replace them. It is the modern successor to DBLabs Mosaic (now in maintenance).
Full docs: https://databrickslabs.github.io/geobrix/ — this README is the 2-minute tour.
- Lightweight tier — pure Python (+ SQL bindings) on rasterio/pyogrio/shapely, no JAR, no init script, no native GDAL bundle. Runs on Serverless, standard (shared), Lakeflow pipelines, and ARM — where the heavyweight tier can't.
- Heavyweight tier — Scala (Python and SQL bindings) + native GDAL for distributed processing on classic (x86) clusters. Same function names across tiers — switching is a one-line import change.
- RasterX — raster I/O and analytics (gap-filling; the platform has no built-in raster). Both tiers — lightweight
pyrxand heavyweight Scala. - GridX — BNG, Quadbin, and custom grids (pairs with native H3 for global hex). Both tiers — lightweight
pygxand heavyweight Scala. - VectorX — MVT tiles, TIN surfaces, and legacy-geometry migration on top of native ST. Both tiers — lightweight
pyvxand heavyweight Scala.
All SQL functions register with a gbx_ prefix (e.g. gbx_rst_clip, gbx_bng_cellarea, gbx_st_asmvt) so usage is clearly attributable to GeoBrix on classic compute. Python/Scala bindings mirror the names. See benchmarks for light-vs-heavy timings.
GeoBrix supports both current Databricks Runtime LTS releases:
| DBR LTS | Ubuntu | Spark | Python | Scala | Java | GeoBrix |
|---|---|---|---|---|---|---|
| 17.3 LTS | 24.04 | 4.0.0 | 3.12.3 | 2.13.16 | 17 | ✅ Supported |
| 18 LTS | 24.04 | 4.1.0 | 3.12.3 | 2.13.16 | 21 | ✅ Supported |
A single wheel + single JAR runs on both: Scala 2.13.16 matches both runtimes, the JAR is compiled to Java-17 bytecode so it loads on both JVMs, and Spark is a provided dependency.
DBR 19 LTS is coming soon, built on Ubuntu 26.04. The lightweight tier (pure-Python, rasterio's bundled GDAL) will be unaffected; the heavyweight tier's native GDAL/OGR libraries are compiled against the cluster OS, so they will need to be rebuilt for the new base image.
Stage the wheel (a Releases artifact, not on PyPI) in a Unity Catalog Volume, then install the [light] extra:
%pip install "geobrix[light] @ file:///Volumes/<catalog>/<schema>/<volume>/geobrix-<version>-py3-none-any.whl"Use the quoted
geobrix[light] @ file://…form (PEP 508, one argument). Don't put the extra on the path ('/Volumes/…/…whl[light]') — on Serverless,%pipkeeps the surrounding quotes and pip reads[light]as part of the filename, failing with "Expected package name at the start of dependency specifier." The named form installs cleanly on Serverless, standard/shared, and ARM.
from databricks.labs.gbx.ds.register import register # *_gbx readers/writers
from databricks.labs.gbx.pyrx import functions as rx # gbx_rst_* functions
register(spark)
rx.register(spark) # optional — only to call the gbx_rst_* SQL functions
# Read a GeoTIFF and compute with RasterX
rasters = spark.read.format("gtiff_gbx").load("/Volumes/<catalog>/<schema>/<volume>/*.tif")
rasters.select(rx.rst_width("tile"), rx.rst_srid("tile")).show()
# Vector read -> write (round-trips with the matching reader)
boroughs = spark.read.format("geojson_gbx").load("/Volumes/.../boroughs.geojson")
boroughs.write.format("geojson_gbx").mode("overwrite").save("/Volumes/.../out.geojson")Heavyweight is the same code with from databricks.labs.gbx.rasterx import functions as rx, plus the JAR and a GDAL init script — see Installing & Choosing a Tier.
Lightweight formats use the *_gbx suffix; heavyweight use *_ogr (vector) / gdal (raster). Light and heavy emit the same schema, so they are drop-in swaps. Full options and examples: Readers · Writers.
Raster & tiles
| Format | Read (light / heavy) | Write (light / heavy) |
|---|---|---|
| Raster (any GDAL driver) | raster_gbx / gdal |
raster_gbx / gdal |
| GeoTIFF | gtiff_gbx / gtiff_gdal |
gtiff_gbx / gtiff_gdal |
| PMTiles | — | pmtiles_gbx / pmtiles |
Vector — single-file vector writes are lightweight-only; the sharded GeoJSONL writer (multi-file, one shard per partition, no driver merge — the recommended writer at any scale) is available in both tiers.
| Format | Read (light / heavy) | Write |
|---|---|---|
| Vector (any OGR driver) | vector_gbx / ogr |
vector_gbx (light) |
| Shapefile | shapefile_gbx / shapefile_ogr |
shapefile_gbx (light) |
| GeoJSON | geojson_gbx / geojson_ogr |
geojson_gbx (light) |
| GeoPackage | gpkg_gbx / gpkg_ogr |
gpkg_gbx (light) |
| File Geodatabase | file_gdb_gbx / file_gdb_ogr |
file_gdb_gbx (light) ¹ |
| GeoJSONL — sharded, multi-file | read via geojson_gbx (multi=true) |
geojsonl_gbx / geojsonl (light and heavy) |
¹ file_gdb_gbx write is a hybrid: it encodes the .gdb via the native GDAL (osgeo) from the heavyweight GDAL init script, because pyogrio's bundled GDAL ships a read-only OpenFileGDB driver. On compute with those natives it writes natively; otherwise it raises a clear error (use gpkg_gbx / geojson_gbx). FileGDB reading is lightweight-only.
Light vector readers/writers exchange geometry as WKB/WKT with companion *_srid columns — convert to/from Databricks GEOMETRY with st_geomfromwkb / st_aswkb (see Databricks Spatial).
- Native Databricks
GEOMETRY/GEOGRAPHYare not produced directly yet — geometries are exchanged as WKB/WKT (+*_srid); convert with the native ST functions (Databricks Spatial). - Spatial KNN is not yet ported; nor is H3 for geometry-based k-ring / k-loop.
See the scripts folder and the docs.
Databricks Labs projects are provided AS-IS, for exploration only, and are not covered by Databricks SLAs. Please file issues as GitHub Issues; they are reviewed as time permits. Do not file Databricks support tickets for these projects.




