
Available Datasets

Datasets are served from the public STAC catalog at:

https://s3-west.nrp-nautilus.io/public-data/stac/catalog.json

The agent discovers dataset paths and schemas dynamically. Use the browse_stac_catalog and get_stac_details tools to look them up rather than hardcoding S3 paths.
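For readers working outside the agent, the sketch below shows roughly what dynamic discovery amounts to: read the catalog JSON and follow its `child` and `item` links instead of assuming paths. It relies only on the standard STAC `links` structure. The helper names `fetch_json` and `child_links` and the sample catalog fragment are hypothetical, and this is not the implementation of browse_stac_catalog.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

CATALOG_URL = "https://s3-west.nrp-nautilus.io/public-data/stac/catalog.json"


def fetch_json(url, timeout=30):
    """Download and parse a JSON document (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def child_links(catalog, base_url):
    """Return (title, absolute_url) pairs for child/item links of a STAC catalog."""
    found = []
    for link in catalog.get("links", []):
        if link.get("rel") in ("child", "item"):
            # STAC hrefs may be relative, so resolve them against the catalog URL
            found.append((link.get("title"), urljoin(base_url, link["href"])))
    return found


# Hypothetical catalog fragment, used only to show the shape of the data.
sample = {
    "type": "Catalog",
    "id": "example",
    "links": [
        {"rel": "self", "href": CATALOG_URL},
        {
            "rel": "child",
            "href": "./example-dataset/collection.json",
            "title": "Example dataset",
        },
    ],
}

print(child_links(sample, CATALOG_URL))
```

To run it against the live catalog, call `child_links(fetch_json(CATALOG_URL), CATALOG_URL)`.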

Current datasets

| Dataset | Description |
| --- | --- |
| GLWD | Global Lakes and Wetlands Database |
| Vulnerable Carbon | Conservation International carbon vulnerability data |
| NCP | Nature Contributions to People biodiversity scores |
| Countries & Regions | Global administrative boundaries (Overture Maps) |
| WDPA | World Database on Protected Areas |
| Ramsar Sites | Wetlands of International Importance |
| HydroBASINS | Global watershed boundaries (levels 3–6) |
| iNaturalist | Species occurrence range maps |
| Corruption Index 2024 | Transparency International data |

H3 spatial indexing

All datasets are indexed using Uber's H3 hexagonal grid system.

| Column | Resolution | Average cell area |
| --- | --- | --- |
| h8 | Resolution 8 | ~0.74 km² |
| h4 | Resolution 4 | ~1,770 km² |
| h0 | Resolution 0 | ~4,357,449 km² |
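As a rough cross-check on these figures, the mean cell area at any resolution can be estimated by dividing the Earth's surface area by the number of H3 cells, which is 2 + 120 × 7^res. The sketch below assumes a spherical Earth with the authalic radius given in the code; the function names are illustrative only. Because the mean here is taken over all cells, including the 12 smaller pentagons, the resolution-0 result comes out lower than a hexagon-only average such as the one in the table. At resolutions 4 and 8 the difference is negligible.

```python
import math

# Assumption: spherical Earth with an authalic radius in km.
AUTHALIC_RADIUS_KM = 6371.007180918475
EARTH_AREA_KM2 = 4 * math.pi * AUTHALIC_RADIUS_KM ** 2


def num_cells(res):
    """Total number of H3 cells at a resolution (hexagons plus 12 pentagons)."""
    return 2 + 120 * 7 ** res


def mean_cell_area_km2(res):
    """Mean area over all cells at a resolution, pentagons included."""
    return EARTH_AREA_KM2 / num_cells(res)


for res in (8, 4, 0):
    print(f"res {res}: {num_cells(res):,} cells, "
          f"mean area {mean_cell_area_km2(res):,.4f} km^2")
```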

Area calculations

sql
-- Approximate area in km²: distinct resolution-8 cells × average cell area (0.737327598 km²)
SELECT APPROX_COUNT_DISTINCT(h8) * 0.737327598 AS area_km2
FROM read_parquet('s3://...')
WHERE ...
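The same arithmetic can be sketched in Python for results already pulled into memory: count each resolution-8 cell once, then multiply by the average cell area. The cell IDs below are hypothetical placeholders and `area_km2` is an illustrative helper. This version counts distinct cells exactly, whereas APPROX_COUNT_DISTINCT returns an estimate.

```python
H8_CELL_AREA_KM2 = 0.737327598  # average resolution-8 cell area used in the SQL


def area_km2(h8_cells):
    """Approximate area covered by a collection of resolution-8 cell IDs."""
    # A set removes duplicates so every cell contributes its area once
    return len(set(h8_cells)) * H8_CELL_AREA_KM2


# Hypothetical cell IDs; the duplicate is counted only once.
cells = ["cell-a", "cell-b", "cell-b", "cell-c"]
print(area_km2(cells))
```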

Cross-dataset joins

Always include h0 in join conditions to enable partition pruning:

sql
SELECT a.h8, a.value, b.other_value
FROM read_parquet('s3://dataset-a/**') a
JOIN read_parquet('s3://dataset-b/**') b
  ON a.h8 = b.h8 AND a.h0 = b.h0   -- h0 required for pruning

Omitting h0 forces a full scan of both datasets and is 5–20× slower.
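The toy model below illustrates the idea behind pruning. It is not a description of how the query engine implements it. It assumes each dataset is partitioned by h0 and that h0 is the resolution-0 ancestor of h8, so rows with equal h8 always share the same h0 and the extra condition does not change the join result. All names and cell IDs are hypothetical.

```python
from collections import defaultdict


def partition_by_h0(rows):
    """Group rows into partitions keyed by their h0 value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["h0"]].append(row)
    return dict(parts)


def pruned_join(parts_a, parts_b):
    """Join on (h0, h8), reading only partitions present on both sides."""
    joined, rows_read = [], 0
    for h0 in parts_a.keys() & parts_b.keys():  # other partitions are skipped
        rows_a, rows_b = parts_a[h0], parts_b[h0]
        rows_read += len(rows_a) + len(rows_b)
        b_by_h8 = defaultdict(list)
        for rb in rows_b:
            b_by_h8[rb["h8"]].append(rb)
        for ra in rows_a:
            for rb in b_by_h8.get(ra["h8"], []):
                joined.append((ra["h8"], ra["value"], rb["other_value"]))
    return joined, rows_read


# Hypothetical rows; the h0/h8 values are placeholders, not real H3 indexes.
dataset_a = [
    {"h0": "A", "h8": "a1", "value": 1.0},
    {"h0": "A", "h8": "a2", "value": 2.0},
    {"h0": "B", "h8": "b1", "value": 3.0},
]
dataset_b = [
    {"h0": "A", "h8": "a1", "other_value": 10},
    {"h0": "C", "h8": "c1", "other_value": 20},
]

joined, rows_read = pruned_join(partition_by_h0(dataset_a), partition_by_h0(dataset_b))
total_rows = len(dataset_a) + len(dataset_b)
print(joined, f"read {rows_read} of {total_rows} rows")
```

Only the partition shared by both sides is read. Without the h0 condition, nothing tells the reader which partitions can be skipped, so every row on both sides has to be considered.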

Released under the MIT License.