Raster Processing

Create Cloud-Optimized GeoTIFFs (COGs) and H3-indexed parquet from raster datasets.

Overview

The raster processing module provides tools to:

  • Create Cloud-Optimized GeoTIFFs for cloud rendering (titiler)

  • Convert rasters to H3-indexed parquet files

  • Auto-detect optimal H3 resolution from pixel size

  • Process global rasters one h0 region (H3 resolution-0 cell) at a time for memory efficiency

Basic Usage

Python API

from cng_datasets.raster import RasterProcessor

# Process raster to COG and H3-indexed parquet
processor = RasterProcessor(
    input_path="wetlands.tif",
    output_cog_path="s3://bucket/wetlands-cog.tif",
    output_parquet_path="s3://bucket/wetlands/hex/",
    h3_resolution=None,  # Auto-detect
    parent_resolutions=[8, 0],
    value_column="wetland_class",
    nodata_value=255,
)

# Create COG
processor.create_cog()

# Convert to H3-indexed parquet
processor.process_all_h0_regions()

Command-Line Interface

# Create COG only
cng-datasets raster \
    --input wetlands.tif \
    --output-cog s3://bucket/wetlands-cog.tif \
    --compression zstd

# Raster to H3 parquet (auto-detect resolution)
cng-datasets raster \
    --input wetlands.tif \
    --output-parquet s3://bucket/wetlands/hex/ \
    --parent-resolutions "8,0" \
    --value-column wetland_class \
    --nodata 255

# COG + H3 in one command
cng-datasets raster \
    --input data.tif \
    --output-cog s3://bucket/data-cog.tif \
    --output-parquet s3://bucket/data/hex/ \
    --resolution 10 \
    --parent-resolutions "9,8,0"

Auto-Detection of H3 Resolution

The processor can automatically detect the optimal H3 resolution based on the raster’s pixel resolution:

from cng_datasets.raster import detect_optimal_h3_resolution

# Get recommended H3 resolution
h3_res = detect_optimal_h3_resolution("high-res-raster.tif")
print(f"Recommended H3 resolution: {h3_res}")
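
Detection depends on knowing the pixel size on the ground. The internals of detect_optimal_h3_resolution are not shown here, so the following is only a rough sketch of that first step. It assumes a raster in EPSG:4326 with a pixel size in degrees. The helper name and the 111,320 m-per-degree constant (an equatorial approximation) are illustrative assumptions, not part of the cng_datasets API.

```python
import math

# Approximate length of one degree at the equator, in metres
# (Earth's equatorial circumference of about 40,075 km divided by 360).
METERS_PER_DEGREE = 111_320.0


def approx_pixel_size_m(pixel_deg: float, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Return an approximate (east-west, north-south) pixel size in metres.

    Hypothetical helper: east-west distance shrinks with cos(latitude),
    while north-south distance stays roughly constant.
    """
    north_south = pixel_deg * METERS_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


# A 1 arc-second pixel (1/3600 degree) is roughly 31 m at the equator.
ew, ns = approx_pixel_size_m(1 / 3600)
print(f"{ew:.1f} m x {ns:.1f} m")
```

Because the east-west size shrinks toward the poles, a global raster has no single pixel size in metres. Which latitude the real detector uses is not stated here.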

Resolution Mapping

Pixel Size    Recommended H3    Use Case
0.5-2m        h14-h15           High-res imagery
7-25m         h12-h13           Sentinel/aerial
30-300m       h9-h10            Landsat/regional
1-12km        h7-h9             Climate/global
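
The mapping above can also be expressed as a simple lookup. This is a minimal sketch that encodes only the four rows of the table. The function name is hypothetical, and the real detector may interpolate or use different thresholds.

```python
from typing import Optional

# (min_m, max_m, coarsest_h3, finest_h3, use_case), copied from the table above.
RESOLUTION_TABLE = [
    (0.5, 2.0, 14, 15, "High-res imagery"),
    (7.0, 25.0, 12, 13, "Sentinel/aerial"),
    (30.0, 300.0, 9, 10, "Landsat/regional"),
    (1_000.0, 12_000.0, 7, 9, "Climate/global"),
]


def recommended_h3_range(pixel_size_m: float) -> Optional[tuple[int, int]]:
    """Return (coarsest, finest) recommended H3 resolutions, or None.

    None means the pixel size falls between the table's rows, where the
    table itself gives no recommendation.
    """
    for min_m, max_m, coarsest, finest, _use_case in RESOLUTION_TABLE:
        if min_m <= pixel_size_m <= max_m:
            return coarsest, finest
    return None


print(recommended_h3_range(10))  # 10 m pixels fall in the 7-25m row
```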

The processor provides helpful feedback when you choose a resolution different from the detected one:

  • Finer resolution: “Using h12 instead of detected h10 - will create more cells”

  • Coarser resolution: “Using h8 instead of detected h10 - will aggregate more pixels”
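
The logic behind these messages follows from how H3 resolutions are numbered: a higher number means smaller cells. The sketch below reproduces the two messages quoted above. The helper name is hypothetical and the processor's actual wording or logging mechanism may differ.

```python
from typing import Optional


def resolution_feedback(chosen: int, detected: int) -> Optional[str]:
    """Hypothetical helper mirroring the messages quoted above."""
    if chosen == detected:
        return None  # nothing to report
    # In H3, a larger resolution number means smaller (finer) cells.
    effect = "create more cells" if chosen > detected else "aggregate more pixels"
    return f"Using h{chosen} instead of detected h{detected} - will {effect}"


print(resolution_feedback(12, 10))
print(resolution_feedback(8, 10))
```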

Parameters

RasterProcessor

  • input_path (str): Path to input raster file (supports /vsis3/ URLs)

  • output_cog_path (str, optional): Path to output COG

  • output_parquet_path (str, optional): Path to output parquet directory

  • h3_resolution (int, optional): H3 resolution (None for auto-detect)

  • parent_resolutions (list[int]): Parent resolutions for aggregation (default: [0])

  • h0_index (int, optional): Process specific h0 region (0-121)

  • value_column (str): Name for raster value column (default: "value")

  • nodata_value (float, optional): NoData value to exclude

  • compression (str): COG compression method (default: "zstd")

  • blocksize (int): COG tile size in pixels (default: 512)

  • resampling (str): Resampling method (default: "nearest")

Cloud-Optimized GeoTIFF (COG)

COGs are optimized for cloud rendering with titiler:

processor = RasterProcessor(
    input_path="data.tif",
    output_cog_path="s3://bucket/data-cog.tif",
    compression="zstd",  # or "deflate", "lzw"
    blocksize=512,  # Tile size
    resampling="bilinear"  # or "nearest", "cubic"
)

cog_path = processor.create_cog()

COGs include:

  • Internal tiling (configurable blocksize)

  • Overview pyramids for zoom levels

  • Optimized compression

  • EPSG:4326 reprojection if needed

  • Multi-threaded processing
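
To make the link between blocksize and the overview pyramid concrete, the sketch below computes decimation factors by repeated halving until the smallest overview fits within one block. This is a common stopping rule and the default described for GDAL's COG driver. Whether create_cog() uses exactly this rule is an assumption, and the function name is hypothetical.

```python
import math


def overview_factors(width: int, height: int, blocksize: int = 512) -> list[int]:
    """Return overview decimation factors (2, 4, 8, ...) for a raster.

    Halving stops once the larger dimension of the current level
    is no bigger than one block.
    """
    factors = []
    factor = 1
    # Keep halving while the current level is still larger than one block.
    while max(math.ceil(width / factor), math.ceil(height / factor)) > blocksize:
        factor *= 2
        factors.append(factor)
    return factors


# A 43200 x 21600 global grid (30 arc-second pixels) with 512-pixel blocks.
print(overview_factors(43200, 21600))  # [2, 4, 8, 16, 32, 64, 128]
```

A larger blocksize yields fewer overview levels, and a smaller one yields more.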

H3 Processing by h0 Regions

For global rasters, process one h0 region at a time (the 122 H3 resolution-0 cells, indexed 0-121) to limit memory use:

# Process all h0 regions
processor = RasterProcessor(
    input_path="s3://bucket/global.tif",
    output_parquet_path="s3://bucket/global/hex/",
    h3_resolution=8,
    parent_resolutions=[0],
)
output_files = processor.process_all_h0_regions()

# Or process specific h0 region (useful for K8s jobs)
processor = RasterProcessor(
    input_path="s3://bucket/global.tif",
    output_parquet_path="s3://bucket/global/hex/",
    h0_index=42,  # Process only h0 region 42
    h3_resolution=8,
)
processor.process_h0_region()

This enables:

  • Memory-efficient processing of large rasters

  • Parallel processing via Kubernetes

  • Independent failure handling per region
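
Independent failure handling can be pictured as a loop that records errors per region instead of aborting. The following is a minimal sketch, not the library's implementation. run_all_regions and process_region are hypothetical names, and process_region stands in for whatever does the per-region work (for example, a wrapper that builds a RasterProcessor with h0_index set and calls process_h0_region()).

```python
from typing import Callable

NUM_H0_REGIONS = 122  # H3 has 122 resolution-0 cells, indexed 0-121


def run_all_regions(
    process_region: Callable[[int], str],
) -> tuple[dict[int, str], dict[int, Exception]]:
    """Call process_region for every h0 index, isolating failures."""
    outputs: dict[int, str] = {}
    failures: dict[int, Exception] = {}
    for h0_index in range(NUM_H0_REGIONS):
        try:
            outputs[h0_index] = process_region(h0_index)
        except Exception as exc:  # keep going; record the failure for a retry
            failures[h0_index] = exc
    return outputs, failures


def fake_process(h0_index: int) -> str:
    """Stand-in for real per-region work; fails on one region."""
    if h0_index == 42:
        raise RuntimeError("simulated failure")
    return f"h0={h0_index}/h0_{h0_index}.parquet"


outputs, failures = run_all_regions(fake_process)
print(sorted(failures))  # [42]
```

Only the failed indices need to be rerun, which is the same property an indexed Kubernetes job gives you per pod.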

Kubernetes Processing

Process global rasters in parallel using Kubernetes:

apiVersion: batch/v1
kind: Job
metadata:
  name: raster-processing
spec:
  completions: 122  # One per h0 region
  parallelism: 61
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: processor
        image: ghcr.io/boettiger-lab/datasets:latest
        command:
        - python
        - /app/job.py
        - --i
        - $(JOB_COMPLETION_INDEX)
        - --input-url
        - /vsis3/bucket/data.tif
        - --output-url
        - s3://bucket/output/

Or use the Python API:

from cng_datasets.k8s import K8sJobManager

manager = K8sJobManager()
job = manager.generate_chunked_job(
    job_name="wetlands-raster-h3",
    script_path="/app/wetlands/glwd/job.py",
    num_chunks=122,  # One per h0 region
    base_args=[
        "--input-url", "s3://bucket/wetlands.tif",
        "--output-url", "s3://bucket/wetlands/hex/",
        "--parent-resolutions", "8,0",
    ],
    parallelism=61,
    cpu="4",
    memory="34Gi",
)
manager.save_job_yaml(job, "wetlands-job.yaml")
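
Both approaches run a job.py script that receives a region index plus input and output locations. The contents of job.py are not shown here, so the sketch below is a guess at its argument handling based only on the flags that appear above (--i, --input-url, --output-url, --parent-resolutions). The function names are hypothetical.

```python
import argparse


def parse_resolutions(text: str) -> list[int]:
    """Turn a comma-separated string such as "8,0" into [8, 0]."""
    return [int(part) for part in text.split(",") if part.strip()]


def parse_job_args(argv: list[str]) -> argparse.Namespace:
    """Parse the flags used in the job manifest (assumed interface)."""
    parser = argparse.ArgumentParser(description="Process one h0 region (sketch)")
    parser.add_argument("--i", type=int, required=True, help="h0 region index (0-121)")
    parser.add_argument("--input-url", required=True)
    parser.add_argument("--output-url", required=True)
    parser.add_argument("--parent-resolutions", type=parse_resolutions, default=[0])
    args = parser.parse_args(argv)
    if not 0 <= args.i <= 121:
        parser.error(f"h0 index out of range: {args.i}")
    return args


args = parse_job_args([
    "--i", "42",
    "--input-url", "/vsis3/bucket/data.tif",
    "--output-url", "s3://bucket/output/",
    "--parent-resolutions", "8,0",
])
print(args.i, args.parent_resolutions)  # 42 [8, 0]
```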

Output Format

Output is partitioned by h0 (continent-scale) H3 cells:

s3://bucket/dataset/
├── dataset-cog.tif          # Cloud-Optimized GeoTIFF
└── hex/                     # H3-indexed parquet
    ├── h0=0/
    │   └── h0_0.parquet
    ├── h0=1/
    │   └── h0_1.parquet
    └── ...

Each parquet file contains:

  • h3_cell: H3 cell ID at specified resolution

  • value: Raster value (column name set by value_column)

  • Parent H3 cells for each resolution in parent_resolutions, if specified

  • No rows for nodata pixels, if nodata_value is specified
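
The layout shown above is enough to compute where a given region's file should land. This is a minimal sketch of that path construction. The helper name is hypothetical, and the library may build paths differently internally.

```python
def h0_partition_path(output_parquet_path: str, h0_index: int) -> str:
    """Build the parquet path for one h0 region, following the layout above."""
    if not 0 <= h0_index <= 121:
        raise ValueError(f"h0 index must be in 0-121, got {h0_index}")
    base = output_parquet_path.rstrip("/")  # tolerate a trailing slash
    return f"{base}/h0={h0_index}/h0_{h0_index}.parquet"


print(h0_partition_path("s3://bucket/dataset/hex/", 1))
# s3://bucket/dataset/hex/h0=1/h0_1.parquet
```

The h0=N directory names follow the Hive-style key=value convention, which many parquet readers can use to expose h0 as a column and to skip partitions that a query does not need.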

Examples

See the following directories for complete examples:

  • wetlands/glwd/ - Raster to H3 conversion with global h0 processing

  • iucn/ - Species range maps raster processing

  • ncp/ - Nature contributions to people raster data