Quick Start
This guide will get you started with the CNG Datasets toolkit.
Vector Processing Example
Convert polygon datasets to H3-indexed Parquet:
from cng_datasets.vector import H3VectorProcessor

# Create processor
processor = H3VectorProcessor(
    input_url="s3://my-bucket/polygons.parquet",
    output_url="s3://my-bucket/h3-indexed/",
    h3_resolution=10,
    parent_resolutions=[9, 8, 0],
    chunk_size=500,
)
# Process all chunks
output_files = processor.process_all_chunks()
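To make the role of `chunk_size` concrete, here is a minimal sketch of how a dataset might be split into fixed-size chunks. It assumes `chunk_size` counts input rows per chunk, which the example above does not confirm. `chunk_ranges` is a hypothetical helper and is not part of `cng_datasets`.

```python
def chunk_ranges(n_rows: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, stop) row ranges that cover n_rows.

    Hypothetical helper for illustration; assumes chunks are row-based.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]


# 1200 rows with chunk_size=500 gives two full chunks and one partial chunk.
print(chunk_ranges(1200, 500))  # [(0, 500), (500, 1000), (1000, 1200)]
```

Under this assumption, a smaller `chunk_size` would mean more, smaller units of work, which is usually the lever to pull when a single chunk does not fit in memory.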
Command-Line
cng-datasets vector \
    --input s3://bucket/input.parquet \
    --output s3://bucket/output/ \
    --resolution 10 \
    --chunk-size 500
Raster Processing Example
Create a Cloud-Optimized GeoTIFF (COG) and H3-indexed Parquet from a raster:
from cng_datasets.raster import RasterProcessor

# Create processor
processor = RasterProcessor(
    input_path="wetlands.tif",
    output_cog_path="s3://bucket/wetlands-cog.tif",
    output_parquet_path="s3://bucket/wetlands/hex/",
    h3_resolution=None,  # Auto-detect
    parent_resolutions=[8, 0],
)
# Create COG
processor.create_cog()
# Convert to H3-indexed parquet
processor.process_all_h0_regions()
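The example above leaves `h3_resolution=None` so the resolution is auto-detected. How `RasterProcessor` does this is not described here, so the following is only a sketch of one plausible rule: pick the finest H3 resolution whose average cell is still at least as large as one raster pixel. It relies on two known properties of H3: resolutions run from 0 to 15, and resolution `r` has `2 + 120 * 7**r` cells. The function names and the selection rule itself are assumptions for illustration.

```python
import math

# Surface area of a sphere with Earth's mean radius, in square metres.
EARTH_AREA_M2 = 4 * math.pi * 6_371_007.2 ** 2


def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells at a given resolution (0 to 15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    return 2 + 120 * 7 ** resolution


def average_cell_area_m2(resolution: int) -> float:
    """Average cell area: Earth's surface divided evenly among all cells."""
    return EARTH_AREA_M2 / h3_cell_count(resolution)


def suggest_resolution(pixel_size_m: float) -> int:
    """Finest resolution whose average cell is at least one pixel in area.

    Hypothetical rule; not necessarily what the toolkit implements.
    """
    pixel_area = pixel_size_m ** 2
    for resolution in range(15, -1, -1):  # finest to coarsest
        if average_cell_area_m2(resolution) >= pixel_area:
            return resolution
    return 0


print(suggest_resolution(30.0))    # 30 m pixels -> 11
print(suggest_resolution(1000.0))  # 1 km pixels -> 7
```

A rule like this avoids indexing at a resolution finer than the data can support, at the cost of some spatial detail when a pixel falls between two resolutions.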
Command-Line
# Create COG + H3 parquet
cng-datasets raster \
    --input data.tif \
    --output-cog s3://bucket/data-cog.tif \
    --output-parquet s3://bucket/data/hex/ \
    --resolution 10 \
    --parent-resolutions "9,8,0"
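The `--parent-resolutions` value is a comma-separated string, and in H3 a parent cell always has a lower resolution number than its child. The sketch below shows how such a string could be parsed and checked against the base resolution. `parse_parent_resolutions` is a hypothetical helper, and whether the CLI performs the same validation is an assumption.

```python
def parse_parent_resolutions(value: str, resolution: int) -> list[int]:
    """Parse a string such as "9,8,0" into H3 parent resolutions.

    Hypothetical helper: each parent must be coarser (a smaller number)
    than the base resolution and no lower than 0.
    """
    parents = [int(part) for part in value.split(",") if part.strip()]
    for parent in parents:
        if not 0 <= parent < resolution:
            raise ValueError(
                f"parent resolution {parent} must be in [0, {resolution})"
            )
    # Order from finest to coarsest, matching the examples in this guide.
    return sorted(parents, reverse=True)


print(parse_parent_resolutions("9,8,0", resolution=10))  # [9, 8, 0]
```

Including resolution 0 among the parents is consistent with the raster example's `process_all_h0_regions()`, which suggests output is organized by resolution-0 cell.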
Kubernetes Workflow Example
Generate and run a complete Kubernetes workflow:
# Generate workflow files
cng-datasets workflow \
    --dataset my-dataset \
    --source-url https://example.com/data.gpkg \
    --bucket public-my-dataset \
    --h3-resolution 10 \
    --namespace biodiversity \
    --output-dir my-dataset/
# Apply RBAC
kubectl apply -f my-dataset/workflow-rbac.yaml
# Run workflow
kubectl apply -f my-dataset/workflow.yaml
# Monitor
kubectl logs -f job/my-dataset-workflow -n biodiversity
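The apply-and-monitor steps above can also be scripted. The sketch below builds the same three `kubectl` commands as argument lists and runs them in order with `subprocess`. It assumes the job is named `<dataset>-workflow`, as in the example, and that the generated file names match those shown; `workflow_commands` and `run_workflow` are hypothetical helpers.

```python
import subprocess


def workflow_commands(
    dataset: str, namespace: str, output_dir: str
) -> list[list[str]]:
    """Build the kubectl commands for RBAC, the workflow, and log streaming.

    Assumes the job name follows the "<dataset>-workflow" pattern.
    """
    return [
        ["kubectl", "apply", "-f", f"{output_dir}/workflow-rbac.yaml"],
        ["kubectl", "apply", "-f", f"{output_dir}/workflow.yaml"],
        ["kubectl", "logs", "-f", f"job/{dataset}-workflow", "-n", namespace],
    ]


def run_workflow(dataset: str, namespace: str, output_dir: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in workflow_commands(dataset, namespace, output_dir):
        subprocess.run(command, check=True)


# Print the commands without executing them.
for command in workflow_commands("my-dataset", "biodiversity", "my-dataset"):
    print(" ".join(command))
```

Calling `run_workflow("my-dataset", "biodiversity", "my-dataset")` would execute the commands against the current `kubectl` context; `check=True` raises `CalledProcessError` if any step exits with a non-zero status.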
Next Steps
Learn more about Vector Processing
Learn more about Raster Processing
Set up Kubernetes Workflows
Configure S3 Credentials