Contributing¶
Thank you for your interest in contributing to the CNG Datasets toolkit!
Development Setup¶
Clone the repository:
git clone https://github.com/boettiger-lab/datasets.git
cd datasets
Create a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Install in development mode:
pip install -e ".[dev]"
Install pre-commit hooks (optional):
pip install pre-commit
pre-commit install
Code Style¶
We use:
black for code formatting (line length: 100)
ruff for linting
mypy for type checking
Format, lint, and type-check your code before committing:
black cng_datasets/
ruff check cng_datasets/
mypy cng_datasets/
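As a rough illustration of what these tools expect, here is a small function written so that black, ruff, and mypy should have nothing to complain about. The function name `mean_resolution` is hypothetical and is not part of `cng_datasets`; it only shows the style (type hints, a docstring, lines well under 100 characters).

```python
from typing import Iterable


def mean_resolution(resolutions: Iterable[int]) -> float:
    """Return the arithmetic mean of a collection of H3 resolutions."""
    # Materialize the iterable once so it can be both counted and summed.
    values = list(resolutions)
    if not values:
        raise ValueError("resolutions must not be empty")
    return sum(values) / len(values)
```

The explicit parameter and return annotations are what give mypy something to check; without them, the function body would be skipped under mypy's default settings.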
Testing¶
Run tests with pytest:
# Run all tests
pytest
# Run with coverage
pytest --cov=cng_datasets
# Run specific test file
pytest tests/test_vector.py
# Run specific test
pytest tests/test_vector.py::test_h3_processor
Writing Tests¶
Place tests in the tests/ directory
Name test files test_*.py
Use descriptive test names: test_processor_handles_empty_input
Use fixtures for common setup
Mock external dependencies (S3, Kubernetes)
Example:
import pytest
from cng_datasets.vector import H3VectorProcessor


def test_processor_validates_resolution():
    # Constructing the processor with a bad resolution should fail immediately.
    with pytest.raises(ValueError):
        H3VectorProcessor(
            input_url="test.parquet",
            output_url="output/",
            h3_resolution=20,  # Invalid: H3 resolutions range from 0 to 15
        )
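The fixture and mocking guidelines can be sketched in the same way. The example below is a minimal, self-contained illustration and does not use the toolkit's real code: `upload_summary` and the `fake_s3` fixture are hypothetical names, and the mocked `put_object` call only mirrors the keyword arguments of the boto3 S3 client method of the same name.

```python
from unittest.mock import MagicMock

import pytest


def upload_summary(s3_client, bucket, key, rows):
    """Hypothetical helper: upload the row count as text and return its S3 URL."""
    if not rows:
        raise ValueError("rows must not be empty")
    s3_client.put_object(Bucket=bucket, Key=key, Body=str(len(rows)).encode())
    return f"s3://{bucket}/{key}"


@pytest.fixture
def fake_s3():
    # A MagicMock stands in for the S3 client, so no network access is needed.
    return MagicMock(name="s3_client")


def test_upload_summary_writes_one_object(fake_s3):
    url = upload_summary(fake_s3, "my-bucket", "out/summary.txt", [1, 2, 3])
    assert url == "s3://my-bucket/out/summary.txt"
    fake_s3.put_object.assert_called_once_with(
        Bucket="my-bucket", Key="out/summary.txt", Body=b"3"
    )


def test_upload_summary_rejects_empty_input(fake_s3):
    with pytest.raises(ValueError):
        upload_summary(fake_s3, "my-bucket", "out/summary.txt", [])
    # Validation should fail before anything is sent to S3.
    fake_s3.put_object.assert_not_called()
```

Because pytest builds a fresh `fake_s3` for each test, calls recorded in one test cannot leak into another.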
Documentation¶
Documentation is built with Sphinx and hosted on GitHub Pages.
Build Documentation Locally¶
cd docs/
pip install sphinx furo myst-parser
make html
Then open the built site in a browser: docs/_build/html/index.html (relative to the repository root)
Documentation Guidelines¶
Use Markdown for user guides
Use reStructuredText for API docs
Include code examples
Add docstrings to all public functions/classes
Follow Google docstring style
Example docstring:
def process_chunk(self, chunk_id: int) -> str:
    """Process a specific chunk of the dataset.

    Args:
        chunk_id: Zero-based chunk index to process

    Returns:
        Path to the output parquet file

    Raises:
        ValueError: If chunk_id is out of range

    Example:
        >>> processor = H3VectorProcessor(...)
        >>> output = processor.process_chunk(0)
    """
Pull Request Process¶
Create a feature branch:
git checkout -b feature/my-feature
Make your changes:
Write clean, documented code
Add tests for new functionality
Update documentation
Run tests and linting:
pytest
black cng_datasets/
ruff check cng_datasets/
Commit your changes:
git add .
git commit -m "Add feature: description"
Push and create PR:
git push origin feature/my-feature
Then create a Pull Request on GitHub.
PR Checklist¶
Code follows style guidelines
Tests pass
New tests added for new features
Documentation updated
CHANGELOG.md updated
Commit messages are clear
Reporting Issues¶
Use GitHub Issues to report bugs or request features.
Bug Reports¶
Include:
Description of the bug
Steps to reproduce
Expected behavior
Actual behavior
Python version and OS
Relevant logs or error messages
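One convenient way to collect the Python version and OS details is a short script using only the standard library. This is a sketch rather than a project-provided tool; the function name `environment_report` is made up for illustration.

```python
import platform
import sys


def environment_report() -> str:
    """Return a short, paste-ready summary of the local Python environment."""
    lines = [
        f"Python: {platform.python_version()} ({platform.python_implementation()})",
        f"OS: {platform.platform()}",
        f"Executable: {sys.executable}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Pasting the printed lines into the issue saves a round trip when maintainers try to reproduce the problem.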
Feature Requests¶
Include:
Clear description of the feature
Use cases
Example API (if applicable)
Code of Conduct¶
Be respectful and inclusive
Welcome newcomers
Focus on constructive feedback
Assume good intentions
Questions?¶
Open a GitHub Issue for bugs/features
Start a Discussion for questions
Check existing issues before creating new ones
License¶
By contributing, you agree that your contributions will be licensed under the MIT License.