GitHub Actions Runners#
Deploy self-hosted GitHub Actions runners on Kubernetes using Actions Runner Controller (ARC).
Overview#
The Actions Runner Controller (ARC) enables autoscaling of self-hosted GitHub Actions runners on Kubernetes. This allows you to run GitHub Actions workflows on your own infrastructure with access to:
- GPUs for ML/AI workloads
- Large memory instances
- Custom network access
- Local data and resources
Features#
- Autoscaling: Automatically scale runners based on workflow demand
- Resource Management: Control CPU, memory, and GPU allocation
- Custom Environments: Use custom container images with pre-installed tools
- Cost Effective: Utilize existing infrastructure instead of cloud runners
Prerequisites#
- K3s installed
- NVIDIA GPU support (optional, for GPU workflows)
- GitHub repository or organization with admin access
- GitHub Personal Access Token (PAT) or GitHub App
Installation#
Create GitHub PAT#
Create a Personal Access Token with the following scopes:
- repo (for repository-level runners)
- admin:org (for organization-level runners)
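ARC reads the token from a Kubernetes secret in the controller's namespace. A minimal sketch, assuming PAT authentication and ARC's default secret name (controller-manager):
kubectl create namespace actions-runner-system
kubectl create secret generic controller-manager \
  -n actions-runner-system \
  --from-literal=github_token=<YOUR_PAT>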
Install Actions Runner Controller#
Follow the official documentation for setup.
Basic installation:
# Install cert-manager (if not already installed)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Install ARC
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
helm install arc \
  --namespace actions-runner-system \
  --create-namespace \
  actions-runner-controller/actions-runner-controller

Configure Runner#
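Before creating a runner, confirm the controller deployment came up cleanly:
# The controller pod should reach Running status
kubectl get pods -n actions-runner-system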
Create a RunnerDeployment for your repository:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-runner
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: owner/repo
      labels:
        - self-hosted
        - linux
        - x64
      resources:
        limits:
          cpu: "4"
          memory: "8Gi"
        requests:
          cpu: "2"
          memory: "4Gi"

Configuration Examples#
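Each RunnerDeployment in this guide is deployed the same way: save the manifest to a file, apply it, and confirm the runner registers (the filename here is illustrative):
kubectl apply -f runner-deployment.yaml
kubectl get runners -n actions-runner-system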
Basic CPU Runner#
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: cpu-runner
  namespace: actions-runner-system
spec:
  replicas: 2
  template:
    spec:
      repository: owner/repo
      labels:
        - self-hosted
        - linux
        - cpu-only
      resources:
        limits:
          cpu: "4"
          memory: "8Gi"

GPU-Enabled Runner (EFI Configuration)#
See cirrus-efi-values.yaml:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: efi-cirrus
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: owner/repo
      labels:
        - efi-cirrus
        - gpu
      resources:
        limits:
          nvidia.com/gpu: "1"
          memory: "45Gi"
          cpu: "8"
        requests:
          nvidia.com/gpu: "1"
          memory: "32Gi"
          cpu: "4"

Usage in Workflows#
Basic Usage#
In your .github/workflows/main.yml:
name: CI
on: [push]
jobs:
  build:
    runs-on: self-hosted  # Use your self-hosted runner
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: make test

With Custom Labels#
jobs:
  gpu-job:
    runs-on: efi-cirrus  # Use specific runner with GPU
    steps:
      - uses: actions/checkout@v3
      - name: Run GPU workload
        run: python train.py

With Container and Resource Limits#
Important: ARC does not automatically enforce resource limits when a workflow supplies its own container. Because any job can opt into a container, set the limits in the workflow itself:
jobs:
  forecasting:
    runs-on: efi-cirrus
    container:
      image: eco4cast/rocker-neon4cast:latest
      options: --memory="15g"  # IMPORTANT: Set memory limit
    steps:
      - uses: actions/checkout@v3
      - name: Run forecast
        run: Rscript forecast.R

Note: Always set a memory limit less than or equal to the node's capacity (e.g., ≤ 45GB for the EFI configuration).
Configuration Files#
Repository includes example configurations:
- cirrus-efi-values.yaml - EFI forecasting workstation (GPU, 45GB RAM)
- cirrus-espm157-values.yaml - ESPM 157 class workloads
- cirrus-espm288-values.yaml - ESPM 288 class workloads
Deploy with:
bash github-actions/helm.sh

Resource Management#
Setting Resource Limits#
Individual workflows should specify resource limits:
runs-on: efi-cirrus
container:
  image: your-image:latest
  options: --memory="15g" --cpus="4"

Monitoring Resource Usage#
# View runner pods
kubectl get pods -n actions-runner-system
# Check resource usage
kubectl top pods -n actions-runner-system
# View pod details
kubectl describe pod <runner-pod> -n actions-runner-system

Troubleshooting#
Runner Not Picking Up Jobs#
- Check runner registration:
  kubectl logs -n actions-runner-system <runner-pod>
- Verify labels match: ensure the workflow's runs-on matches the runner's labels
- Check GitHub connection:
  - Verify the PAT is valid
  - Check that the runner appears in GitHub settings
Insufficient Resources#
If jobs fail due to insufficient resources:
- Check available resources:
  kubectl describe nodes
- Adjust workflow limits: reduce memory/CPU in the container options
- Scale runners: increase runner replicas if multiple jobs queue
GPU Not Available#
- Verify GPU allocation:
  kubectl describe node | grep nvidia.com/gpu
- Check the device plugin:
  kubectl get pods -n kube-system | grep nvidia
- Test GPU access in the runner:
  kubectl exec -it <runner-pod> -n actions-runner-system -- nvidia-smi

Container Pull Errors#
- Check image exists: Verify image name and tag
- Verify registry access: Add image pull secrets if using private registry
- Check logs:
kubectl describe pod <runner-pod> -n actions-runner-system

Best Practices#
- Resource Limits: Always set memory limits in workflow containers
- Label Strategy: Use descriptive labels to target specific runners
- Monitoring: Regularly check runner logs and resource usage
- Security:
- Use least-privilege PATs
- Isolate runners by namespace
- Review workflow code before running
- Cleanup: Configure job cleanup to remove completed workflows (see the ephemeral-runner sketch after this list)
- Scaling: Set appropriate min/max replicas based on demand
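For cleanup, ephemeral runners are a common approach: each runner handles a single job and is then replaced with a fresh one. A minimal sketch, assuming your ARC version supports the ephemeral field (it is the default behavior in recent releases):
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: ephemeral-runner
  namespace: actions-runner-system
spec:
  replicas: 1
  template:
    spec:
      repository: owner/repo
      ephemeral: true  # assumption: field supported by your ARC version; runner is recreated after each job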
Security Considerations#
Runner Isolation#
- Runners have access to cluster resources in their namespace
- Consider running untrusted workflows in isolated namespaces
- Use network policies to restrict runner network access
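For example, a minimal NetworkPolicy sketch that blocks all inbound traffic to runner pods while leaving egress (which runners need to reach GitHub) untouched; the policy name is illustrative:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-runner-ingress
  namespace: actions-runner-system
spec:
  podSelector: {}    # selects all pods in the runner namespace
  policyTypes:
    - Ingress        # no ingress rules defined, so all inbound traffic is denied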
Secrets Management#
- Store GitHub tokens in Kubernetes secrets
- Use GitHub encrypted secrets for workflow variables
- Avoid hardcoding sensitive data in workflows
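For example, a workflow step can consume a GitHub encrypted secret through an environment variable instead of hardcoding it (the secret name API_TOKEN is illustrative):
jobs:
  deploy:
    runs-on: self-hosted
    steps:
      - name: Use an encrypted secret
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}  # defined under repository Settings → Secrets
        run: ./deploy.sh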
Image Security#
- Use trusted base images
- Regularly update images for security patches
- Scan images for vulnerabilities
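As one option (not part of this repository's tooling), a scanner such as Trivy can check runner images before they are deployed:
# Scan a runner image for known vulnerabilities
trivy image eco4cast/rocker-neon4cast:latest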
Monitoring and Maintenance#
View Runner Status#
# List runners
kubectl get runners -n actions-runner-system
# List runner deployments
kubectl get runnerdeployments -n actions-runner-system
# Check autoscaler status
kubectl get horizontalrunnerautoscaler -n actions-runner-system

Update Runners#
# Update the runner image (RunnerDeployment is a custom resource, so patch it rather than a plain Deployment)
kubectl patch runnerdeployment <runner-name> -n actions-runner-system \
  --type merge -p '{"spec":{"template":{"spec":{"image":"<new-image>"}}}}'
# Or edit directly
kubectl edit runnerdeployment <runner-name> -n actions-runner-system

Logs#
# View runner logs
kubectl logs -n actions-runner-system <runner-pod>
# View controller logs
kubectl logs -n actions-runner-system deployment/arc-actions-runner-controller

Advanced Configuration#
Autoscaling#
Configure Horizontal Runner Autoscaler:
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: my-runner-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: my-runner
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - owner/repo

Persistent Volumes#
Add persistent storage for caching:
spec:
  template:
    spec:
      volumeMounts:
        - name: work
          mountPath: /runner/_work
      volumes:
        - name: work
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                resources:
                  requests:
                    storage: 10Gi