Deployment¶
This page documents how Gnosis Analytics services are built, deployed, and updated. All services follow the same pattern: multi-stage Docker builds, GitHub Actions CI/CD, Kubernetes deployments, and secrets injected from AWS SSM Parameter Store.
Docker Builds¶
Multi-Stage Build Pattern¶
All services use multi-stage Docker builds to minimize image size and attack surface. The pattern separates build-time dependencies from the runtime image:
# Stage 1: Build
FROM python:3.11-slim AS builder
WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends gcc
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Stage 2: Runtime
FROM python:3.11-slim
WORKDIR /code
RUN useradd -m -u 1000 appuser
COPY --from=builder /root/.local /home/appuser/.local
ENV PATH=/home/appuser/.local/bin:$PATH
COPY ./app /code/app
RUN chown -R appuser:appuser /code
USER appuser
EXPOSE 8000
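# python:3.11-slim does not ship curl, so probe with the Python standard library
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/', timeout=5)" || exit 1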
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--proxy-headers"]
Key practices:
- Non-root user -- All containers run as appuser (UID 1000)
- Health checks -- A HEALTHCHECK is built into the Dockerfile for Docker-based runs; Kubernetes ignores it and relies on the readiness and liveness probes in the pod spec
- Minimal runtime -- Only runtime dependencies in the final image (no compilers, build tools)
- ARM64 target -- All images are built for linux/arm64 to run on Graviton nodes
Service-Specific Builds¶
| Service | Base Image | Language | Notes |
|---|---|---|---|
| cerebro-api | python:3.11-slim | Python | FastAPI + uvicorn |
| cerebro-mcp | python:3.11-slim | Python | FastMCP server |
| metrics-dashboard | node:20-slim | TypeScript | React build, served via nginx |
| cryo-indexer | cryo-base (custom) | Rust | Built on custom ARM64 Cryo base image |
| beacon-indexer | golang:1.23-alpine | Go | Compiled binary |
| nebula | golang:1.23-alpine | Go | Compiled binary |
| ip-crawler | python:3.12-slim | Python | Lightweight crawler |
| click-runner | python:3.12-slim | Python | SQL execution toolkit |
CI/CD Pipeline¶
GitHub Actions¶
The CI/CD pipeline runs on GitHub Actions. Each repository has its own workflow that triggers on pushes to main.
flowchart LR
    PUSH[Push to main] --> BUILD[Build Docker image]
    BUILD --> TEST[Run tests]
    TEST --> PUSH_IMG[Push to GHCR]
    PUSH_IMG --> DEPLOY[Deploy to EKS]
Image Publishing¶
Images are published to GitHub Container Registry (GHCR) with two tags:
- latest -- Always points to the most recent build from main
- sha-{commit} -- Git commit SHA for traceability and rollback
# Example GitHub Actions workflow step
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: .
    platforms: linux/arm64
    push: true
    tags: |
      ghcr.io/gnosischain/cerebro-api:latest
      ghcr.io/gnosischain/cerebro-api:sha-${{ github.sha }}
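The tagging scheme above can be sketched in a few lines of Python. This is only an illustration: the `image_tags` helper is hypothetical and not part of any workflow in the repositories, and it assumes the commit SHA is the full 40-character hexadecimal value that GitHub Actions exposes as `github.sha`.

```python
import re


def image_tags(repository: str, commit_sha: str) -> list[str]:
    """Return the two tags published for one build: latest and sha-{commit}."""
    # Assumption: a full, lowercase, 40-character Git SHA-1 (as in github.sha).
    if not re.fullmatch(r"[0-9a-f]{40}", commit_sha):
        raise ValueError("expected a full 40-character Git commit SHA")
    return [f"{repository}:latest", f"{repository}:sha-{commit_sha}"]


# Example with a placeholder SHA; a real build would pass the actual commit.
tags = image_tags("ghcr.io/gnosischain/cerebro-api", "a" * 40)
```

Because the `sha-` tag is immutable for a given commit, pinning a Deployment to it (instead of `latest`) is one way to roll back to a known build.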
Kubernetes Deployment¶
Deployments¶
Long-running services (API, dashboard, crawlers) are deployed as Kubernetes Deployments:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cerebro-api
  namespace: cerebro
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: cerebro-api
  template:
    metadata:
      labels:
        app: cerebro-api
    spec:
      containers:
        - name: cerebro-api
          image: ghcr.io/gnosischain/cerebro-api:latest
          ports:
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: cerebro-api-secrets
          env:
            - name: DBT_MANIFEST_URL
              value: "https://gnosischain.github.io/dbt-cerebro/manifest.json"
            - name: DBT_MANIFEST_REFRESH_ENABLED
              value: "true"
            - name: DBT_MANIFEST_REFRESH_INTERVAL_SECONDS
              value: "300"
          readinessProbe:
            httpGet:
              path: /
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 15
          livenessProbe:
            httpGet:
              path: /
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 30
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
      nodeSelector:
        kubernetes.io/arch: arm64
      imagePullSecrets:
        - name: ghcr-pull-secret
Services¶
Each deployment is fronted by a Kubernetes Service:
apiVersion: v1
kind: Service
metadata:
  name: cerebro-api
  namespace: cerebro
spec:
  selector:
    app: cerebro-api
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
  type: ClusterIP
Ingress¶
External access is configured via Ingress resources with the AWS Load Balancer Controller:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cerebro-api
  namespace: cerebro
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  rules:
    - host: api.analytics.gnosis.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cerebro-api
                port:
                  number: 8000
CronJobs¶
Periodic data ingestion tasks run as Kubernetes CronJobs:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: click-runner-ember
  namespace: crawlers
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          containers:
            - name: click-runner
              image: ghcr.io/gnosischain/click-runner:latest
              command: ["python", "run_queries.py"]
              args: ["--ingestor=csv", "--create-table-sql=...", "--insert-sql=..."]
              envFrom:
                - secretRef:
                    name: clickhouse-credentials
          restartPolicy: OnFailure
          nodeSelector:
            kubernetes.io/arch: arm64
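The schedule `0 2 * * *` means "every day at 02:00". As a rough sketch of what that implies for the next run time, the hypothetical helper below handles only this daily form of a cron expression. It assumes the schedule is evaluated in UTC; Kubernetes uses the time zone of the kube-controller-manager unless `spec.timeZone` is set, so the real cluster may differ.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Return the first daily trigger time strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


before = datetime(2024, 1, 1, 1, 30, tzinfo=timezone.utc)
after = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
next_from_before = next_daily_run(before)  # same day, 02:00
next_from_after = next_daily_run(after)    # following day, 02:00
```

With `concurrencyPolicy: Forbid`, a run that is still in progress at the next trigger time causes that trigger to be skipped rather than started in parallel.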
Rolling Updates Strategy¶
The API uses a rolling update strategy that ensures zero-downtime deployments:
| Parameter | Value | Effect |
|---|---|---|
| maxSurge | 1 | One additional pod is created before old pods are terminated |
| maxUnavailable | 0 | No existing pods are terminated until the new pod is ready |
| Readiness probe | HTTP GET / | New pod must pass health check before receiving traffic |
This means during a deployment:
- A new pod is started with the updated image
- Kubernetes waits for the readiness probe to pass
- The new pod starts receiving traffic
- The old pod is terminated
- This repeats for each replica
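The sequence above can be modelled with a small simulation. This is a simplified sketch, not how the Deployment controller is implemented: it assumes every new pod passes its readiness probe immediately, and the function name `simulate_rolling_update` is made up for illustration.

```python
def simulate_rolling_update(
    replicas: int, max_surge: int = 1, max_unavailable: int = 0
) -> list[tuple[int, int]]:
    """Return (old_pods, new_pods) after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Surge: total pods may not exceed replicas + max_surge.
        can_start = max(0, min(replicas - new, replicas + max_surge - (old + new)))
        new += can_start  # assumption: new pods become ready right away
        history.append((old, new))
        # Scale down: ready pods may not drop below replicas - max_unavailable.
        can_stop = max(0, min(old, (old + new) - (replicas - max_unavailable)))
        old -= can_stop
        history.append((old, new))
    return history


steps = simulate_rolling_update(replicas=2, max_surge=1, max_unavailable=0)
# steps == [(2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

In every step the total number of pods stays between 2 and 3, which is the zero-downtime property the table describes: capacity never falls below the desired replica count.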
Secrets Management¶
Secrets follow a chain from AWS SSM Parameter Store through the External Secrets Operator into Kubernetes Secrets, which are injected into pods as environment variables.
flowchart LR
    SSM[AWS SSM\nParameter Store] --> ESO[External Secrets\nOperator]
    ESO --> K8S[Kubernetes\nSecret]
    K8S --> POD[Pod\nEnv Vars]
AWS SSM Parameter Store¶
All secrets are stored as SecureString parameters in AWS Systems Manager Parameter Store:
| Parameter Path | Description |
|---|---|
| /cerebro/clickhouse/host | ClickHouse Cloud hostname |
| /cerebro/clickhouse/user | ClickHouse username |
| /cerebro/clickhouse/password | ClickHouse password |
| /cerebro/api/keys | API key registry (JSON) |
| /cerebro/ipinfo/token | ipinfo.io API token |
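For debugging outside the cluster, the same parameters can be read directly. The sketch below assumes a client object with the boto3 SSM `get_parameter(Name=..., WithDecryption=...)` call; to stay runnable without AWS credentials it uses a hypothetical in-memory `StubSSMClient`, which in real use would be replaced by `boto3.client("ssm")`. The key-to-path mapping mirrors the ExternalSecret mapping on this page.

```python
from typing import Any, Mapping

# Secret keys and parameter paths as mapped in the ExternalSecret on this page.
PARAMETER_PATHS: Mapping[str, str] = {
    "CLICKHOUSE_URL": "/cerebro/clickhouse/host",
    "CLICKHOUSE_USER": "/cerebro/clickhouse/user",
    "CLICKHOUSE_PASSWORD": "/cerebro/clickhouse/password",
}


def read_parameters(ssm_client: Any, paths: Mapping[str, str]) -> dict[str, str]:
    """Fetch each SecureString and return {secret key: decrypted value}."""
    values: dict[str, str] = {}
    for key, path in paths.items():
        # WithDecryption=True is required to get the plaintext of a SecureString.
        response = ssm_client.get_parameter(Name=path, WithDecryption=True)
        values[key] = response["Parameter"]["Value"]
    return values


class StubSSMClient:
    """Hypothetical stand-in for boto3.client("ssm"), for local runs only."""

    def __init__(self, store: Mapping[str, str]) -> None:
        self._store = store

    def get_parameter(self, Name: str, WithDecryption: bool = False) -> dict:
        return {"Parameter": {"Name": Name, "Value": self._store[Name]}}


stub = StubSSMClient({path: f"value-for-{path}" for path in PARAMETER_PATHS.values()})
secrets = read_parameters(stub, PARAMETER_PATHS)
```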
External Secrets Operator¶
The External Secrets Operator (ESO) runs in the cluster and synchronizes SSM parameters into Kubernetes Secrets:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: cerebro-api-secrets
  namespace: cerebro
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-ssm
    kind: ClusterSecretStore
  target:
    name: cerebro-api-secrets
    creationPolicy: Owner
  data:
    - secretKey: CLICKHOUSE_URL
      remoteRef:
        key: /cerebro/clickhouse/host
    - secretKey: CLICKHOUSE_USER
      remoteRef:
        key: /cerebro/clickhouse/user
    - secretKey: CLICKHOUSE_PASSWORD
      remoteRef:
        key: /cerebro/clickhouse/password
Environment Variable Injection¶
Secrets are injected into pods via envFrom on the container spec, using a secretRef to the Kubernetes Secret (cerebro-api-secrets in the Deployment manifest above).
This makes all keys in the Kubernetes Secret available as environment variables in the container (e.g., CLICKHOUSE_URL, CLICKHOUSE_PASSWORD).
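On the application side, the injected variables can be read once at startup. The following is a minimal sketch, not the actual cerebro-api code: the `ClickHouseSettings` class and `load_clickhouse_settings` function are hypothetical names, and failing fast on missing variables is a design assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("CLICKHOUSE_URL", "CLICKHOUSE_USER", "CLICKHOUSE_PASSWORD")


@dataclass(frozen=True)
class ClickHouseSettings:
    url: str
    user: str
    password: str


def load_clickhouse_settings(env: Mapping[str, str] = os.environ) -> ClickHouseSettings:
    """Build settings from the environment, raising if any variable is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail at startup so a misconfigured pod never becomes ready.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return ClickHouseSettings(
        url=env["CLICKHOUSE_URL"],
        user=env["CLICKHOUSE_USER"],
        password=env["CLICKHOUSE_PASSWORD"],
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_clickhouse_settings(
    {"CLICKHOUSE_URL": "example-host", "CLICKHOUSE_USER": "reader", "CLICKHOUSE_PASSWORD": "secret"}
)
```

Because the values are read when the process starts, a changed Secret only takes effect in containers started after the change.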
Secret rotation
When secrets are updated in SSM Parameter Store, ESO synchronizes them to Kubernetes Secrets on the configured refreshInterval (default: 1 hour). Pods must be restarted to pick up updated secrets since environment variables are set at pod creation time.
Manifest Auto-Refresh¶
The cerebro-api has a built-in mechanism to auto-discover new dbt models without redeployment:
- The API periodically polls the DBT_MANIFEST_URL (default: every 5 minutes)
- It uses HTTP conditional requests (ETag with If-None-Match, Last-Modified with If-Modified-Since) to avoid unnecessary downloads
- When the manifest changes, it rebuilds the FastAPI route table with new/updated/removed endpoints
- No restart or redeployment is required
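The conditional-request part of this mechanism can be sketched as follows. This is an illustration under stated assumptions, not the cerebro-api implementation: `ManifestCache` and the injected `fetch` callable are hypothetical, and `fake_fetch` stands in for a real HTTP client so the example runs offline. Header names are assumed to be normalized to the casing shown.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

# fetch(url, request_headers) -> (status_code, response_headers, body)
Fetch = Callable[[str, Mapping[str, str]], Tuple[int, Mapping[str, str], bytes]]


@dataclass
class ManifestCache:
    url: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: Optional[bytes] = None

    def refresh(self, fetch: Fetch) -> bool:
        """Return True if a new manifest was downloaded, False on 304."""
        headers = {}
        if self.etag:
            headers["If-None-Match"] = self.etag
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        status, response_headers, body = fetch(self.url, headers)
        if status == 304:
            return False  # unchanged: keep the cached manifest and routes
        if status != 200:
            raise RuntimeError(f"unexpected status {status}")
        # Remember the validators for the next poll.
        self.etag = response_headers.get("ETag")
        self.last_modified = response_headers.get("Last-Modified")
        self.body = body
        return True


def fake_fetch(url: str, headers: Mapping[str, str]):
    """Offline stand-in for an HTTP client; serves one fixed manifest version."""
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"'}, b'{"nodes": {}}'


cache = ManifestCache("https://gnosischain.github.io/dbt-cerebro/manifest.json")
first = cache.refresh(fake_fetch)   # full download
second = cache.refresh(fake_fetch)  # server answers 304 Not Modified
```

Only when `refresh` returns True would the route table need to be rebuilt; a 304 response costs a single small request.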
| Environment Variable | Default | Description |
|---|---|---|
| DBT_MANIFEST_URL | https://gnosischain.github.io/dbt-cerebro/manifest.json | Remote manifest URL |
| DBT_MANIFEST_REFRESH_ENABLED | true | Enable periodic refresh |
| DBT_MANIFEST_REFRESH_INTERVAL_SECONDS | 300 | Refresh interval (5 minutes) |
Internal users with tier3 access can force an immediate refresh:
curl -X POST "https://api.analytics.gnosis.io/v1/system/manifest/refresh" \
  -H "X-API-Key: sk_live_internal_key"
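The same call can be made from Python with only the standard library. This is a sketch: the helper names are hypothetical, the response format of the endpoint is not documented here, and the API key should come from a secure source rather than being hard-coded.

```python
import urllib.request

REFRESH_URL = "https://api.analytics.gnosis.io/v1/system/manifest/refresh"


def build_refresh_request(api_key: str, url: str = REFRESH_URL) -> urllib.request.Request:
    """Build the POST request that asks the API to reload the manifest."""
    return urllib.request.Request(url, method="POST", headers={"X-API-Key": api_key})


def force_refresh(api_key: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (raises on HTTP errors)."""
    with urllib.request.urlopen(build_refresh_request(api_key), timeout=timeout) as response:
        return response.status


# Building the request does not touch the network; force_refresh() does.
request = build_refresh_request("sk_live_internal_key")
```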
Deployment Checklist¶
When deploying a new service or updating an existing one:
- Docker image builds successfully for linux/arm64
- Image is pushed to GHCR with latest and SHA tags
- Kubernetes manifests are updated (Deployment, Service, Ingress, CronJob as applicable)
- Secrets are provisioned in SSM Parameter Store
- ExternalSecret resource is created/updated
- Resource requests and limits are set appropriately
- Health checks (readiness and liveness probes) are configured
- Node selector includes kubernetes.io/arch: arm64
- Rolling update strategy is configured for zero-downtime deployment
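Some of these items can be checked mechanically before applying a manifest. The sketch below is a hypothetical helper, not an existing tool in the repositories: it takes a Deployment manifest already parsed into a dictionary and covers only the probe, resource, node selector, and strategy items.

```python
from typing import Any


def check_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return the checklist items that the Deployment manifest does not satisfy."""
    problems: list[str] = []
    spec = manifest.get("spec", {})
    pod = spec.get("template", {}).get("spec", {})
    # Kubernetes defaults to RollingUpdate; this check asks for it to be explicit.
    if spec.get("strategy", {}).get("type") != "RollingUpdate":
        problems.append("strategy is not explicitly RollingUpdate")
    if pod.get("nodeSelector", {}).get("kubernetes.io/arch") != "arm64":
        problems.append("nodeSelector does not include kubernetes.io/arch: arm64")
    containers = pod.get("containers", [])
    if not containers:
        problems.append("no containers defined")
    for container in containers:
        name = container.get("name", "<unnamed>")
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                problems.append(f"{name}: missing {probe}")
        for section in ("requests", "limits"):
            if not container.get("resources", {}).get(section):
                problems.append(f"{name}: missing resource {section}")
    return problems


# Example: a manifest that lacks a liveness probe and resource limits.
example = {
    "spec": {
        "strategy": {"type": "RollingUpdate"},
        "template": {
            "spec": {
                "nodeSelector": {"kubernetes.io/arch": "arm64"},
                "containers": [
                    {
                        "name": "cerebro-api",
                        "readinessProbe": {"httpGet": {"path": "/", "port": 8000}},
                        "resources": {"requests": {"cpu": "250m", "memory": "512Mi"}},
                    }
                ],
            }
        },
    }
}
problems = check_deployment(example)
```

An empty list means the covered items pass; the remaining checklist items (image tags, SSM parameters, ExternalSecret) still need to be verified separately.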
Next Steps¶
- Infrastructure -- Underlying AWS infrastructure
- Monitoring -- Observability and alerting
- Troubleshooting -- Common deployment issues