dbt-cerebro¶
dbt-cerebro is the core data transformation project for the Gnosis Analytics platform. It is a dbt project containing approximately 400 SQL models that transform raw blockchain data from ClickHouse into analytics-ready datasets.
Overview¶
dbt-cerebro transforms Gnosis Chain data across four domains:
- Execution Layer -- Transaction analysis, smart contract interactions, DeFi protocols, gas usage
- Consensus Layer -- Validator activity, block proposals, attestations, sync committee participation
- P2P Network -- Peer-to-peer interactions, client distributions, network topology
- ESG / Sustainability -- Environmental metrics including power consumption and carbon emissions
All data is unified in ClickHouse Cloud for high-performance analytics.
Architecture¶
Data Sources dbt Transformation Pipeline Output
───────────── ────────────────────────── ──────
execution.blocks --> stg_execution__blocks \
execution.transactions --> stg_execution__transactions |
execution.logs --> stg_execution__logs |
consensus.blocks --> stg_consensus__blocks |--> int_* models
nebula.visits --> stg_p2p__visits | (incremental)
crawlers_data.* --> stg_crawlers__* /
|
v
int_execution_blocks_clients_daily \
int_consensus_validators_active_daily |
int_p2p_peers_geo_daily |--> fct_* models
int_bridges_transfers_daily / (views)
|
v
fct_execution_gas_daily \
fct_consensus_attestation_rates |--> api_* models
fct_p2p_client_diversity / (views)
|
v
api_execution_transactions_daily --> REST API
api_consensus_validators_active_daily --> REST API
api_p2p_client_distribution_daily --> REST API
Model Organization¶
Models are organized by domain module, each with its own staging, intermediate, and marts subdirectories:
models/
├── execution/ # Execution layer (blocks, txs, contracts)
│ ├── staging/ # stg_execution__*
│ ├── intermediate/ # int_execution_*
│ └── marts/ # fct_execution_*, api_execution_*
├── consensus/ # Consensus layer (validators, attestations)
│ ├── staging/
│ ├── intermediate/
│ └── marts/
├── contracts/ # Decoded smart contract data
│ └── {protocol}/ # One subdirectory per protocol
├── p2p/ # P2P network analysis
├── ESG/ # Environmental sustainability
├── bridges/ # Cross-chain bridge activity
├── probelab/ # ProbeLab network metrics
└── crawlers/ # Crawler data transformations
Incremental Processing¶
Most intermediate models are incrementally materialized. The same model SQL supports four distinct invocation modes — daily catch-up, microbatch slicing, full-refresh batching, and refill recovery — selected by which dbt vars are set when the model runs.
A typical incremental config:
{{
config(
materialized='incremental',
incremental_strategy=(
'append'
if (var('start_month', none) or var('incremental_end_date', none))
else 'delete+insert'
),
engine='ReplacingMergeTree()',
order_by='(block_timestamp, transaction_hash)',
unique_key='(block_timestamp, transaction_hash)',
partition_by='toStartOfMonth(block_timestamp)'
)
}}
SELECT ...
FROM {{ source('execution', 'transactions') }}
{% if var('start_month', none) and var('end_month', none) %}
WHERE toStartOfMonth(block_timestamp) >= toDate('{{ var("start_month") }}')
AND toStartOfMonth(block_timestamp) <= toDate('{{ var("end_month") }}')
{% else %}
{{ apply_monthly_incremental_filter('block_timestamp', 'date', false) }}
{% endif %}
Key building blocks:
delete+insertfor daily runs (no vars set) — delete the affected window, insert fresh rows. Cheap on small windows.appendwhen eitherstart_month(full-refresh / refill) orincremental_end_date(microbatch) is set — no mutation. RMT collapses duplicates on background merges or via on-demandOPTIMIZE.ReplacingMergeTreehandles deduplication onunique_keyso re-runs are idempotent.- Monthly partitioning on
toStartOfMonth(...)enables efficient partition-level deletes, queries, andOPTIMIZEoperations. apply_monthly_incremental_filterproduces the WHERE clause for the daily and microbatch paths; thestart_monthbranch is written inline because it filters source rows.
For the full picture see:
- Incremental Strategies — the four invocation modes, the macro routing logic, and the
refill_appendtag for heavy aggregates - Running Models — the four production runners with end-to-end examples
- Recovering from a Prices Gap — incident response when a prices source skips a day
Contract ABI Decoding¶
dbt-cerebro includes a system for decoding raw blockchain transaction data into human-readable function calls and events. See ABI Decoding for the full workflow.
Macros¶
The project includes custom macros in the macros/ directory:
| Category | Key Macros | Purpose |
|---|---|---|
db/ | apply_monthly_incremental_filter | Monthly incremental processing — see Incremental Strategies |
db/ | refill_safe_pre_hook / refill_safe_post_hook | Memory contract for tag:refill_append models — caps memory at 8 GiB and enables disk spill |
db/ | optimize_partition_final | OPTIMIZE TABLE … PARTITION '<m>' FINAL DEDUPLICATE — used by the refill script |
db/ | kill_failed_mutations | Cleanup poisoned mutations from killed delete+insert runs |
decoding/ | decode_calls, decode_logs | Contract ABI decoding |
execution/ | Various helpers | Execution layer utilities |
Seeds¶
Static reference data is loaded via dbt seeds:
| Seed File | Description |
|---|---|
contracts_abi.csv | Contract ABIs fetched from Blockscout |
event_signatures.csv | Event topic signatures generated from ABIs |
function_signatures.csv | Function selector signatures generated from ABIs |
Configuration¶
Environment Variables¶
| Variable | Description |
|---|---|
CLICKHOUSE_URL | ClickHouse Cloud hostname |
CLICKHOUSE_PORT | HTTP port (typically 8443) |
CLICKHOUSE_USER | Username (typically default) |
CLICKHOUSE_PASSWORD | Authentication password |
CLICKHOUSE_SECURE | Use HTTPS (True) |
CLICKHOUSE_DATABASE | Target database for dbt models |
ClickHouse Requirements¶
- ClickHouse version 24.1 or later
- Permissions to create/drop tables, read source schemas, and write to target schemas
Docker Deployment¶
dbt-cerebro ships as a Docker container with all dependencies pre-installed:
# Start the container (includes documentation server on port 8080)
docker-compose up -d
# Enter the container
docker exec -it dbt /bin/bash
# Test the connection
dbt debug
# Run all models
dbt run
# Run specific module
dbt run --select execution
# Run specific model with upstream dependencies
dbt build --select +api_execution_transactions_daily
# Full refresh of an incremental model
dbt run --select int_execution_blocks_clients_version_daily --full-refresh
# Generate and serve documentation
dbt docs generate
# Documentation is automatically served on port 8080
Production Runs¶
# Run all models and tests
docker exec dbt bash -c "dbt run && dbt test"
# Or use the production cron script
docker exec dbt /app/cron.sh
For the full set of runners (daily cron, microbatch catch-up, full-refresh batched, refill recovery) see Running Models.
Development Workflow¶
- Choose the appropriate layer based on the type of transformation needed
- Follow naming conventions:
stg_{source}__{table},int_{domain}_{metric}_{grain},fct_{domain}_{metric}_{grain},api_{domain}_{metric}_{grain} - Configure materialization appropriately (views for staging/facts/API, incremental for intermediate)
- Add tests using dbt's built-in testing framework
- Generate documentation with
dbt docs generate
See Model Layers for detailed guidance on each layer.