beacon-indexer¶
The beacon-indexer is a consensus layer indexer that fetches data from a beacon node REST API and stores it in ClickHouse or Parquet files. It follows an ELT (Extract, Load, Transform) pattern with fork-aware parsing and automatic network detection.
Purpose¶
beacon-indexer handles the acquisition of all consensus layer data:
- Validator sets, balances, and status changes
- Block proposals and attestations
- Sync committee participation
- Rewards and penalties
- Deposits, withdrawals, and voluntary exits
Architecture¶
The indexer operates in two distinct phases:
Phase 1: Load (Extract + Load)
Beacon Node REST API --> Raw JSON --> ClickHouse raw_* tables
Phase 2: Transform
raw_* tables --> Fork-aware parsing --> Structured tables
This ELT separation means raw data is preserved exactly as received from the beacon API. Transformation logic can be updated and re-run independently without re-fetching data from the beacon node.
Storage Backends¶
beacon-indexer supports two storage backends, selected via the STORAGE_BACKEND environment variable:
| Backend | Best For | Setup Required |
|---|---|---|
| ClickHouse | Production systems, real-time analytics, complex queries | Database migration |
| Parquet | Data analysis, ETL pipelines, archival storage, development | None |
The ClickHouse backend provides fork-aware chunk-based processing with concurrent access and monitoring. The Parquet backend produces portable columnar files compatible with pandas, DuckDB, and Spark.
Fork Awareness¶
The beacon chain has undergone multiple protocol upgrades (forks), each introducing new data fields and structures. beacon-indexer automatically detects the fork version for each slot and applies the appropriate parser.
| Fork | Key Changes for Indexing |
|---|---|
| Phase 0 | Base beacon chain: blocks, attestations, deposits, voluntary exits |
| Altair | Sync committee data, participation metrics |
| Bellatrix | Execution payloads (The Merge) |
| Capella | Validator withdrawals, BLS-to-execution changes |
| Deneb | Blob KZG commitments, blob sidecar data |
| Electra | Execution layer requests (deposit, withdrawal, consolidation) |
Supported Networks¶
| Network | Slot Time | Slots per Epoch | Detection |
|---|---|---|---|
| Mainnet | 12 seconds | 32 | Automatic |
| Gnosis Chain | 5 seconds | 16 | Automatic |
| Holesky | 12 seconds | 32 | Automatic |
| Sepolia | 12 seconds | 32 | Automatic |
Network parameters are auto-detected from the beacon chain genesis data. No manual configuration is required.
Operations¶
Migration¶
Creates the database schema (ClickHouse backend only):
Backfill¶
Loads historical raw data from the beacon API for a specified slot range:
Transform¶
Parses raw data into structured tables with fork-appropriate field extraction:
Real-time¶
Continuously follows the chain head, loading new slots as they are produced:
Maintenance¶
The maintenance system provides tools for data integrity checking and recovery:
# Check system health
make maintenance-status
# Find and fix failed chunks
make maintenance-fix
# Fix recent failures (last week)
make maintenance-fix-recent
# Analyze data gaps
make maintenance-gaps
Maintenance supports dry-run mode (--dry-run) to preview what would be changed without modifying data.
Configuration¶
Required Settings¶
| Variable | Description |
|---|---|
BEACON_NODE_URL | Beacon node REST API endpoint |
STORAGE_BACKEND | clickhouse or parquet |
ClickHouse Settings (when using ClickHouse backend)¶
| Variable | Default | Description |
|---|---|---|
CLICKHOUSE_HOST | -- | ClickHouse server hostname |
CLICKHOUSE_PASSWORD | -- | Authentication password |
CLICKHOUSE_PORT | 8443 | HTTP port |
CLICKHOUSE_USER | default | Username |
CLICKHOUSE_DATABASE | consensus | Target database |
CLICKHOUSE_SECURE | true | Use HTTPS |
Parquet Settings (when using Parquet backend)¶
| Variable | Default | Description |
|---|---|---|
PARQUET_OUTPUT_DIR | ./parquet_data | Output directory for Parquet files |
Processing Settings¶
| Variable | Description |
|---|---|
ENABLED_LOADERS | Comma-separated list of data loaders to enable (blocks, validators, rewards, etc.) |
CHUNK_SIZE | Number of slots per processing chunk |
Data Flow¶
- Load phase -- The backfill or realtime command fetches raw beacon API responses and stores them in
raw_*tables (ClickHouse) or raw Parquet files. - Transform phase -- The transform command reads raw data, detects the fork version for each slot, and writes parsed data into structured tables (blocks, validators, attestations, etc.).
- State tracking -- Chunk-based state tracking records which slot ranges have been loaded and transformed, enabling resumability.
Docker Deployment¶
# Build the image
make build
# Run the full pipeline
make migration # Create schema (ClickHouse only)
make backfill # Load historical data
make transform # Parse raw data
make realtime # Follow chain head
# Maintenance
make maintenance-status
make maintenance-fix
ClickHouse Table Schemas¶
All tables are stored in the consensus database.
Table: consensus.blocks
Engine: ReplacingMergeTree(insert_version) ORDER BY: (slot)
| Column | Type | Notes |
|---|---|---|
slot | UInt64 | Beacon chain slot number |
proposer_index | UInt64 | Validator that proposed the block |
parent_root | String | Parent block root hash |
state_root | String | State root hash |
signature | String | Block signature |
version | String | Fork version (phase0, altair, etc.) |
graffiti | String | Proposer graffiti field |
eth1_deposit_count | UInt64 | Cumulative ETH1 deposit count |
sync_aggregate_participation | UInt64 | Sync committee participation bits |
withdrawals_count | UInt32 | Number of withdrawals in block |
blob_kzg_commitments_count | UInt32 | Number of blob KZG commitments |
slot_timestamp | DateTime64 | Materialized from slot and genesis time |
insert_version | UInt64 | Materialized; deduplication version |
Table: consensus.attestations
Engine: ReplacingMergeTree(insert_version) ORDER BY: (slot, attestation_index, committee_index)
| Column | Type | Notes |
|---|---|---|
slot | UInt64 | Slot the attestation was included in |
attestation_index | UInt64 | Position within the block |
aggregation_bits | String | Bitfield of participating validators |
signature | String | Aggregated BLS signature |
attestation_slot | UInt64 | Slot being attested to |
committee_index | UInt64 | Committee index |
beacon_block_root | String | Attested beacon block root |
source_epoch | UInt64 | Justified checkpoint epoch |
target_epoch | UInt64 | Target checkpoint epoch |
source_root | String | Justified checkpoint root |
target_root | String | Target checkpoint root |
slot_timestamp | DateTime64 | Materialized from slot |
Table: consensus.validators
Engine: ReplacingMergeTree(insert_version) ORDER BY: (slot, validator_index)
| Column | Type | Notes |
|---|---|---|
slot | UInt64 | Snapshot slot |
validator_index | UInt32 | Unique validator index |
balance | UInt64 | Current balance in Gwei |
status | String | Validator lifecycle status |
pubkey | String | BLS public key |
effective_balance | UInt64 | Effective balance in Gwei |
slashed | UInt8 | 1 = slashed, 0 = not slashed |
activation_epoch | UInt64 | Epoch when validator activated |
exit_epoch | UInt64 | Epoch when validator exited |
withdrawable_epoch | UInt64 | Epoch when balance is withdrawable |
slot_timestamp | DateTime64 | Materialized from slot |
Table: consensus.rewards
Engine: ReplacingMergeTree(insert_version) ORDER BY: (slot, proposer_index)
| Column | Type | Notes |
|---|---|---|
slot | UInt64 | Slot number |
proposer_index | UInt64 | Proposing validator index |
total | UInt64 | Total reward amount |
attestations | UInt64 | Reward from attestation inclusion |
sync_aggregate | UInt64 | Reward from sync committee |
proposer_slashings | UInt64 | Reward from proposer slashings |
attester_slashings | UInt64 | Reward from attester slashings |
Table: consensus.raw_blocks / consensus.raw_validators / consensus.raw_rewards
Engine: ReplacingMergeTree ORDER BY: (slot, payload_hash)
| Column | Type | Notes |
|---|---|---|
slot | UInt64 | Beacon chain slot number |
payload | String | Full JSON response from beacon API |
payload_hash | String | Hash of the payload for deduplication |
retrieved_at | DateTime | Timestamp when data was fetched |
Table: consensus.specs
Engine: ReplacingMergeTree(updated_at) ORDER BY: (parameter_name)
| Column | Type | Notes |
|---|---|---|
parameter_name | String | Beacon chain spec parameter name |
parameter_value | String | Parameter value |
updated_at | DateTime64 | Last update timestamp |
Table: consensus.genesis
Engine: ReplacingMergeTree() ORDER BY: (genesis_time)
| Column | Type | Notes |
|---|---|---|
genesis_time | DateTime64 | Chain genesis timestamp |
genesis_validators_root | String | Merkle root of genesis validators |
genesis_fork_version | String | Fork version at genesis |