Simulation Sandboxes¶
DuckDB + Parquet sandboxes for counterfactual SQL — UPDATE, INSERT, DELETE — without ever touching production ClickHouse.
What it is¶
ClickHouse is readonly=1 for analyst safety. That's the right default but it leaves simulator agents with nothing but prose for "what if we boosted cashback 30%?" questions. Sandboxes solve this:
- Cerebro exports a CH
SELECTto~/.cerebro/sandboxes/<id>/snapshot.parquet. - An in-memory DuckDB connection mounts the parquet as a regular table.
- The agent runs any SQL inside DuckDB — read or write — until the question is answered.
- The sandbox is destroyed (or evicted by TTL/LRU).
Production CH never sees a write. The parquet is auditable and replayable.
When to use it¶
- Counterfactuals that touch >10 rows or >2 dimensions ("what if cashback was +30% across all February?").
- Forecasting experiments with synthetic future rows.
- Multi-step UPDATE/INSERT chains that build a hypothetical state.
- MMM simulator (
mmm_simulatorpersona) and forecasting analyst flows.
For pure-formula deltas (single multiplier on a single aggregate) skip the sandbox and compute in raw CH SQL.
Step-by-step tutorial¶
1. Fork the relevant CH slice¶
create_simulation_sandbox(
sandbox_id="gpay_q2_baseline",
source_query='''
SELECT day,
sum(payment_volume_usd) AS volume,
sum(cashback_usd) AS cashback,
count(distinct user_id) AS active_users
FROM dbt.fct_execution_gpay_kpi_daily
WHERE day >= today() - 90
GROUP BY day
''',
table_name="baseline",
)
# → returns {sandbox_id, table, row_count, bytes, parquet_path}
The export is bounded by SANDBOX_MAX_BYTES_PER_EXPORT (default 2 GB).
2. Mutate the snapshot¶
query_sandbox(
sandbox_id="gpay_q2_baseline",
sql="UPDATE baseline SET cashback = cashback * 1.3",
)
# → {rows_affected: 90}
3. Re-aggregate / compare¶
query_sandbox(
sandbox_id="gpay_q2_baseline",
sql='''
SELECT
sum(cashback) AS new_cashback_total,
(sum(cashback) - sum(cashback) / 1.3) AS cashback_delta_usd,
sum(cashback) / nullif(sum(volume), 0) * 100 AS new_cashback_pct
FROM baseline
''',
)
4. Tear down¶
Worked example: forecasting with synthetic rows¶
create_simulation_sandbox(
sandbox_id="validators_zerogrowth_q3",
source_query='''
SELECT day, count(*) AS active_validators
FROM dbt.api_consensus_validators_active_daily
WHERE day >= today() - 365
GROUP BY day
''',
table_name="series",
)
query_sandbox(
sandbox_id="validators_zerogrowth_q3",
sql='''
INSERT INTO series
SELECT t.day, last_value(active_validators)
FROM (SELECT * FROM series ORDER BY day DESC LIMIT 1) AS last_row
CROSS JOIN UNNEST(generate_series(today() + 1, today() + 90, INTERVAL 1 DAY)) AS t(day)
''',
)
# now run the same forecasting SQL you'd run on CH against `series`
query_sandbox(
sandbox_id="validators_zerogrowth_q3",
sql="SELECT … FROM series",
)
destroy_sandbox("validators_zerogrowth_q3")
Tool reference¶
| Tool | Purpose |
|---|---|
create_simulation_sandbox(sandbox_id, source_query, table_name="data", database="dbt") | Fork CH data into an in-memory DuckDB sandbox with a parquet snapshot on disk. |
query_sandbox(sandbox_id, sql, max_rows=200) | Run any SQL (read or write) against the sandbox. |
list_sandboxes() | Diagnostic listing. |
destroy_sandbox(sandbox_id) | Tear down. Idempotent. |
Operational characteristics¶
| Property | Value |
|---|---|
| Sandbox state | local disk (parquet) + in-memory DuckDB |
| Production CH writes | zero — readonly=1 enforced at CH client level |
| Per-sandbox disk cap | SANDBOX_MAX_BYTES_PER_EXPORT (default 2 GB) |
| Concurrency | RLock per manager (different sandboxes parallel; same one serialized) |
| Eviction | LRU at SANDBOX_MAX_CONCURRENT (default 4) |
| Idle TTL | SANDBOX_TTL_SECONDS (default 30 min) |
| Crash safety | atexit teardown; parquet survives, DuckDB connection does not |
Type sanitizer¶
ClickHouse → Arrow → DuckDB is not 1:1. Some CH types crash pyarrow or read back as opaque BLOBs. Cerebro wraps every export in a sanitising outer SELECT:
| ClickHouse type | Cast applied | Reason |
|---|---|---|
Enum8 / Enum16 | CAST(col AS String) | Default arrow surface is int — loses labels. |
UUID | toString(col) | fixed_size_binary[16] reads as BLOB. |
IPv4 / IPv6 | toString(col) | Same BLOB problem. |
DateTime64(N) for N>6 | toDateTime64(col, 6) | Arrow ns precision unreliable in some builds. |
Decimal(P, S) for P>38 | CAST(col AS Float64) | Arrow decimal128 caps at 38. |
Date / Date32 | toDate32(col) | Avoids USMALLINT mishap in DuckDB. |
Array(Tuple(...)) deeply nested | toString(col) | DuckDB chokes. |
Nullable(T) / LowCardinality(T) | wrappers stripped, then re-evaluate T | Casts still produce nullable output. |
You don't have to think about this — it's automatic. But if a sandbox query fails with Binder Error: No function matches…, check whether the CH source column is one of the above types.
Best practices¶
- Source query should already aggregate/filter. Don't fork a million-row raw table — fork the aggregate you'll mutate.
- Use real names for sandboxes.
gpay_q2_baselinebeatss1. The id appears in resume hints and audit logs. - Always
destroy_sandboxwhen done. TTL + LRU will clean up but explicit teardown is hygienic. - Test the same SQL on CH first. A query that fails against the live data won't be fixed by sandboxing.
- Reuse table names across sandboxes. Each sandbox is isolated —
data,baseline,seriesare fine to repeat.
Pitfalls¶
- Trying to write back to ClickHouse. Production CH stays
readonly=1. The sandbox is the only mutable surface. - Expecting persistence across server restart. DuckDB is in-memory. The parquet survives but auto-remount is not implemented — re-create the sandbox on the next session.
- Multi-table joins inside a sandbox. Each
create_simulation_sandboxproduces one parquet → one DuckDB table. Multi-table mounts are not yet exposed; combine them into the source query if needed. - Hitting the 2 GB cap. Bump
SANDBOX_MAX_BYTES_PER_EXPORTif your source query needs more, or aggregate first.
See also¶
- Memory & Resume — sandbox lifecycle hooks
- Phase 2 design doc (cerebro-mcp repo) — the original sprint write-up
- MMM — the simulator persona that drives this most heavily