Gnosis Pay → Mixpanel Bridge¶
This page is the deep dive on how Cerebro joins Gnosis Pay's on-chain Safe stack to Mixpanel product analytics. Read the Gnosis Pay protocol page first for the on-chain context, and the Privacy & Pseudonyms deep dive for the keyed-hash pattern that makes the join possible without leaking raw addresses.
Why this is harder than it looks¶
The naive question — "is this Mixpanel user a Gnosis Pay cardholder?" — has a deceptively non-trivial answer because the same person can appear in three different identity forms across the GP product flows:
- As the original setup owner of the Safe — the EOA they signed in with via passkey or wallet during onboarding, before ownership was transferred to the inaccessible sentinel
0x...0002. After onboarding this EOA only has access through the Delay module. - As a spender delegate of the Safe — a delegate key authorized via the Roles module's
AssignRolesevent. This is what the GP card-flow infrastructure uses to trigger spends within the user's allowance. It may or may not be the same key as the original owner. - As the Safe address itself — some GP frontend flows call
mixpanel.identify(safeAddress)rather thanmixpanel.identify(eoaAddress), so the smart-account address shows up as the user'sdistinct_id.
A bridge that picks one of these and ignores the other two will silently miss most of its target audience. Matching only on the EOA owner — the most obvious choice — collapses the match rate to near-zero, because the Mixpanel distinct_id for many users is not the EOA. The three forms must all be tried.
The solution: pseudonymize all three candidate identifiers, store them in a single union table, and treat any match on any of the three as "this Mixpanel user is a GP cardholder". This is the union-match pattern from the pseudonyms doc.
The identity model¶
int_execution_gpay_safe_identities is the catalog of every (GP Safe, identity role, pseudonym) triple. It's a long-form table — one row per identity, not one row per Safe — so a Safe with two initial owners and three delegates produces six rows (2 + 3 + 1 self).
-- models/execution/gpay/intermediate/int_execution_gpay_safe_identities.sql
WITH gp_safes AS (
SELECT lower(address) AS gp_safe FROM {{ ref('stg_gpay__wallets') }}
),
initial_owners AS (
SELECT
oe.safe_address AS gp_safe,
'initial_owner' AS identity_role,
{{ pseudonymize_address('oe.owner') }} AS user_pseudonym
FROM {{ ref('int_execution_safes_owner_events') }} oe
INNER JOIN gp_safes gs ON lower(oe.safe_address) = gs.gp_safe
WHERE oe.event_kind = 'safe_setup' AND oe.owner IS NOT NULL
),
delegates AS (
SELECT
d.gp_safe,
'delegate' AS identity_role,
{{ pseudonymize_address('d.delegate_address') }} AS user_pseudonym
FROM {{ ref('int_execution_gpay_spender_delegates_current') }} d
),
safe_self AS (
SELECT
gp_safe,
'safe_self' AS identity_role,
{{ pseudonymize_address('gp_safe') }} AS user_pseudonym
FROM gp_safes
)
SELECT * FROM initial_owners
UNION ALL SELECT * FROM delegates
UNION ALL SELECT * FROM safe_self
Three identity roles, three sub-queries, one union. The privacy invariant — every column derived from a wallet address goes through pseudonymize_address — is enforced inline in each sub-query. The only column that is not pseudonymized is gp_safe, which we treat as a smart-account identifier rather than as PII (it identifies a card, not a person; if that classification ever changes, the bridge model is the single point to flip).
The pseudonym flow end-to-end¶
graph LR
EOA1["EOA<br/>(initial owner)"] -- pseudonymize_address --> P1[user_pseudonym]
EOA2["EOA<br/>(delegate)"] -- pseudonymize_address --> P2[user_pseudonym]
SAFE["Safe address"] -- pseudonymize_address --> P3[user_pseudonym]
P1 --> ID[int_execution_gpay_safe_identities]
P2 --> ID
P3 --> ID
DID["Mixpanel distinct_id<br/>(at ingest)"] -- pseudonymize_address --> UIH[user_id_hash]
UIH --> SMP[stg_mixpanel_ga__events]
ID -- "JOIN ON<br/>id.user_pseudonym =<br/>mp.user_id_hash" --> FCT[fct_mixpanel_ga_gpay_users]
SMP --> FCT The same pseudonymize_address macro is used on every edge of this diagram. The same CEREBRO_PII_SALT is used everywhere, automatically — there is no second salt, no second macro, no way to introduce a mismatch by accident. If the join produces zero rows, the cause is always one of:
- The salt was different in two environments (someone forgot to set
CEREBRO_PII_SALTin one of them). - The
distinct_idis not in any of the three identity buckets (i.e., the user is genuinely not a GP cardholder, or the GP frontend uses a fourth identifier we haven't accounted for).
The per-user fact¶
fct_mixpanel_ga_gpay_users is the workhorse table for downstream analytics. One row per (user_id_hash, gp_safe) pair, with denormalized columns for module topology, current daily limit, and recent delay activity.
-- models/mixpanel_ga/marts/fct_mixpanel_ga_gpay_users.sql (excerpt)
WITH mp_users AS (
SELECT DISTINCT user_id_hash
FROM {{ ref('stg_mixpanel_ga__events') }}
WHERE is_production = 1 AND is_identified = 1
),
matches AS (
SELECT
mp.user_id_hash,
id.gp_safe,
groupArray(DISTINCT id.identity_role) AS matched_roles
FROM mp_users mp
INNER JOIN {{ ref('int_execution_gpay_safe_identities') }} id
ON mp.user_id_hash = id.user_pseudonym
GROUP BY mp.user_id_hash, id.gp_safe
)
SELECT
m.user_id_hash,
m.gp_safe,
m.matched_roles, -- which of (initial_owner, delegate, safe_self) matched
groupArray(sm.contract_type) AS enabled_modules,
any(al.refill) AS daily_limit,
any(al.period) AS allowance_period_seconds,
coalesce(any(da.delay_tx_30d), 0) AS delay_txs_last_30d
FROM matches m
LEFT JOIN {{ ref('int_execution_gpay_safe_modules') }} sm ON sm.gp_safe = m.gp_safe
LEFT JOIN {{ ref('int_execution_gpay_allowances_current') }} al ON al.gp_safe = m.gp_safe
LEFT JOIN (
SELECT gp_safe, sum(tx_added_count) AS delay_tx_30d
FROM {{ ref('int_execution_gpay_delay_activity_daily') }}
WHERE date >= today() - 30
GROUP BY gp_safe
) da ON da.gp_safe = m.gp_safe
GROUP BY m.user_id_hash, m.gp_safe, m.matched_roles
Output schema¶
| Column | Type | Source | Notes |
|---|---|---|---|
user_id_hash | UInt64 | stg_mixpanel_ga__events | Pseudonym; identical join key on both sides. |
gp_safe | String | stg_gpay__wallets | Smart-account identifier. Lowercase, 0x-prefixed. |
matched_roles | Array(String) | derived | Which of initial_owner, delegate, safe_self matched. Almost always one element; can be more for unusual setups. |
enabled_modules | Array(String) | int_execution_gpay_safe_modules | Always ['DelayModule', 'RolesModule', 'SpenderModule'] for a fully onboarded GP Safe. Missing values flag a partial onboarding. |
daily_limit | UInt128 | int_execution_gpay_allowances_current | The refill field from the latest SetAllowance event; in token base units. |
allowance_period_seconds | UInt64 | int_execution_gpay_allowances_current | The period field. Almost always 86400 (24h). |
delay_txs_last_30d | UInt64 | int_execution_gpay_delay_activity_daily | Count of TransactionAdded events in the trailing 30 days for this Safe. Privacy-respecting "user did something admin-y recently" signal. |
The model is materialized as a table and rebuilt fully on each run — it's small (one row per GP cardholder × per Safe they own, typically a 1:1 relationship).
The daily rollup¶
fct_mixpanel_ga_gpay_crossdomain_daily already existed before this work — the original version produced (date, mp_dau, onchain_active_users, matched_users, match_rate_pct). The expansion adds dimensional columns derived from fct_mixpanel_ga_gpay_users:
| New column | What it counts |
|---|---|
matched_by_initial_owner_users | Distinct MP users matched via the initial_owner role |
matched_by_delegate_users | Matched via the delegate role |
matched_by_safe_self_users | Matched via the safe_self role |
users_with_delay_activity_7d | Matched users whose gp_safe had TransactionAdded events in the last 7 days |
users_with_allowance_changes_30d | Matched users whose gp_safe had SetAllowance events in the last 30 days |
The original matched_users column becomes matched_users_any — the union of the three role buckets — to preserve backward compatibility while reflecting the new reality.
Match-rate diagnostics¶
Once the bridge is built, the headline number to watch is the match rate by role:
SELECT
countIf(has(matched_roles, 'initial_owner')) AS via_initial_owner,
countIf(has(matched_roles, 'delegate')) AS via_delegate,
countIf(has(matched_roles, 'safe_self')) AS via_safe_self,
count() AS total_matched_pairs,
uniqExact(user_id_hash) AS distinct_mp_users
FROM dbt.fct_mixpanel_ga_gpay_users;
What to expect:
via_initial_ownershould dominate — the original setup owner is the most likely identifier the GP frontend uses.via_safe_selfshould be a small but non-zero tail — some flows do callmixpanel.identify(safeAddress).via_delegateshould be the smallest tail — delegate keys are usually backend-managed and not user-facing, so they rarely show up in Mixpanel.distinct_mp_users÷ (count ofis_identified=1MP users) gives the overall GP cardholder coverage of Mixpanel — the headline KPI for product analytics.
If via_initial_owner is near-zero, the bridge is broken, not the data. Investigate by sampling raw distinct_id values from a recent GP-era Mixpanel session and comparing them byte-for-byte against the three identity candidates for the corresponding Safe. The most common cause is a fourth identity form (a delegate key not yet in AssignRoles, a session-scoped key, etc.) that needs to be added as a new sub-query in int_execution_gpay_safe_identities.
Operational rules¶
These are not optional — they protect the privacy guarantees the bridge depends on.
- Never project a raw EOA into the marts layer. If you need a raw address inside an intermediate model for join purposes, do the pseudonymization in the same model — never push the raw column downstream.
- Never write
sipHash64(some_address)directly. Always usepseudonymize_address. Repo-grep guard: - Salt rotation is permanent damage. Rotating
CEREBRO_PII_SALTinvalidates every pseudonym in the warehouse. We do not rotate. - Audit
block_timestampsemantics on consumers ofint_execution_gpay_wallet_owners. Refer to the breaking semantics callout on the GP protocol page. - If a new identity form is discovered (a fourth bucket), add it as a new sub-query in
int_execution_gpay_safe_identities. Don't change the existing three — they're all known-good and downstreammatched_rolesanalysis depends on the role names being stable.
Related pages¶
- Gnosis Pay protocol overview — the on-chain side: modules, allowances, delegates, the cross-referenced module registry.
- Privacy & Pseudonyms — the keyed-hash pattern, the salt contract, and the rules for new cross-domain work.
- Safe protocol — the foundation models (
int_execution_safes_owner_events, etc.) that this bridge reads from. - Gnosis App heuristic sector — the parallel bridge for a heuristic-derived user list, where Mixpanel is the check rather than the source of truth.