Connecting Snowflake to Your Growth Analytics

At some point in a SaaS company's growth, the MRR spreadsheet breaks. It doesn't break dramatically — no single error message, no obvious failure. It breaks gradually: more tabs added for each new cohort analysis someone requested, a second owner who makes changes without telling the first, a monthly ritual where two analysts spend a day reconciling numbers that should match but don't. By the time the CFO notices the MRR waterfall is stale by the time it reaches them, the spreadsheet has 47 tabs and a half-dozen formulas referencing cells that were deleted in Q2.

This piece is about the migration path: getting growth analytics off spreadsheets and into a Snowflake-native workflow that is live, auditable, and doesn't require two analyst-days of preparation before every board meeting. Not a general data warehouse guide — specifically the cohort retention, MRR waterfall, and channel attribution use cases that growth and finance teams actually need to run.

Why Snowflake is the right destination (and why it's not automatic)

If your SaaS company is past $10M ARR and has a data team, your billing and customer data is almost certainly already in Snowflake — or queued for migration. Stripe subscription events, Salesforce opportunity records, Segment behavioral events, and internal subscription state tables flow into your warehouse through managed connectors (Fivetran, Airbyte) and dbt transformation models. The raw data you need for cohort analysis is there.

What's not automatic is the query structure. Raw Stripe data in Snowflake tells you when every invoice was created, what the line items were, and what the customer status was at that moment. It does not tell you, out of the box, each customer's MRR retention at M6 by acquisition cohort. That calculation requires a series of window function queries that most growth teams don't know how to write and that most data teams haven't prioritized building because there are always more urgent requests in the backlog.

The spreadsheet persists not because analysts prefer it, but because it's faster to extend a spreadsheet than to request a new dbt model. The migration to Snowflake-native growth analytics succeeds when the warehouse version is faster to use for ad hoc questions — not just more accurate, but faster. That's the design constraint the migration has to solve.

What cohort analysis requires from your data model

Before migrating, map what you actually need. Cohort retention analysis requires a customer-level fact table with, at minimum:

A stable customer or account identifier that is consistent across billing (Stripe/Chargebee), CRM (HubSpot/Salesforce), and product events
An acquisition date — typically first paid charge date or first active subscription start, depending on whether you have a trial period
Monthly MRR by account, derived from subscription line items and ideally decomposed into base plan + expansion components
Subscription state changes with timestamps: cancellations, upgrades, downgrades, pauses, reactivations
Acquisition channel attribution at the customer level (pulled from CRM opportunity source or marketing attribution tables)

Most Snowflake environments at growing SaaS companies have all of this data — but scattered across multiple tables with different granularities and different entity definitions. The billing system thinks in terms of subscriptions. The CRM thinks in terms of accounts. The product analytics system thinks in terms of users. These three entity types often don't share primary keys, which means every cohort query requires a join that someone has to maintain.

The first real migration task is building a unified customer-level table that resolves these entity definitions into a single canonical record per customer, with acquisition date, plan tier, channel attribution, and MRR time series. Once that table exists, 80% of your cohort queries become simple aggregations on top of it.

The dbt model problem and how to navigate it

Teams using dbt to transform Snowflake data typically maintain a set of clean analytical models: a customer dimension, a subscription fact, an MRR waterfall. These models encode specific business logic decisions — how to treat churned customers who reactivated, how to handle mid-month plan changes, whether trials count as MRR. They represent months of iterative refinement and are the canonical source of truth for finance reporting.

The problem is that cohort analysis frequently needs to cut across the grain of these models in ways that weren't anticipated when they were built. "What is the 12-month retention rate of customers who used Feature X before their M3 mark?" requires joining the subscription MRR model to a product event table at the customer level. If there isn't an existing dbt model that joins those tables with the right grain, answering the question means a new transform — which means a new JIRA ticket, a sprint planning discussion, and a deployment cycle. Typically 1–3 weeks. The growth team runs the analysis in a spreadsheet instead.

Two approaches to this problem. The first is to invest in building a comprehensive cohort mart in dbt — a set of models specifically designed for cohort retention queries, covering the dimensions your growth team most commonly needs (by channel, by plan, by acquisition period, by product behavior). This is a real project but a worthwhile one: a well-designed cohort mart in dbt is the foundation for all downstream analysis and reporting. The limitation is that it doesn't handle fully ad hoc questions that weren't anticipated in the model design.

The second approach is a mapping layer in a growth analytics tool that reads your existing Snowflake models and generates cohort queries at runtime from a column mapping configuration. You tell the tool which columns in your existing tables correspond to the concepts it needs — customer ID, subscription date, MRR amount, status field — and it handles the cohort query logic without requiring you to write new dbt models. This approach is faster to get live but has a ceiling: it can only ask questions your existing tables can answer, so behavioral cohorts that require product event joins are still a dbt project.

We're not saying one approach is always better than the other. For teams where the data team has capacity to build out a cohort mart, dbt-first is the right long-term investment. For teams where the data team backlog is full and the growth team needs answers this quarter, the mapping layer approach is the faster path to the first analysis.

Migration sequencing: what to move first

The practical migration sequence that produces value fastest: move the MRR waterfall first, then add cohort retention curves, then layer in channel attribution.

The MRR waterfall (new + expansion + contraction + churn + resurrection = net new MRR) is the highest-value migration because it eliminates the most manual work. A live MRR waterfall connected to Stripe billing data replaces the monthly reconciliation process immediately. It requires one core Snowflake model: a monthly MRR movement table that classifies each customer's MRR change as new, expansion, contraction, churn, or resurrection, derived from subscription event history. Building this model from Stripe data in Snowflake typically takes a data engineer 2–4 days, or can be sourced from dbt packages in the community that implement the standard SaaS metrics logic.

Cohort retention curves layer on top of the MRR movement table: for each acquisition month cohort, compute the MRR retained at M1, M3, M6, M9, M12 as a percentage of that cohort's original MRR. The query logic is a window function over the monthly MRR movement table, grouped by acquisition month. Once the underlying table is live, the cohort retention query is straightforward SQL that most analytics engineers can write in a few hours.

Channel attribution requires a third data source: the CRM acquisition channel field joined to the customer-level table on a shared identifier. If your Salesforce or HubSpot lead source is flowing into Snowflake (which it should be if you have an active Fivetran or Airbyte connection), this join is a matter of configuring the key mapping. The result is cohort retention curves segmented by channel — the analysis that reveals whether your paid search cohorts and your organic content cohorts are showing materially different 12-month retention, which is the input to channel budget allocation decisions.

The queries that Snowflake handles well (and the ones it doesn't)

Snowflake's columnar storage architecture is well-suited for cohort retention calculations at scale. Window functions, lateral joins, and set-based aggregations all perform efficiently on large billing fact tables. A batch cohort retention calculation — "for every acquisition month cohort in the last 24 months, compute MRR retention at M1, M3, M6, M12" — runs in seconds on most production Snowflake environments, regardless of customer base size.

The queries that perform poorly on Snowflake are iterative row-by-row calculations: attribution models that need to traverse a sequence of touchpoint events in chronological order for each customer, or time-series reconstruction that builds a complete monthly state table from raw event logs. These are better handled in a dbt transform layer before being queried. Multi-touch attribution in particular — which requires scoring each touchpoint in a customer's pre-conversion event sequence — is best computed as a derived table in dbt and read from Snowflake, rather than computed in Snowflake at query time.

Understanding which computations belong in the warehouse and which belong in the transform layer helps scope the migration correctly. Retention curves, MRR waterfall decomposition, quick ratio by cohort, and channel-level GRR calculations are natural Snowflake queries that scale cleanly. Attribution scoring, event path analysis, and time-to-expansion sequencing belong in dbt models that feed the warehouse. When the migration tries to push attribution scoring into Snowflake at query time, the query performance degrades and the team retreats back to spreadsheets. Build the layers correctly and the performance argument for Snowflake-native analytics becomes obvious.

What success looks like after migration

The signal that the migration has succeeded is not the architecture. It's a change in behavior: the growth team asks a cohort question on a Tuesday afternoon and has an answer by Tuesday afternoon, without filing a ticket or opening a spreadsheet. The finance team runs the monthly MRR waterfall on the first business day after month-end, not the fifth. The CFO reviews the board package knowing that the MRR figures are derived from the same Snowflake tables that the data team maintains, not a spreadsheet that someone exported two weeks ago and hasn't been touched since.

The 47-tab spreadsheet doesn't disappear overnight. The migration is a replacement process, not a switchover. But once the first cohort retention dashboard is live and self-updating, the case for maintaining the parallel spreadsheet collapses quickly. Analysts stop updating the spreadsheet when the Snowflake version is more accurate and requires no maintenance. That's the practical test of whether the migration is complete.

Moving Your Growth Analytics Off Spreadsheets Into Snowflake