Data Transformation
On this page, you will:
- Understand where dbt fits in the modern data stack
- Learn how the transformation layer connects raw data to analytics
- Plan the infrastructure and repository structure needed
Overview
This section covers building the transformation layer using dbt (data build tool). dbt transforms the raw data loaded by Kafka Connect, dlt, Snowpipe, and Airbyte into clean, tested, analytics-ready models.
The transformation layer lives in its own repository - separate from your Prefect pipelines. This separation reflects the different ownership and release cadence of analytics engineering work (iterative model development, schema changes) versus data engineering work (pipeline reliability, infrastructure).
┌────────────────────────────────────────────────────────────────────────────────┐
│ DATA TRANSFORMATION LAYER │
├────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Raw Data (Snowflake) dbt-transform repo Analytics │
│ ──────────────────── ────────────────── ───────── │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ STREAMING database │ Staging models │ │
│ │ • purchases │─────────▶│ stg_streaming__* │ │
│ └──────────────────┘ │ stg_dlt__* │ ┌────────────────┐ │
│ │ stg_airbyte__* │ │ ANALYTICS │ │
│ ┌──────────────────┐ └──────────────────┘ │ database │ │
│ │ DLT database │ │ │ │ │
│ │ • currencies │─────────▶ ▼ │ STAGING │ │
│ │ • exchange_rates │ ┌──────────────────┐ │ INTERMEDIATE │ │
│ │ • products │ │ Intermediate │────────▶│ MARTS │ │
│ └──────────────────┘ │ int_* │ │ REPORTING │ │
│ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ │ │
│ │ SNOWPIPE database│─────────▶ ▼ │
│ │ • exchange_rates │ ┌──────────────────┐ │
│ └──────────────────┘ │ Mart models │ │
│ │ fct_*, dim_* │ │
│ ┌──────────────────┐ └──────────────────┘ │
│ │ AIRBYTE database │─────────▶ │
│ │ • hubspot │ │
│ └──────────────────┘ │
│ │
│ Orchestration │
│ ───────────── │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Prefect (data-pipelines repo) │ │
│ │ • Waits for ingestion flows to complete │ │
│ │ • Triggers dbt run via dbt Core CLI or dbt Cloud API │ │
│ │ • Alerts on failure │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────────┘
The Transformation Repository
The dbt project lives in a dedicated dbt-transform repository. This is separate from the data-pipelines repository that contains Prefect flows. Prefect can trigger dbt runs, but it does not own the dbt codebase.
GitHub Organisation
├── terraform/ ← Infrastructure as code
├── data-pipelines/ ← Prefect flows (ingestion orchestration)
└── dbt-transform/ ← dbt project (this section)
This separation means:
- Analytics engineers can develop models, run tests, and deploy changes without touching Prefect code
- Data engineers can update pipeline infrastructure without affecting model logic
- CI/CD is independent: dbt's slim CI runs on dbt code changes; Prefect deployments run on flow changes
Schema Model
The transformation layer writes to four schemas in the ANALYTICS database (and their ANALYTICS_DEV counterparts for development):
| Schema | Created by | Contains | Access |
|---|---|---|---|
STAGING |
dbt | One-to-one views/tables over raw sources | ANALYTICS_DEVELOPER |
INTERMEDIATE |
dbt | Business logic, joined datasets | ANALYTICS_DEVELOPER |
MARTS |
dbt | Final analytics tables (fct/dim) | ANALYTICS_DEVELOPER |
REPORTING |
Terraform (already exists) | Curated subset for BI tools | ANALYTICS_REPORTER |
The STAGING, INTERMEDIATE, and MARTS schemas are created automatically by dbt when models run. Terraform's database-level future grants on ANALYTICS already cover permissions for these schemas - no additional Terraform configuration is needed.
The REPORTING schema was created in the Schemas page and has schema-level grants that restrict ANALYTICS_REPORTER to just this schema. dbt publishes final BI-ready models here.
What You Will Build
By the end of this section:
dbt-transform repository
├── models/
│ ├── staging/
│ │ ├── streaming/
│ │ │ └── stg_streaming__purchases.sql
│ │ ├── dlt/
│ │ │ ├── stg_dlt__currencies.sql
│ │ │ ├── stg_dlt__exchange_rates.sql
│ │ │ └── stg_dlt__products.sql
│ │ └── airbyte/
│ │ └── stg_airbyte__contacts.sql
│ ├── intermediate/
│ │ └── int_exchange_rates__unioned.sql
│ └── marts/
│ ├── core/
│ │ ├── fct_purchases.sql
│ │ ├── fct_exchange_rates.sql
│ │ └── dim_products.sql
│ ├── crm/
│ │ └── dim_customers.sql
│ └── sales/
│ └── sales.sql
├── tests/
├── macros/
├── seeds/
├── dbt_project.yml
└── packages.yml
Snowflake infrastructure:
Snowflake
├── SVC_DBT service account
│ └── SVC_DBT dedicated role
│ └── Granted ANALYTICS_TRANSFORMER
│
└── ANALYTICS database (existing)
├── STAGING schema ← created by dbt
├── INTERMEDIATE schema ← created by dbt
├── MARTS schema ← created by dbt
└── REPORTING schema ← already in Terraform
Deployment Options
You have two options for running dbt:
| dbt Core | dbt Cloud | |
|---|---|---|
| Cost | Free (open source) | Free (1 developer seat), $100/seat/month (Team) |
| IDE | Your own editor + dbt CLI | Browser-based IDE |
| Scheduling | Via Prefect | Via dbt Cloud jobs or Prefect |
| Docs site | Self-hosted (S3 + CloudFront) | Hosted by dbt Cloud |
| Deferral | --defer --state with S3 artifacts |
Native in dbt Cloud |
| Semantic layer | Not available | Team/Enterprise only |
| Best for | Code-first teams with existing Prefect | Teams wanting a managed analytics platform |
Both options are covered in full. Start with dbt Core vs dbt Cloud to choose your approach.
Section Contents
| Page | What You Will Do |
|---|---|
| dbt Concepts | Learn models, sources, tests, macros, and materialisations |
| dbt Core vs dbt Cloud | Compare options and choose your deployment |
| Project Setup | Create the dbt-transform repo and configure packages |
| Snowflake Infrastructure | Create SVC_DBT service account via Terraform |
| Sources and Staging | Define sources and build staging models |
| Intermediate Models | Build business logic layer |
| Mart Models | Build final analytics tables |
| Testing and Documentation | Add tests, descriptions, and enforce project standards |
| dbt Core Deployment | CI/CD, state deferral, and hosted docs site |
| dbt Cloud Setup | Environments, jobs, IDE, and native deferral |
| Prefect Orchestration | Trigger dbt from Prefect after ingestion completes |
| Finishing Up | Verification, architecture summary, next steps |
Prerequisites
Before starting this section, ensure you have completed:
- Data Warehouse - ANALYTICS and ANALYTICS_DEV databases, ANALYTICS_TRANSFORMER role, TRANSFORMING warehouse, REPORTING schema
- Batch Data Ingestion - DLT and SNOWPIPE databases with raw data
- SaaS Ingestion - AIRBYTE database with HubSpot data
- Streaming Data Ingestion - STREAMING database with purchase events
- Orchestration - Prefect for triggering dbt runs
Cost Summary
| Approach | Monthly Cost | Notes |
|---|---|---|
| dbt Core | $0 | Open source; S3 for state artifacts (~$0.01/month) |
| dbt Cloud (Developer) | $0 | 1 seat free, limited to one project |
| dbt Cloud (Team) | $100/seat/month | Multi-seat, semantic layer, SSO |
Detailed cost breakdowns are in dbt Core vs dbt Cloud and the Costs page.
Get Started
Continue to dbt Concepts →