Skip to content

Project Setup

On this page, you will:

  • Create the dbt-transform GitHub repository with README, PR template, and CODEOWNERS
  • Install dbt locally and configure credentials
  • Initialise the project with dbt init and scaffold the directory structure
  • Configure dbt_project.yml with custom schema naming
  • Add required packages to packages.yml
  • Configure sqlfluff and yamllint with pre-commit hooks

Overview

The dbt project lives in its own repository, separate from data-pipelines. This keeps analytics engineering work — model development, schema changes, test coverage — independent from Prefect infrastructure.

GitHub Organisation
├── terraform/               ← Infrastructure as code
├── data-pipelines/          ← Prefect flows
└── dbt-transform/           ← dbt project  ← you are here

Create the Repository

In your GitHub organisation, create a new repository named dbt-transform through the GitHub UI:

  • Name: dbt-transform
  • Visibility: Private
  • Initialise with: README

README

Replace the generated README.md with a developer setup guide. This is the first thing a new team member reads when onboarding to the project:

# dbt-transform

dbt transformation project for the [Company] data platform.

## Prerequisites

- [direnv](https://direnv.net/) installed globally — see the [local environment guide](https://docs.your-company.com/getting-started/local-environment/)
- [uv](https://docs.astral.sh/uv/) installed globally
- Access to Snowflake (SSO via your company identity provider)

## Getting Started

1. Clone the repository:

   ```sh
   git clone git@github.com:your-org/dbt-transform.git
   cd dbt-transform
   ```

2. Install Python dependencies:

   ```sh
   uv sync
   ```

3. Set up your local environment:

   ```sh
   cp .envrc.example .envrc
   # Edit .envrc with your Snowflake credentials and personal schema prefix
   direnv allow
   ```

4. Install dbt packages and verify the connection:

   ```sh
   dbt deps
   dbt debug
   ```

   `dbt debug` should report `All checks passed!`.

5. Install pre-commit hooks:

   ```sh
   pre-commit install
   ```

## Common Commands

```sh
dbt build                              # build all models and run all tests
dbt build --select staging             # build all staging models
dbt build --select my_model+           # build a model and all downstream
dbt build --select +my_model           # build a model and all upstream
dbt test --select my_model             # run tests for a specific model
dbt source freshness                   # check whether sources are up to date
```

## Quality Checks

Run project structure checks and custom rules:

```sh
dbt build --selector ci_quality_checks
```

## Project Structure

Models are organised into four layers:

| Layer | Folder | Prefix | Purpose |
|-------|--------|--------|---------|
| Staging | `models/staging/` | `stg_` | Clean and rename raw source data |
| Intermediate | `models/intermediate/` | `int_` | Business logic joins and transformations |
| Marts | `models/marts/` | `fct_`, `dim_` | Final analytics-ready tables |
| Utilities | `models/utilities/` | — | Date spines, shared lookup tables |

PR Template

Create .github/pull_request_template.md:

## Summary

<!-- What does this PR do? -->

## Changes

-

## Test Plan

- [ ] `dbt build --select state:modified+` runs without errors
- [ ] All new models have YAML documentation
- [ ] New tests pass locally
- [ ] `dbt build --selector ci_quality_checks` passes

## Notes

<!-- Any context reviewers need to know? Breaking changes? Deprecations? -->

CODEOWNERS

Create .github/CODEOWNERS:

# Global owners
*       @your-org/data-platform-admins   @your-org/data-engineers

Branch Protection

Go to SettingsBranchesAdd branch protection rule for main:

Require a pull request before merging - Required approvals: 1 - Require review from Code Owners - Dismiss stale pull request approvals when new commits are pushed

Require status checks to pass before merging - Require branches to be up to date before merging - Add the CI workflow check once CI is configured (see dbt Core Deployment)

Require conversation resolution before merging

Do not allow bypassing the above settings

Merge Settings

Go to SettingsGeneralPull Requests:

  • ❌ Uncheck "Allow merge commits"
  • ❌ Uncheck "Allow rebase merging"
  • ✅ Check "Allow squash merging"
  • ✅ Enable "Always suggest updating pull request branches"
  • ✅ Enable "Automatically delete head branches"

These are the same settings used across all repositories in this project. See GitHub Setup for the rationale.

Set Up dbt

Install

Install dbt and the Snowflake adapter into the project:

uv add dbt-snowflake
uv sync

uv add installs dbt-snowflake (which includes dbt-core) and records it in pyproject.toml. uv sync creates the .venv virtual environment.

Configure direnv

direnv activates the virtual environment and exports Snowflake credentials automatically when you cd into the project. Install it if you have not already — see Local Environment.

Commit .envrc.example to the repository as a template for the team:

# .envrc.example — commit this file
# Copy to .envrc and fill in your values (never commit .envrc)

# Auto-activate the Python virtual environment
source .venv/bin/activate

# Snowflake credentials
export SNOWFLAKE_ACCOUNT="your-account.snowflakecomputing.com"
export SNOWFLAKE_USER="your.name@company.com"

# Your personal development schema prefix
# Schemas will be named e.g. DBT_JBLOGGS_STAGING, DBT_JBLOGGS_MARTS
export DBT_TARGET_SCHEMA="dbt_yourname"

Each developer copies .envrc.example to .envrc and fills in their own values:

cp .envrc.example .envrc
# Edit .envrc with your values
direnv allow

With direnv allow, the virtual environment activates automatically and dbt is available directly — no uv run prefix needed.

Configure profiles.yml

profiles.yml references environment variables for all per-developer values, so it contains no secrets and can be committed to the repository.

Create profiles.yml in the project root:

dbt_transform:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      authenticator: externalbrowser  # SSO login
      role: ANALYTICS_DEVELOPER
      warehouse: DEVELOPER
      database: ANALYTICS_DEV
      schema: "{{ env_var('DBT_TARGET_SCHEMA') }}"
      threads: 4

    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: SVC_DBT
      private_key_path: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PATH') }}"
      private_key_passphrase: "{{ env_var('SNOWFLAKE_PRIVATE_KEY_PASSPHRASE', '') }}"
      role: SVC_DBT
      warehouse: TRANSFORMING
      database: ANALYTICS
      schema: ANALYTICS
      threads: 8

Developer authentication

Developers connect using SSO (externalbrowser) with their personal Snowflake user and the ANALYTICS_DEVELOPER role. The SVC_DBT service account with key-pair authentication is only used in CI/CD and production runs.

Project Structure

Clone the new repository and initialise the dbt project:

git clone git@github.com:your-org/dbt-transform.git
cd dbt-transform

Use dbt init to scaffold the standard project structure:

dbt init . --skip-profile-setup

dbt init creates the standard directory layout and a dbt_project.yml. Remove the example models it generates and create the source-specific directories for this project:

# Remove generated examples
rm -rf models/example

# Create source-specific staging directories
mkdir -p models/staging/dlt
mkdir -p models/staging/snowpipe
mkdir -p models/staging/airbyte

# Create remaining directories
mkdir -p models/intermediate
mkdir -p models/utilities/dbt_project_evaluator
mkdir -p models/marts/core
mkdir -p models/marts/crm
mkdir -p models/marts/reporting
mkdir -p .github/workflows

The final layout:

dbt-transform/
├── .github/
│   ├── CODEOWNERS
│   ├── pull_request_template.md
│   └── workflows/
│       ├── ci.yml          ← slim CI on pull requests
│       └── deploy.yml      ← production run on merge to main
├── analyses/               ← ad-hoc analyses (not materialised)
├── macros/
│   └── generate_schema_name.sql
├── models/
│   ├── staging/
│   │   ├── dlt/
│   │   │   └── sources.yml
│   │   ├── snowpipe/
│   │   │   └── sources.yml
│   │   └── airbyte/
│   │       └── sources.yml
│   ├── intermediate/
│   ├── utilities/
│   └── marts/
│       ├── core/
│       ├── crm/
│       └── reporting/
├── seeds/
├── .envrc.example
├── .gitignore
├── .pre-commit-config.yaml
├── .sqlfluff
├── .sqlfluffignore
├── .yamllint
├── dbt_project.yml
├── packages.yml
├── profiles.yml            ← safe to commit — uses env vars only
├── README.md
└── selectors.yml

Configure .gitignore

Create .gitignore to exclude generated files and credentials:

# dbt generated files
target/
dbt_packages/
logs/

# Local environment variables (never commit — contains personal credentials)
.envrc

# State artifacts (fetched at runtime, not stored in repo)
.artifacts/

# Python
__pycache__/
*.pyc
.venv/

Never commit .envrc

The .envrc file contains your personal Snowflake credentials. Always keep it in .gitignore. Commit .envrc.example instead so other developers have a template. In CI/CD, credentials are injected via GitHub Actions secrets — not a .envrc file.

Configure dbt_project.yml

Replace the dbt_project.yml generated by dbt init:

name: dbt_transform
version: '1.0.0'
config-version: 2

# Profile name — matches the profile in profiles.yml
profile: dbt_transform

# Paths
model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
analysis-paths: ["analyses"]
target-path: "target"

clean-targets:
  # These are the folders whose contents are deleted when `dbt clean` is run
  - "target"
  - "dbt_packages"

# Model configurations
models:
  dbt_transform:
    staging:
      +materialized: view
      +schema: staging
      dlt:
        +tags: ["staging", "dlt"]
      snowpipe:
        +tags: ["staging", "snowpipe"]
      airbyte:
        +tags: ["staging", "airbyte"]
    intermediate:
      +materialized: ephemeral
      +schema: intermediate
      +tags: ["intermediate"]
    utilities:
      +materialized: table
      +schema: utilities
      +tags: ["utilities"]
    marts:
      +materialized: table
      +schema: marts
      +tags: ["marts"]
      reporting:
        +schema: reporting
        +tags: ["reporting"]

# Seed configurations
seeds:
  dbt_transform:
    +schema: seeds

# Snapshot configurations
snapshots:
  dbt_transform:
    +schema: snapshots
    +strategy: timestamp
    +updated_at: updated_at

Add the Custom Schema Macro

By default, dbt appends a custom schema to the target schema. This project overrides that behaviour: in production, only the custom schema name is used, giving clean schema names. In all other environments (dev, CI), the target schema is prepended so schemas are clearly scoped to each developer or PR.

Create macros/generate_schema_name.sql:

{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- if custom_schema_name is none -%}

        {{ target.schema | trim }}

    {%- elif target.name == 'prod' -%}

        {{ custom_schema_name | trim | upper }}

    {%- else -%}

        {{ target.schema | trim }}_{{ custom_schema_name | trim | upper }}

    {%- endif -%}

{%- endmacro %}

With this macro:

Environment Target Name Database Target Schema Model Config Resulting Schema
prod prod ANALYTICS ANALYTICS +schema: staging ANALYTICS.STAGING
prod prod ANALYTICS ANALYTICS +schema: reporting ANALYTICS.REPORTING
dev dev ANALYTICS_DEV dbt_jbloggs +schema: staging ANALYTICS_DEV.DBT_JBLOGGS_STAGING
dev dev ANALYTICS_DEV dbt_jbloggs +schema: reporting ANALYTICS_DEV.DBT_JBLOGGS_REPORTING
CI ci ANALYTICS_DEV CI_PR_123 +schema: staging ANALYTICS_DEV.CI_PR_123_STAGING

REPORTING schema in dev

In production, dbt writes to ANALYTICS.REPORTING — the Terraform-managed schema with BI tool access. In development and CI, dbt writes to scoped schemas such as ANALYTICS_DEV.DBT_JBLOGGS_REPORTING or ANALYTICS_DEV.CI_PR_123_REPORTING. These are created dynamically by dbt and do not affect the Terraform-managed production schema.

Configure packages.yml

Create packages.yml at the repository root:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]

  - package: metaplane/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]

  - package: dbt-labs/dbt_project_evaluator
    version: [">=1.2.0", "<2.0.0"]

Install packages:

dbt deps

This creates a dbt_packages/ directory (excluded by .gitignore). Always run dbt deps after cloning the repo or updating packages.yml.

Package Overview

dbt-utils provides utility macros used throughout the project:

-- Generate a surrogate key from multiple columns
{{ dbt_utils.generate_surrogate_key(['date', 'currency']) }}

-- Date spine for building a date dimension
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="current_date"
) }}

-- Union all tables matching a pattern
{{ dbt_utils.union_relations(
    relations=[ref('stg_dlt__exchange_rates'), ref('stg_snowpipe__exchange_rates')]
) }}

dbt_expectations adds column-level tests beyond the four built-in generic tests:

columns:
  - name: rate
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          strictly: true
      - dbt_expectations.expect_column_values_to_not_be_null

dbt_project_evaluator enforces project structure rules in CI. Full configuration is covered in Testing and Documentation.

Linting and Formatting

Two tools enforce code quality across SQL and YAML files:

  • sqlfluff — lints and auto-fixes SQL, using the dbt templater so it understands {{ ref() }} and {{ source() }} expressions
  • yamllint — lints YAML files including dbt_project.yml, packages.yml, and schema files

Both are run automatically on commit via pre-commit. pre-commit itself is already installed globally — see Local Environment. This section adds sqlfluff and yamllint as project-level dev dependencies.

Install Dev Tools

uv add --dev sqlfluff yamllint

These are added as dev dependencies in pyproject.toml and are available after uv sync.

Configure sqlfluff

Create .sqlfluff in the project root:

[sqlfluff]
templater = dbt
dialect = snowflake

[sqlfluff:templater:dbt]
project_dir = .

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.identifiers]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.functions]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper

Create .sqlfluffignore to exclude generated directories:

dbt_packages/
target/

Configure yamllint

Create .yamllint in the project root:

extends: default
rules:
  line-length:
    max: 120
    level: warning
  truthy:
    allowed-values: ['true', 'false']

Configure pre-commit

Create .pre-commit-config.yaml in the project root:

repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.4.0
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix
        args: [--force]

  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint

Keep hook versions up to date

Run pre-commit autoupdate periodically to bump the rev values to the latest releases.

Install the hooks into your local git repository:

pre-commit install

From this point on, sqlfluff and yamllint run automatically on every git commit. Files that fail linting are flagged before the commit proceeds; sqlfluff will also attempt to auto-fix fixable issues.

Run Manually

To lint all files at any time without committing:

pre-commit run --all-files

To lint or fix SQL files directly:

# Lint all models
sqlfluff lint models/

# Fix all models
sqlfluff fix models/

Verify the Setup

With dbt installed and credentials configured, verify the project:

dbt deps    # install packages
dbt debug   # check connection and project configuration

Expected output from dbt debug:

dbt version: 1.8.x
python version: 3.11.x
...
Connection test: [OK connection ok]

All checks passed!

Commit the Initial Structure

git add .
git commit -m "Initial dbt project structure

- README with developer setup guide
- .github/pull_request_template.md
- .github/CODEOWNERS
- dbt_project.yml with custom schema naming
- packages.yml (dbt-utils, dbt_expectations, dbt_project_evaluator)
- profiles.yml using env vars (safe to commit — no secrets)
- .envrc.example template for local developer setup
- Custom generate_schema_name macro
- .sqlfluff, .sqlfluffignore, .yamllint, .pre-commit-config.yaml
- .gitignore excluding .envrc and generated files"
git push

Summary

You've set up the dbt-transform repository:

  • Created the repository with README, PR template, CODEOWNERS, and branch protection
  • Installed dbt locally and configured credentials via direnv and profiles.yml
  • Initialised the project with dbt init
  • Configured dbt_project.yml with custom schema naming
  • Added packages: dbt-utils, dbt_expectations, dbt_project_evaluator
  • Added the custom schema macro for clean schema names
  • Configured sqlfluff and yamllint with pre-commit hooks

What's Next

Create the Snowflake service account that dbt will use for production runs.

Continue to Snowflake Infrastructure