Skip to content

Data Quality with dbt

On this page, you will:

  • Master dbt's four generic tests (unique, not_null, accepted_values, relationships)
  • Install and use dbt_expectations for statistical and distribution tests
  • Write custom singular tests for complex business logic
  • Configure test severity levels (warn vs error)
  • Add where clauses and incremental model testing strategies
  • Integrate tests into CI/CD to fail builds on test failures

Overview

dbt tests are the foundation of data quality in the modern data stack. They run SQL queries that return failing rows — if any rows are returned, the test fails. Tests run after every dbt build or dbt test command, providing immediate feedback on data quality.

This page builds on the basic dbt tests covered in Data Transformation, adding advanced patterns and packages.

┌─────────────────────────────────────────────────────────────────────────┐
│                       dbt TESTING HIERARCHY                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Generic Tests              dbt_expectations           Singular Tests   │
│  ──────────────             ─────────────────          ───────────────  │
│  (Built-in)                 (Package)                  (Custom SQL)     │
│                                                                         │
│  • unique                   • expect_column_values_    • Complex        │
│  • not_null                   to_be_between              business      │
│  • accepted_values          • expect_column_values_      logic         │
│  • relationships              to_match_regex           • Multi-table   │
│                             • expect_table_row_          validation    │
│                               count_to_be_between      • Custom        │
│                             • expect_column_stddev_      aggregations  │
│                               to_be_between                            │
│                                                                         │
│  All tests run via: dbt test                                           │
│  Failed tests → Alert (SEV2) or Warn (SEV3) depending on config        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Recap: The Four Generic Tests

dbt ships with four built-in generic tests. These cover 80% of common data quality checks.

1. unique

Ensures: Column values are unique (no duplicates).

Use case: Primary keys, natural keys, unique identifiers.

# models/marts/core/fct_exchange_rates.yml
columns:
  - name: exchange_rate_id
    tests:
      - unique

Generated SQL:

SELECT exchange_rate_id, COUNT(*) AS count
FROM analytics.marts.fct_exchange_rates
GROUP BY exchange_rate_id
HAVING COUNT(*) > 1;

If any rows are returned (duplicates exist), the test fails.

2. not_null

Ensures: Column has no NULL values.

Use case: Required fields, foreign keys, critical business data.

columns:
  - name: customer_id
    tests:
      - not_null

Generated SQL:

SELECT *
FROM analytics.marts.fct_orders
WHERE customer_id IS NULL;

3. accepted_values

Ensures: Column values are in a predefined list.

Use case: Enums, status fields, categorical data.

columns:
  - name: order_status
    tests:
      - accepted_values:
          values: ['pending', 'shipped', 'delivered', 'cancelled']

Generated SQL:

SELECT *
FROM analytics.marts.fct_orders
WHERE order_status NOT IN ('pending', 'shipped', 'delivered', 'cancelled');

4. relationships

Ensures: Foreign key values exist in the referenced table.

Use case: Referential integrity, joins.

columns:
  - name: customer_id
    tests:
      - relationships:
          to: ref('dim_customers')
          field: customer_id

Generated SQL:

SELECT *
FROM analytics.marts.fct_orders
WHERE customer_id NOT IN (
    SELECT customer_id FROM analytics.marts.dim_customers
);

dbt_expectations Package

dbt_expectations is a community package that adds 50+ tests inspired by Great Expectations. These tests cover statistical validation, distribution checks, and regex patterns.

Installation

Add to packages.yml:

# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: 0.10.3

Install:

dbt deps

Statistical Tests

expect_column_values_to_be_between

Ensures: Numeric values fall within a range.

# models/marts/core/fct_exchange_rates.yml
columns:
  - name: exchange_rate
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0.01
          max_value: 10.0
          row_condition: "base_currency = 'GBP'"  # Optional filter

Use case: Validate exchange rates are within realistic bounds.

expect_column_mean_to_be_between

Ensures: Column mean is within expected range.

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_mean_to_be_between:
          min_value: 50
          max_value: 200
          group_by: [order_date]  # Calculate mean per day

Use case: Detect anomalies in average order value.

expect_column_stdev_to_be_between

Ensures: Standard deviation is within expected range (detects volatility changes).

columns:
  - name: exchange_rate
    tests:
      - dbt_expectations.expect_column_stdev_to_be_between:
          min_value: 0.001
          max_value: 0.1
          group_by: [target_currency]

Use case: Alert if exchange rate volatility suddenly increases.

Row Count Tests

expect_table_row_count_to_be_between

Ensures: Table has expected number of rows.

# models/marts/core/fct_exchange_rates.yml
models:
  - name: fct_exchange_rates
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000  # At least 1000 rows
          max_value: 1000000  # No more than 1M rows

Use case: Detect data pipeline failures (too few rows) or duplication (too many rows).

expect_table_row_count_to_equal_other_table

Ensures: Two tables have the same row count.

models:
  - name: fct_exchange_rates
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: source('raw', 'exchange_rates')

Use case: Validate no rows were lost in transformation.

Format Validation

expect_column_values_to_match_regex

Ensures: Values match a regex pattern.

columns:
  - name: email
    tests:
      - dbt_expectations.expect_column_values_to_match_regex:
          regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'

Use case: Validate email format, phone numbers, postcodes.

expect_column_values_to_match_like_pattern

Ensures: Values match a SQL LIKE pattern.

columns:
  - name: product_id
    tests:
      - dbt_expectations.expect_column_values_to_match_like_pattern:
          like_pattern: 'PROD-%'

Use case: Validate IDs follow naming conventions.

Distribution Tests

expect_column_quantile_values_to_be_between

Ensures: Percentiles fall within expected ranges.

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_quantile_values_to_be_between:
          quantile: 0.95  # 95th percentile
          min_value: 100
          max_value: 500

Use case: Detect outliers in order totals.

Custom Singular Tests

For complex business logic that can't be expressed with generic tests, write custom SQL tests.

Example: Revenue Reconciliation

Business rule: Total revenue in fct_orders must match sum of line items in fct_order_lines.

-- tests/revenue_reconciliation.sql
WITH order_totals AS (
    SELECT
        order_id,
        order_total
    FROM {{ ref('fct_orders') }}
),
line_item_totals AS (
    SELECT
        order_id,
        SUM(quantity * unit_price) AS line_item_total
    FROM {{ ref('fct_order_lines') }}
    GROUP BY order_id
)
SELECT
    o.order_id,
    o.order_total,
    l.line_item_total,
    ABS(o.order_total - l.line_item_total) AS difference
FROM order_totals o
INNER JOIN line_item_totals l
    ON o.order_id = l.order_id
WHERE ABS(o.order_total - l.line_item_total) > 0.01  -- Allow 1 cent rounding
;

If any rows are returned (mismatches exist), the test fails.

Example: Temporal Consistency

Business rule: created_at must always be before updated_at.

-- tests/created_before_updated.sql
SELECT *
FROM {{ ref('dim_customers') }}
WHERE created_at > updated_at
;

Example: Cross-Table Consistency

Business rule: Every order must have at least one line item.

-- tests/orders_have_line_items.sql
SELECT o.order_id
FROM {{ ref('fct_orders') }} o
LEFT JOIN {{ ref('fct_order_lines') }} l
    ON o.order_id = l.order_id
WHERE l.order_id IS NULL
;

Test Severity: Warn vs Error

Not all test failures require immediate action. Use severity to control behavior.

Error (Default)

Behavior: Test failure causes dbt build or dbt test to exit with error code 1.

Use case: Critical data quality checks (primary keys, not_null on critical fields).

columns:
  - name: customer_id
    tests:
      - unique:
          config:
            severity: error  # Default; can be omitted

CI/CD impact: Pull request fails if test fails.

Warn

Behavior: Test failure logs a warning but doesn't fail the build.

Use case: Non-critical checks, exploratory tests, tests in development.

columns:
  - name: product_description
    tests:
      - not_null:
          config:
            severity: warn  # Build succeeds even if test fails

CI/CD impact: Pull request passes, but warning is visible in logs.

When to Use Each

Severity Use When Example
error Test failure blocks business customer_id is unique (duplicate customers break billing)
error Referential integrity Foreign keys exist (broken joins = wrong reports)
warn Nice-to-have quality Product descriptions present (missing OK, but not ideal)
warn Test in development New test; not confident in thresholds yet
warn Statistical anomalies Row count 10% lower than average (investigate, but don't block)

Where Clauses and Conditional Testing

Apply tests to a subset of rows using where or row_condition.

where Clause (dbt Native)

Filter rows before testing:

columns:
  - name: customer_id
    tests:
      - not_null:
          where: "order_status != 'cancelled'"

Generated SQL:

SELECT *
FROM analytics.marts.fct_orders
WHERE order_status != 'cancelled'  -- Applied first
    AND customer_id IS NULL;        -- Then test

Use case: "customer_id must be not null for non-cancelled orders."

row_condition (dbt_expectations)

Same as where, but for dbt_expectations tests:

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 10000
          row_condition: "order_status IN ('shipped', 'delivered')"

Use case: "order_total must be between 0 and 10,000 for completed orders."

Testing Incremental Models

Incremental models only process new rows (is_incremental()). Test strategies differ.

Strategy 1: Test Full Table

Approach: Test the entire table, not just new rows.

# Default behavior — tests run against full table
columns:
  - name: exchange_rate_id
    tests:
      - unique  # Checks all rows, including historical

Pros: Catches issues in historical data. Cons: Slow for large tables (millions of rows).

Strategy 2: Test Recent Data Only

Approach: Use where to test only recent rows.

columns:
  - name: exchange_rate_id
    tests:
      - unique:
          where: "loaded_at >= CURRENT_DATE() - INTERVAL '7 days'"

Pros: Fast (tests only last 7 days). Cons: Misses issues in older data.

Strategy 3: Test on dbt build --full-refresh

Approach: Run full table tests only on full refresh (weekly).

columns:
  - name: exchange_rate_id
    tests:
      - unique:
          config:
            enabled: "{{ target.name == 'prod' and not is_incremental() }}"

Pros: Fast incremental runs, thorough weekly validation. Cons: Complex configuration.

Recommended: Strategy 2 (test recent data) for daily runs + Strategy 3 (full test) for weekly full refresh.

CI/CD Integration

Fail builds on test failures to prevent bad data from reaching production.

GitHub Actions Workflow

# .github/workflows/dbt_ci.yml (in dbt-transform repo)
name: dbt CI

on:
  pull_request:
    branches:
      - main

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dbt
        run: uv add dbt-snowflake dbt-expectations

      - name: Add venv to PATH
        run: echo "$GITHUB_WORKSPACE/.venv/bin" >> $GITHUB_PATH

      - name: Install dbt packages
        run: dbt deps

      - name: Run dbt tests
        run: |
          dbt test --select state:modified+ \
            --defer --state .artifacts/ \
            --fail-fast
        env:
          DBT_SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          DBT_SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          DBT_SNOWFLAKE_PRIVATE_KEY: ${{ secrets.SNOWFLAKE_PRIVATE_KEY }}

Behavior: - Pull request triggers CI - dbt runs tests on modified models only (state:modified+) - If any test fails (severity: error), CI fails - Pull request cannot merge until tests pass

Test Selectors

Run specific subsets of tests:

# Run all tests
dbt test

# Run tests on one model
dbt test --select fct_exchange_rates

# Run tests on modified models + downstream
dbt test --select state:modified+

# Run only dbt_expectations tests
dbt test --select test_type:generic,package:dbt_expectations

# Run only error-severity tests
dbt test --select config.severity:error

Best Practices

1. Test Every Model

Rule: Every model should have at least one test (typically on the primary key).

# Minimum viable testing
models:
  - name: fct_exchange_rates
    columns:
      - name: exchange_rate_id
        tests:
          - unique
          - not_null

2. Test Critical Columns

Focus on: - Primary keys (unique, not_null) - Foreign keys (relationships) - Required business fields (not_null) - Enums (accepted_values)

3. Use Severity Appropriately

error: Breaks business logic, blocks dashboards warn: Nice-to-have, exploratory

4. Document Test Logic

Explain non-obvious tests:

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1000000
          config:
            meta:
              description: "Order totals capped at $1M to catch data entry errors"

5. Monitor Test Results Over Time

Use Elementary (next page) to track: - Test pass rate trends - Which tests fail most frequently - When tests started failing (correlate with code changes)

Example: Complete fct_exchange_rates Testing

# models/marts/core/fct_exchange_rates.yml
version: 2

models:
  - name: fct_exchange_rates
    description: Daily exchange rates for GBP to various currencies

    # Table-level tests
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 1000000
          config:
            severity: warn

    columns:
      - name: exchange_rate_id
        description: Surrogate key
        tests:
          - unique:
              config:
                severity: error
          - not_null:
              config:
                severity: error

      - name: rate_date
        description: Date of the exchange rate
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: "'2020-01-01'"
              max_value: "CURRENT_DATE() + INTERVAL '1 day'"
              config:
                severity: error

      - name: base_currency
        tests:
          - accepted_values:
              values: ['GBP']

      - name: target_currency
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_in_set:
              value_set: ['USD', 'EUR', 'JPY', 'AUD', 'CAD', 'CHF', 'NZD', 'SEK']
              config:
                severity: warn  # New currencies added occasionally

      - name: exchange_rate
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.01
              max_value: 1000
              config:
                severity: error
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 0.5
              max_value: 2.0
              group_by: [target_currency]
              config:
                severity: warn

Summary

You've mastered dbt data quality testing:

  • Generic tests — unique, not_null, accepted_values, relationships
  • dbt_expectations — Statistical validation, distribution checks, regex patterns
  • Custom singular tests — Complex business logic in SQL
  • Severity levels — error (blocks builds) vs warn (logs only)
  • Where clauses — Test subsets of rows
  • Incremental model testing — Test recent data or full table on refresh
  • CI/CD integration — Fail builds on test failures

dbt tests provide the foundation. Next, add automated anomaly detection with Elementary.

What's Next

Install Elementary to track test results over time and detect anomalies automatically.

Continue to Elementary Setup