Maintain Your Data Stack

You have built a complete, production-ready data stack spanning infrastructure, ingestion, transformation, analytics, and observability. This section covers how to keep it running, evolve it over time, and handle routine maintenance tasks - with or without AI agents.

What You'll Learn

This section focuses on day-to-day operations - adding new resources, onboarding team members, and keeping your platform healthy:

Adding new users, data sources, and models
Backfill strategies and performance optimisation
Disaster recovery and security hardening
Version upgrades and troubleshooting common issues

Claude Code Setup

If you haven't already, set up CLAUDE.md files and skills for your repositories so Claude Code can assist with these maintenance tasks. See the Claude Code Setup page in the Getting Started section.

Your Three Repositories

By now you have three repositories that work together:

GitHub Organisation
├── terraform/           Infrastructure as code
│   ├── github/          GitHub organisation, teams, users
│   ├── aws/             S3, IAM, VPC, Secrets Manager
│   └── snowflake/       Warehouses, databases, roles, users
│
├── data-pipelines/      Ingestion and orchestration
│   ├── sources/         dlt source definitions
│   ├── pipelines/       dlt pipeline configurations
│   └── flows/           Prefect flow definitions
│
└── dbt-transform/       Data transformation
    └── models/
        ├── staging/     Clean raw data (views)
        ├── intermediate/ Business logic (ephemeral)
        ├── marts/       Analytics tables (tables/incremental)
        └── reporting/   BI-facing subset (views)

Each repository has its own conventions, module patterns, and safety rules. Maintaining them means understanding these patterns - or having an AI agent that already does.
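One lightweight way to keep these conventions honest is to check that the directory layout still matches the tree above. The following is a minimal sketch, assuming all three repositories are cloned side by side under one local folder. The directory names come from the tree; the `missing_directories` helper and the idea of running such a check are illustrative assumptions, not part of the stack itself.

```python
from pathlib import Path

# Expected layout, copied from the repository tree above.
# Assumption: the three repositories are cloned into one parent folder.
EXPECTED_LAYOUT = {
    "terraform": ["github", "aws", "snowflake"],
    "data-pipelines": ["sources", "pipelines", "flows"],
    "dbt-transform": [
        "models/staging",
        "models/intermediate",
        "models/marts",
        "models/reporting",
    ],
}


def missing_directories(root: Path) -> list[str]:
    """Return the expected directories that do not exist under root.

    Hypothetical helper: results are listed in the same order as
    EXPECTED_LAYOUT, as "repo/subdir" strings.
    """
    missing = []
    for repo, subdirs in EXPECTED_LAYOUT.items():
        for subdir in subdirs:
            if not (root / repo / subdir).is_dir():
                missing.append(f"{repo}/{subdir}")
    return missing
```

Calling `missing_directories(Path.cwd())` from the parent folder returns an empty list when every expected directory is present, which makes it easy to wire into a pre-commit hook or a CI step if that suits your workflow.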

Runbooks

Each page below is a runbook: a step-by-step operational procedure with verification, rollback, and escalation paths.

Runbook	When to Use
Adding Users	New team member or service account needs access
Adding Data Sources	New API, database, or SaaS tool to ingest
Backfills	Historical data needs reprocessing
Performance Optimisation	Slow queries, long dbt runs, or credit spikes
Disaster Recovery	Data loss, state corruption, or service outages
Security Hardening	Key rotation, access reviews, or audit
Upgrades	Snowflake, dbt, Prefect, or provider version updates
Troubleshooting	Something is broken and the cause is unclear
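
The runbook shape described above (steps, verification, rollback, escalation) can be pictured as a small data structure. This is only a sketch to make the structure concrete: the `Runbook` class, its field names, and the `execute` function are hypothetical and do not correspond to any tool in the stack.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Runbook:
    """Hypothetical model of one runbook; all field names are illustrative."""

    name: str
    steps: list[Callable[[], None]]  # ordered operational steps
    verify: Callable[[], bool]       # returns True when the change worked
    rollback: Callable[[], None]     # undoes the steps
    escalation: str                  # who to contact if rollback is not enough


def execute(runbook: Runbook) -> str:
    """Run the steps in order, verify the result, and roll back on failure."""
    error = None
    try:
        for step in runbook.steps:
            step()
        verified = runbook.verify()
    except Exception as exc:  # a failed step counts as a failed runbook
        verified = False
        error = exc
    if verified:
        return "ok"
    runbook.rollback()
    reason = f"step failed: {error}" if error else "verification failed"
    return f"rolled back ({reason}); escalate to {runbook.escalation}"
```

The point of the sketch is the ordering: verification always follows the steps, and rollback runs whenever a step raises or verification fails, so a runbook never ends in an unverified state without naming an escalation path.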

Prerequisites

Before starting this section, ensure you have completed:

Data Warehouse - Terraform modules for Snowflake resources
Orchestration - Prefect flows and deployments
Batch Data Ingestion - dlt pipelines
At least one of: SaaS Ingestion, Data Transformation, or Streaming