Maintain Your Data Stack
You have built a complete, production-ready data stack spanning infrastructure, ingestion, transformation, analytics, and observability. This section covers how to keep it running, evolve it over time, and handle routine maintenance tasks - with or without AI agents.
What You'll Learn
This section focuses on day-to-day operations: adding resources, onboarding team members, and keeping your platform healthy. It covers:
- Adding new users, data sources, and models
- Backfill strategies and performance optimisation
- Disaster recovery and security hardening
- Troubleshooting common issues
Claude Code Setup
If you haven't already, set up CLAUDE.md files and skills for your repositories so Claude Code can assist with these maintenance tasks. See the Claude Code Setup page in the Getting Started section.
Your Three Repositories
By now you have three repositories that work together:
GitHub Organisation
├── terraform/              Infrastructure as code
│   ├── github/             GitHub organisation, teams, users
│   ├── aws/                S3, IAM, VPC, Secrets Manager
│   └── snowflake/          Warehouses, databases, roles, users
│
├── data-pipelines/         Ingestion and orchestration
│   ├── sources/            dlt source definitions
│   ├── pipelines/          dlt pipeline configurations
│   └── flows/              Prefect flow definitions
│
└── dbt-transform/          Data transformation
    └── models/
        ├── staging/        Clean raw data (views)
        ├── intermediate/   Business logic (ephemeral)
        ├── marts/          Analytics tables (tables/incremental)
        └── reporting/      BI-facing subset (views)
Each repository has its own conventions, module patterns, and safety rules. Maintaining them means understanding these patterns - or having an AI agent that already does.
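Before running any maintenance task, it can help to confirm that a local checkout still matches the layout above. The following is a minimal sketch, not part of the stack itself: it assumes the three repositories are cloned side by side under one parent directory, and the names `EXPECTED_LAYOUT` and `find_missing_dirs` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Expected layout, copied from the tree above. Cloning the three repositories
# side by side under one parent directory is an assumption of this sketch.
EXPECTED_LAYOUT = {
    "terraform": ["github", "aws", "snowflake"],
    "data-pipelines": ["sources", "pipelines", "flows"],
    "dbt-transform": [
        "models/staging",
        "models/intermediate",
        "models/marts",
        "models/reporting",
    ],
}


def find_missing_dirs(root, layout=EXPECTED_LAYOUT):
    """Return expected directories that do not exist under root, as sorted relative paths."""
    root = Path(root)
    missing = []
    for repo, subdirs in layout.items():
        for subdir in subdirs:
            # is_dir() is False for both absent paths and plain files.
            if not (root / repo / subdir).is_dir():
                missing.append(f"{repo}/{subdir}")
    return sorted(missing)
```

Calling `find_missing_dirs(".")` from the parent directory returns an empty list when every expected directory is present; any entries it does return point at a repository that was not cloned or a folder that has been renamed since the stack was built.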
Runbooks
Each page below is a runbook: a step-by-step operational procedure with verification steps, a rollback plan, and an escalation path.
| Runbook | When to Use |
|---|---|
| Adding Users | New team member or service account needs access |
| Adding Data Sources | New API, database, or SaaS tool to ingest |
| Backfills | Historical data needs reprocessing |
| Performance Optimisation | Slow queries, long dbt runs, or credit spikes |
| Disaster Recovery | Data loss, state corruption, or service outages |
| Security Hardening | Key rotation, access reviews, or audit |
| Upgrades | Snowflake, dbt, Prefect, or Terraform provider version updates |
| Troubleshooting | Something is broken and the cause is unclear |
Prerequisites
Before starting this section, ensure you have completed:
- Data Warehouse - Terraform modules for Snowflake resources
- Orchestration - Prefect flows and deployments
- Batch Data Ingestion - dlt pipelines
- At least one of: SaaS Ingestion, Data Transformation, or Streaming