
Orchestration with Prefect

In this section, you'll set up Prefect as the central orchestration layer for your data platform. Prefect coordinates all your data workflows, from ingestion pipelines to transformations to ML preprocessing.

What is Orchestration?

Orchestration is the coordination layer that ties your data stack together. It answers questions like:

  • When should pipelines run? (Schedules, triggers, events)
  • What happens when something fails? (Retries, alerts, recovery)
  • How do we track what happened? (Logging, lineage, observability)
  • Where do workloads execute? (Workers, containers, serverless)

Without orchestration, you end up with a collection of cron jobs, manual scripts, and tribal knowledge. With orchestration, you get a single control plane for your entire data platform.
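To make two of these questions concrete (what runs in which order, and what happens on failure), here is a deliberately tiny sketch that uses only the Python standard library. It is a toy for illustration only and is not how Prefect is implemented. The function names `run_with_retries` and `run_pipeline` are hypothetical, and scheduling, alerting, and distributed execution are left out.

```python
import logging
import time
from collections.abc import Callable
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("toy-orchestrator")


def run_with_retries(name: str, fn: Callable[[], None], retries: int = 2, delay_s: float = 0.0) -> None:
    """Run one step, retrying on any exception and logging every attempt (toy example)."""
    for attempt in range(1, retries + 2):  # first try + `retries` retries
        try:
            fn()
            log.info("%s succeeded on attempt %d", name, attempt)
            return
        except Exception as exc:
            log.warning("%s failed on attempt %d: %s", name, attempt, exc)
            if attempt > retries:
                raise  # out of retries: surface the failure (a real orchestrator would alert here)
            time.sleep(delay_s)


def run_pipeline(steps: dict[str, Callable[[], None]], deps: dict[str, set[str]]) -> list[str]:
    """Run steps in dependency order (deps maps step -> steps it depends on); return the order used."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        run_with_retries(name, steps[name])
    return order
```

An orchestrator such as Prefect provides these behaviours (plus schedules, a UI, and run history) so that you don't maintain hand-written loops like this one.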

Why Prefect?

Prefect is a modern workflow orchestration platform and an alternative to traditional tools like Airflow. Key advantages over Airflow:

| Feature | Prefect | Airflow |
|---|---|---|
| Setup | Minutes (Cloud) or a simple Docker setup | Complex (scheduler, webserver, workers, database) |
| DAG definition | Plain Python functions with decorators | Python DAG files built from operators (TaskFlow decorators since Airflow 2) |
| Dynamic workflows | Native support | Limited; requires workarounds |
| Local development | Run flows locally, deploy when ready | Requires an Airflow environment |
| Failure handling | Retries, caching, and timeouts set as decorator arguments | Retries and timeouts configured per operator |
| UI | Modern, real-time | Functional but dated |

Prefect's philosophy is "orchestration as code": your workflows are ordinary Python functions marked with decorators. This makes them straightforward to test, version, and debug.

Role in the Stack

Prefect sits at the centre of your data platform, orchestrating everything:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              PREFECT                                        │
│                       (Orchestration Layer)                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │     dlt     │  │   Airbyte   │  │     dbt     │  │  ML/Python  │         │
│  │  Pipelines  │  │    Syncs    │  │   Models    │  │   Scripts   │         │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘         │
│         │                │                │                │                │
│         ▼                ▼                ▼                ▼                │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                         Prefect Flows                           │        │
│  │  • Scheduling      • Retries       • Logging      • Alerts      │        │
│  │  • Dependencies    • Caching       • Lineage      • Triggers    │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
                    ┌───────────────────────────────┐
                    │          Snowflake            │
                    │       (Data Warehouse)        │
                    └───────────────────────────────┘

Prefect tasks wrap each part of your stack:

  • dlt pipelines - API and database ingestion
  • Airbyte syncs - SaaS source ingestion
  • dbt models - Data transformations
  • Python scripts - ML preprocessing, custom logic

Deployment Options

This section covers three deployment options to suit different needs:

| Option | Control Plane Cost | Worker Cost | Best For |
|---|---|---|---|
| Prefect Cloud | Free to $100+/mo | + ~$15-50/mo (EC2/ECS) | Most teams (recommended starting point) |
| Docker Compose | ~$17/mo (includes worker) | Included | Budget-conscious teams, data sovereignty |
| ECS + RDS | ~$67/mo (includes worker) | Included | Production high-availability (HA) requirements |

Worker Infrastructure

With all options, workers run on your infrastructure. Prefect Cloud manages the control plane (API, UI, scheduling) but you provide compute for flow execution. The self-hosted options include a worker in the base cost estimate.
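Because Prefect Cloud's price covers only the control plane, comparing options means adding worker compute back in. The sketch below does that arithmetic using the approximate monthly figures quoted on this page (including the $100/mo Starter tier from the quick-reference table). These are the page's estimates, not live pricing, and the option labels and `monthly_range` helper are hypothetical.

```python
# Approximate monthly costs in USD, as (low, high), taken from this page's tables.
# "base" is what the option itself costs; for the self-hosted options it already
# includes one worker. "extra_workers" is worker compute you pay for separately.
OPTIONS = {
    "Prefect Cloud (Free)": {"base": (0, 0), "extra_workers": (15, 50)},
    "Prefect Cloud (Starter)": {"base": (100, 100), "extra_workers": (15, 50)},
    "Docker Compose": {"base": (17, 17), "extra_workers": (0, 0)},
    "ECS + RDS": {"base": (67, 67), "extra_workers": (0, 0)},
}


def monthly_range(option: str) -> tuple[int, int]:
    """Return the (low, high) all-in monthly cost: base cost plus separate worker compute."""
    base_low, base_high = OPTIONS[option]["base"]
    worker_low, worker_high = OPTIONS[option]["extra_workers"]
    return base_low + worker_low, base_high + worker_high


for name in OPTIONS:
    low, high = monthly_range(name)
    print(f"{name}: ${low}-{high}/mo" if low != high else f"{name}: ~${low}/mo")
```

On these estimates, the Prefect Cloud free tier comes to about $15-50/mo once workers are counted, which overlaps with the ~$17/mo Docker Compose option, while the Starter tier ($115-150/mo all-in) costs more than either self-hosted option.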

All three options use the same Prefect concepts: flows, tasks, deployments, and work pools. The only difference is where the control plane runs.

Start with Prefect Cloud

We recommend starting with Prefect Cloud's free tier. It gets you running quickly because there is no control plane to host or maintain (you still run the workers). You can migrate to self-hosted later if needed.

What You'll Build

By the end of this section, you'll have:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         ORCHESTRATION SETUP                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Control Plane (choose one)                                                 │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │  Prefect Cloud    OR    Docker Compose    OR    ECS + RDS       │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
│  Work Pools & Workers                                                       │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐              │
│  │  Process Pool   │  │   Docker Pool   │  │    ECS Pool     │              │
│  │  (Local/EC2)    │  │  (Containers)   │  │  (Serverless)   │              │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘              │
│                                                                             │
│  Infrastructure as Code                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │  Terraform: Work pools, service accounts, EC2/ECS resources     │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
│  Secrets Integration                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │  AWS Secrets Manager: Snowflake, API keys, credentials          │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Prerequisites

Before starting this section, ensure you have completed:

If using Prefect Cloud:

Section Overview

Core path (recommended):

| Page | What You'll Learn |
|---|---|
| 1. Prefect Concepts | Core concepts: flows, tasks, deployments, work pools |
| 2. Choosing a Deployment | Compare options; understand costs and trade-offs |
| 3. Prefect Cloud Setup | Set up Prefect Cloud with Terraform (recommended) |
| 6. Work Pools & Workers | Reference for work pool types and configuration |
| 7. Your First Flow | Create a repository and deploy your first flow |
| 8. Secrets & Blocks | Integrate with AWS Secrets Manager and S3 |
| 11. Finishing Up | Summary and next steps |

Self-hosted options (advanced; requires a VPC):

| Page | What You'll Learn |
|---|---|
| 4. Docker Compose Setup | Simple self-hosted setup on EC2 (~$17/mo) |
| 5. ECS Production Setup | Production-grade self-hosted setup on AWS (~$67/mo) |

Self-Hosted Requirements

The self-hosted options require an existing AWS VPC and networking infrastructure, which this guide does not cover. Most users should use Prefect Cloud.

Cost Considerations

Orchestration costs vary significantly by deployment option. See Choosing a Deployment for detailed cost breakdowns, or the Cost Overview for a summary.

Quick reference:

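| Option | Infrastructure (per month) | Operational Burden |
|---|---|---|
| Prefect Cloud Free | $0 control plane + ~$15-50 for workers | None for the control plane (you run the workers) |
| Prefect Cloud Starter | $100 control plane + ~$15-50 for workers | None for the control plane (you run the workers) |
| Docker Compose | ~$17 (includes worker) | Medium (2-4 hrs/mo) |
| ECS + RDS | ~$67 (includes worker) | Low-Medium |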

What's Next

Start by understanding the core Prefect concepts: flows, tasks, deployments, and work pools. This foundation applies to all deployment options.

Continue to Prefect Concepts