Self-Hosted: ECS Production

Advanced - Requires VPC and Networking

This page covers production self-hosted Prefect on ECS, which requires an AWS VPC with public and private subnets. See VPC Networking to set this up first.

Most users should use Prefect Cloud - it's simpler, has no infrastructure to manage, and provides the same features.

Self-hosting on ECS is recommended only if you have strict data sovereignty requirements.

On this page, you will:

  • Deploy Prefect server on ECS Fargate
  • Configure RDS PostgreSQL for state storage
  • Set up Application Load Balancer for access
  • Configure auto-scaling and monitoring

Overview

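This is the production-grade self-hosted option: Prefect server runs on ECS Fargate with RDS PostgreSQL for persistence. Compared to Docker Compose, ECS replaces failed tasks automatically, RDS handles backups, and there are no hosts to patch. The diagram shows the target architecture; the configuration on this page starts with a single server task and a single-AZ database, so availability across zones requires more than one task and RDS Multi-AZ.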

┌─────────────────────────────────────────────────────────────────────────────┐
│                              AWS INFRASTRUCTURE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐        │
│  │                    Application Load Balancer                    │        │
│  │                         (HTTPS :443)                            │        │
│  └────────────────────────────┬────────────────────────────────────┘        │
│                               │                                             │
│  ┌────────────────────────────▼────────────────────────────────────┐        │
│  │                      ECS Cluster (Fargate)                      │        │
│  │  ┌─────────────────────────────────────────────────────────┐    │        │
│  │  │              Prefect Server Service                     │    │        │
│  │  │  ┌─────────────┐  ┌─────────────┐                       │    │        │
│  │  │  │   Task 1    │  │   Task 2    │  (Multi-AZ)           │    │        │
│  │  │  └─────────────┘  └─────────────┘                       │    │        │
│  │  └─────────────────────────────────────────────────────────┘    │        │
│  │                                                                 │        │
│  │  ┌─────────────────────────────────────────────────────────┐    │        │
│  │  │              Prefect Worker Service                     │    │        │
│  │  │  ┌─────────────┐  ┌─────────────┐  (Auto-scaling)       │    │        │
│  │  │  │   Task 1    │  │   Task N    │                       │    │        │
│  │  │  └─────────────┘  └─────────────┘                       │    │        │
│  │  └─────────────────────────────────────────────────────────┘    │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                               │                                             │
│  ┌────────────────────────────▼────────────────────────────────────┐        │
│  │                      RDS PostgreSQL                             │        │
│  │                    (Multi-AZ optional)                          │        │
│  └─────────────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Estimated cost: ~$67/month (Fargate + RDS + ALB)

Prerequisites

  • AWS VPC with multi-AZ networking - See VPC Networking to set this up
  • Terraform configured with remote state
  • Domain name (optional, for HTTPS)
  • ACM certificate (optional, for HTTPS)

Don't Have VPC Infrastructure?

Either follow the VPC Networking guide first (~$35/month), or use Prefect Cloud instead.

Terraform Module

Prefect provides an official Terraform module for ECS deployment. This page does not call that module; it defines each AWS resource directly so that every piece is visible and easy to customise.

Project Structure

terraform/
└── prefect-ecs/
    ├── config/
    │   ├── backend.tf
    │   ├── main.tf
    │   ├── providers.tf
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   ├── rds.tf
    │   ├── ecs.tf
    │   ├── alb.tf
    │   └── outputs.tf
    └── modules/

Backend and Provider

Create terraform/prefect-ecs/config/backend.tf:

terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "prefect-ecs/terraform.tfstate"
    region         = "eu-west-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

Create terraform/prefect-ecs/config/main.tf:

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

Create terraform/prefect-ecs/config/providers.tf:

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = "data-platform"
      ManagedBy   = "terraform"
      Component   = "prefect-server"
      Environment = var.environment
    }
  }
}

Variables

Create terraform/prefect-ecs/config/variables.tf:

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "eu-west-2"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "vpc_id" {
  description = "VPC ID"
  type        = string
}

variable "private_subnet_ids" {
  description = "Private subnet IDs for ECS tasks and RDS"
  type        = list(string)
}

variable "public_subnet_ids" {
  description = "Public subnet IDs for ALB"
  type        = list(string)
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access Prefect UI"
  type        = list(string)
  default     = ["10.0.0.0/8"]
}

variable "prefect_image" {
  description = "Prefect Docker image (pin an exact version tag in production)"
  type        = string
  default     = "prefecthq/prefect:3-latest"
}

variable "server_cpu" {
  description = "CPU units for Prefect server (256, 512, 1024, etc.)"
  type        = number
  default     = 512
}

variable "server_memory" {
  description = "Memory (MB) for Prefect server"
  type        = number
  default     = 1024
}

variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.micro"
}

variable "db_allocated_storage" {
  description = "RDS allocated storage in GB"
  type        = number
  default     = 20
}
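
The project structure lists terraform.tfvars, but its contents are not shown. A minimal sketch follows, assuming the VPC and subnets come from the VPC Networking guide. Every ID and the CIDR range are placeholders, not real resources; substitute the values from your own VPC.

```hcl
# terraform/prefect-ecs/config/terraform.tfvars
# All IDs below are placeholders (assumptions), not real resources.
aws_region  = "eu-west-2"
environment = "production"

vpc_id = "vpc-0123456789abcdef0"

# Use subnets in at least two Availability Zones:
# an RDS subnet group and an ALB both require it.
private_subnet_ids = ["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"]
public_subnet_ids  = ["subnet-0cccccccccccccccc", "subnet-0dddddddddddddddd"]

# Narrow this to the networks that actually need the Prefect UI and API.
allowed_cidr_blocks = ["10.0.0.0/16"]
```

Variables with defaults (image, CPU, memory, database size) can be left out of this file until you need to override them.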

RDS PostgreSQL

Create terraform/prefect-ecs/config/rds.tf:

# -----------------------------------------------------------------------------
# Random Password for RDS
# -----------------------------------------------------------------------------
resource "random_password" "db_password" {
  length  = 32
  special = false
}

# -----------------------------------------------------------------------------
# Security Group for RDS
# -----------------------------------------------------------------------------
resource "aws_security_group" "rds" {
  name        = "prefect-rds-${var.environment}"
  description = "Security group for Prefect RDS"
  vpc_id      = var.vpc_id

  ingress {
    description     = "PostgreSQL from ECS"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs_tasks.id]
  }

  tags = {
    Name = "prefect-rds-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# RDS Subnet Group
# -----------------------------------------------------------------------------
resource "aws_db_subnet_group" "prefect" {
  name       = "prefect-${var.environment}"
  subnet_ids = var.private_subnet_ids

  tags = {
    Name = "prefect-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# RDS Instance
# -----------------------------------------------------------------------------
resource "aws_db_instance" "prefect" {
  identifier = "prefect-${var.environment}"

  engine         = "postgres"
  engine_version = "15"
  instance_class = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_allocated_storage * 2
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = "prefect"
  username = "prefect"
  password = random_password.db_password.result

  db_subnet_group_name   = aws_db_subnet_group.prefect.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  skip_final_snapshot       = false
  final_snapshot_identifier = "prefect-${var.environment}-final"

  tags = {
    Name = "prefect-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# Store Password in Secrets Manager
# -----------------------------------------------------------------------------
resource "aws_secretsmanager_secret" "db_password" {
  name        = "prefect/rds-password"
  description = "Prefect RDS PostgreSQL password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}
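
The password is stored in Secrets Manager here, but the task definition in the next section still passes it to the container as a plain environment variable, which makes it readable in the ECS console. One possible alternative, sketched below, stores the whole connection URL as a secret and lets ECS inject it at container start. The secret name and the policy name are assumptions chosen for illustration. The password remains in the Terraform state either way.

```hcl
# Sketch only: keep the full connection URL in Secrets Manager so that it is
# injected at container start instead of written into the task definition.
resource "aws_secretsmanager_secret" "db_connection_url" {
  name        = "prefect/db-connection-url" # hypothetical name
  description = "Prefect database connection URL"
}

resource "aws_secretsmanager_secret_version" "db_connection_url" {
  secret_id     = aws_secretsmanager_secret.db_connection_url.id
  secret_string = "postgresql+asyncpg://prefect:${random_password.db_password.result}@${aws_db_instance.prefect.endpoint}/prefect"
}

# The execution role (not the task role) fetches secrets when a task starts.
resource "aws_iam_role_policy" "read_db_connection_url" {
  name = "prefect-read-db-connection-url" # hypothetical name
  role = aws_iam_role.ecs_task_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.db_connection_url.arn
    }]
  })
}

# In the prefect-server container definition, remove the
# PREFECT_API_DATABASE_CONNECTION_URL entry from `environment` and add:
#
#   secrets = [
#     {
#       name      = "PREFECT_API_DATABASE_CONNECTION_URL"
#       valueFrom = aws_secretsmanager_secret.db_connection_url.arn
#     }
#   ]
```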

ECS Cluster and Services

Create terraform/prefect-ecs/config/ecs.tf:

# -----------------------------------------------------------------------------
# ECS Cluster
# -----------------------------------------------------------------------------
resource "aws_ecs_cluster" "prefect" {
  name = "prefect-${var.environment}"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# -----------------------------------------------------------------------------
# Security Group for ECS Tasks
# -----------------------------------------------------------------------------
resource "aws_security_group" "ecs_tasks" {
  name        = "prefect-ecs-tasks-${var.environment}"
  description = "Security group for Prefect ECS tasks"
  vpc_id      = var.vpc_id

  ingress {
    description     = "HTTP from ALB"
    from_port       = 4200
    to_port         = 4200
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    description = "All outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "prefect-ecs-tasks-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# IAM Roles for ECS
# -----------------------------------------------------------------------------
resource "aws_iam_role" "ecs_task_execution" {
  name = "prefect-ecs-execution-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role" "ecs_task" {
  name = "prefect-ecs-task-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

# -----------------------------------------------------------------------------
# CloudWatch Log Group
# -----------------------------------------------------------------------------
resource "aws_cloudwatch_log_group" "prefect" {
  name              = "/ecs/prefect-${var.environment}"
  retention_in_days = 30
}

# -----------------------------------------------------------------------------
# Task Definition - Prefect Server
# -----------------------------------------------------------------------------
resource "aws_ecs_task_definition" "prefect_server" {
  family                   = "prefect-server-${var.environment}"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.server_cpu
  memory                   = var.server_memory
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name  = "prefect-server"
      image = var.prefect_image

      command = ["prefect", "server", "start", "--host", "0.0.0.0"]

      environment = [
        {
          name  = "PREFECT_API_DATABASE_CONNECTION_URL"
          value = "postgresql+asyncpg://prefect:${random_password.db_password.result}@${aws_db_instance.prefect.endpoint}/prefect"
        },
        {
          name  = "PREFECT_SERVER_API_HOST"
          value = "0.0.0.0"
        },
        {
          name  = "PREFECT_SERVER_API_PORT"
          value = "4200"
        }
      ]

      portMappings = [
        {
          containerPort = 4200
          protocol      = "tcp"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.prefect.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "server"
        }
      }

      healthCheck = {
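        # python is always present in the Prefect image; curl may not be installed
        command     = ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:4200/api/health', timeout=3)\" || exit 1"]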
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])
}

# -----------------------------------------------------------------------------
# ECS Service - Prefect Server
# -----------------------------------------------------------------------------
resource "aws_ecs_service" "prefect_server" {
  name            = "prefect-server"
  cluster         = aws_ecs_cluster.prefect.id
  task_definition = aws_ecs_task_definition.prefect_server.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.prefect.arn
    container_name   = "prefect-server"
    container_port   = 4200
  }

  depends_on = [aws_lb_listener.prefect]
}
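
The service above runs a fixed single task. A minimal sketch of target-tracking auto-scaling for it follows, using Application Auto Scaling. The capacity limits, the 70% CPU target, and the cooldowns are assumptions, not tuned values. This page does not cover whether Prefect server needs extra configuration to run more than one replica, so check the Prefect documentation before raising max_capacity.

```hcl
# Sketch only: register the service's desired count as a scalable target.
resource "aws_appautoscaling_target" "prefect_server" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.prefect.name}/${aws_ecs_service.prefect_server.name}"
  min_capacity       = 1
  max_capacity       = 2 # assumed ceiling
}

# Keep average CPU near the target by adding or removing tasks.
resource "aws_appautoscaling_policy" "prefect_server_cpu" {
  name               = "prefect-server-cpu-${var.environment}"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.prefect_server.service_namespace
  scalable_dimension = aws_appautoscaling_target.prefect_server.scalable_dimension
  resource_id        = aws_appautoscaling_target.prefect_server.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value       = 70  # assumed CPU target, in percent
    scale_in_cooldown  = 300 # seconds
    scale_out_cooldown = 60  # seconds
  }
}
```

If you adopt this, consider adding `lifecycle { ignore_changes = [desired_count] }` to aws_ecs_service.prefect_server so that later Terraform runs do not reset the task count chosen by the scaler.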

Application Load Balancer

Create terraform/prefect-ecs/config/alb.tf:

# -----------------------------------------------------------------------------
# Security Group for ALB
# -----------------------------------------------------------------------------
resource "aws_security_group" "alb" {
  name        = "prefect-alb-${var.environment}"
  description = "Security group for Prefect ALB"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = var.allowed_cidr_blocks
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.allowed_cidr_blocks
  }

  egress {
    description = "All outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "prefect-alb-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# Application Load Balancer
# -----------------------------------------------------------------------------
resource "aws_lb" "prefect" {
  name               = "prefect-${var.environment}"
  internal           = true  # Change to false for public access
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids

  tags = {
    Name = "prefect-${var.environment}"
  }
}

# -----------------------------------------------------------------------------
# Target Group
# -----------------------------------------------------------------------------
resource "aws_lb_target_group" "prefect" {
  name        = "prefect-${var.environment}"
  port        = 4200
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/api/health"
    port                = "traffic-port"
    timeout             = 5
    unhealthy_threshold = 3
  }
}

# -----------------------------------------------------------------------------
# Listener (HTTP - redirect to HTTPS in production)
# -----------------------------------------------------------------------------
resource "aws_lb_listener" "prefect" {
  load_balancer_arn = aws_lb.prefect.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.prefect.arn
  }
}

# For HTTPS, declare an acm_certificate_arn variable (not defined in variables.tf) and add:
# resource "aws_lb_listener" "prefect_https" {
#   load_balancer_arn = aws_lb.prefect.arn
#   port              = 443
#   protocol          = "HTTPS"
#   ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
#   certificate_arn   = var.acm_certificate_arn
#
#   default_action {
#     type             = "forward"
#     target_group_arn = aws_lb_target_group.prefect.arn
#   }
# }
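
The listener comment mentions redirecting HTTP to HTTPS in production, and the commented HTTPS listener refers to a certificate variable that variables.tf does not declare. The sketch below shows one way to fill both gaps, assuming you already have an ACM certificate for your domain. The redirect block replaces the forward action of the existing HTTP listener rather than adding a second resource.

```hcl
# Sketch only: declare the certificate variable used by the HTTPS listener.
variable "acm_certificate_arn" {
  description = "ARN of the ACM certificate for the HTTPS listener"
  type        = string
}

# Replacement for the HTTP listener once the HTTPS listener is enabled:
# port 80 no longer forwards to the target group, it redirects to 443.
resource "aws_lb_listener" "prefect" {
  load_balancer_arn = aws_lb.prefect.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```

With this in place, the ECS service should depend on the HTTPS listener as well, and any URLs you hand to clients should use https with the domain name on the certificate.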

Outputs

Create terraform/prefect-ecs/config/outputs.tf:

output "ecs_cluster_name" {
  description = "ECS cluster name"
  value       = aws_ecs_cluster.prefect.name
}

output "alb_dns_name" {
  description = "ALB DNS name"
  value       = aws_lb.prefect.dns_name
}

output "prefect_ui_url" {
  description = "Prefect UI URL"
  value       = "http://${aws_lb.prefect.dns_name}"
}

output "prefect_api_url" {
  description = "Prefect API URL for workers and CLI"
  value       = "http://${aws_lb.prefect.dns_name}/api"
}

output "rds_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.prefect.endpoint
}

Deploy the Infrastructure

cd terraform/prefect-ecs/config

terraform init
terraform plan
terraform apply

Wait for Terraform to provision all resources.

Connect to Prefect

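Configure your CLI to use the self-hosted server. The ALB is internal by default, so run these commands from a machine that can reach the VPC (for example over a VPN or from a bastion host):

# Read the API URL from the Terraform outputs (run from terraform/prefect-ecs/config)
export PREFECT_API_URL="$(terraform output -raw prefect_api_url)"

# Verify the connection: the second command only succeeds if the API is reachable
prefect version
prefect work-pool ls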

Create Work Pools

# Create a process work pool for simple tasks
prefect work-pool create default --type process

# Create an ECS work pool for containerised flows
prefect work-pool create production --type ecs

Monitoring

CloudWatch Metrics

ECS automatically publishes metrics to CloudWatch:

  • CPU/Memory utilisation
  • Running task count
  • Service health

CloudWatch Logs

View logs in the AWS Console or via the CLI (data-engineer is an example profile name; substitute your own):

aws logs tail --profile data-engineer /ecs/prefect-production --follow

Alarms (Optional)

Add CloudWatch alarms for critical metrics:

resource "aws_cloudwatch_metric_alarm" "prefect_cpu" {
  alarm_name          = "prefect-high-cpu-${var.environment}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 80

  dimensions = {
    ClusterName = aws_ecs_cluster.prefect.name
    ServiceName = aws_ecs_service.prefect_server.name
  }

  # Requires an sns_topic_arn variable (not defined in variables.tf) that points to an existing SNS topic
  alarm_actions = [var.sns_topic_arn]
}
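
The alarm above refers to var.sns_topic_arn, which variables.tf does not declare. A minimal sketch of that declaration follows, together with a second alarm on load balancer target health. The alarm name, period, and threshold are assumptions for illustration.

```hcl
# Sketch only: variable referenced by alarm_actions.
variable "sns_topic_arn" {
  description = "ARN of the SNS topic that receives alarm notifications"
  type        = string
}

# Fires when the ALB reports any unhealthy Prefect server target
# for two consecutive one-minute periods.
resource "aws_cloudwatch_metric_alarm" "prefect_unhealthy_hosts" {
  alarm_name          = "prefect-unhealthy-hosts-${var.environment}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "UnHealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0

  dimensions = {
    LoadBalancer = aws_lb.prefect.arn_suffix
    TargetGroup  = aws_lb_target_group.prefect.arn_suffix
  }

  alarm_actions = [var.sns_topic_arn]
}
```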

Summary

You've deployed a production-grade Prefect server on AWS:

  • ECS Fargate for managed container orchestration
  • RDS PostgreSQL with automated backups
  • Application Load Balancer for access
  • CloudWatch for logging and monitoring

What's Next

With the server running, configure work pools and workers for your specific workloads.

Continue to Work Pools and Workers