Self-Hosted Setup
On this page, you will:
- Deploy Airbyte on AWS ECS Fargate with Terraform
- Configure RDS PostgreSQL for Airbyte metadata
- Set up an ALB for secure access
- Run Airbyte locally with Docker Compose for development
Optional Page
This page covers self-hosted Airbyte deployment. If you're using Airbyte Cloud, skip to Snowflake Infrastructure.
Overview
Self-hosted Airbyte runs on your own infrastructure, giving you full control over data residency, costs, and customisation.
┌─────────────────────────────────────────────────────────────────────────────┐
│                       SELF-HOSTED ARCHITECTURE (AWS)                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│        ┌───────────────────────────────────────────────────────────┐        │
│        │  VPC                                                      │        │
│        │                                                           │        │
│        │  ┌─────────────┐     ┌─────────────────────────────────┐  │        │
│        │  │     ALB     │     │           ECS Cluster           │  │        │
│        │  │   (HTTPS)   │────▶│                                 │  │        │
│        │  └─────────────┘     │  ┌───────────┐   ┌───────────┐  │  │        │
│        │                      │  │  Server   │   │  Worker   │  │  │        │
│        │                      │  │ (Fargate) │   │ (Fargate) │  │  │        │
│        │                      │  └─────┬─────┘   └───────────┘  │  │        │
│        │                      └────────┼────────────────────────┘  │        │
│        │                               │                           │        │
│        │                      ┌────────▼────────┐                  │        │
│        │                      │ RDS PostgreSQL  │                  │        │
│        │                      │   (metadata)    │                  │        │
│        │                      └─────────────────┘                  │        │
│        └───────────────────────────────────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Local Development with Docker Compose
Before deploying to AWS, test Airbyte locally with Docker Compose.
Prerequisites
- Docker Desktop installed and running
- At least 4 GB RAM allocated to Docker
Run Airbyte Locally
# Clone Airbyte repository
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
# Run Airbyte
./run-ab-platform.sh
Airbyte will be available at http://localhost:8000.
Default credentials:
- Username: airbyte
- Password: password
Resource Usage
Airbyte runs several containers and uses significant CPU and memory. Stop it when not in use:
docker compose down
Test the UI
- Open http://localhost:8000
- Log in with default credentials
- Explore the Sources and Destinations pages
- Verify you can see the HubSpot source connector
Testing only
This local instance is for testing only. Do not use it for production workloads.
ECS Deployment with Terraform
Project Structure
Add Airbyte infrastructure to your Terraform repository:
terraform/
├── aws/
│ ├── airbyte/
│ │ ├── main.tf # ECS cluster, services, task definitions
│ │ ├── variables.tf # Input variables
│ │ ├── outputs.tf # Output values
│ │ ├── rds.tf # RDS PostgreSQL for metadata
│ │ ├── alb.tf # Application Load Balancer
│ │ ├── iam.tf # IAM roles for ECS tasks
│ │ ├── security_groups.tf # Security group rules
│ │ └── secrets.tf # Secrets Manager references
│ └── ...
└── ...
ECS Cluster
# terraform/aws/airbyte/main.tf
resource "aws_ecs_cluster" "airbyte" {
  name = "airbyte"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Project     = "modern-data-stack"
    Component   = "airbyte"
    Environment = "production"
  }
}
RDS PostgreSQL
Airbyte requires a PostgreSQL database for storing configuration, connection state, and job history.
# terraform/aws/airbyte/rds.tf
resource "aws_db_instance" "airbyte_metadata" {
  identifier            = "airbyte-metadata"
  engine                = "postgres"
  engine_version        = "16.4"
  instance_class        = "db.t4g.micro"
  allocated_storage     = 20
  max_allocated_storage = 50
  storage_type          = "gp3"

  db_name  = "airbyte"
  username = "airbyte"
  password = var.rds_password

  vpc_security_group_ids = [aws_security_group.airbyte_rds.id]
  db_subnet_group_name   = aws_db_subnet_group.airbyte.name

  backup_retention_period   = 7
  skip_final_snapshot       = false
  final_snapshot_identifier = "airbyte-metadata-final"

  tags = {
    Project   = "modern-data-stack"
    Component = "airbyte"
  }
}
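The instance above references `aws_db_subnet_group.airbyte`, which is not defined on this page. A minimal sketch of that resource, assuming it reuses the `private_subnet_ids` variable declared under Variables and that those subnets span at least two Availability Zones (RDS requires this of a subnet group). The group name is an assumption:

```hcl
# terraform/aws/airbyte/rds.tf (sketch; the name below is an assumption)
resource "aws_db_subnet_group" "airbyte" {
  name       = "airbyte-metadata"
  subnet_ids = var.private_subnet_ids

  tags = {
    Project   = "modern-data-stack"
    Component = "airbyte"
  }
}
```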
ECS Task Definition
The Airbyte server runs as a Fargate task:
# terraform/aws/airbyte/main.tf
resource "aws_ecs_task_definition" "airbyte_server" {
  family                   = "airbyte-server"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024 # 1 vCPU
  memory                   = 2048 # 2 GB
  execution_role_arn       = aws_iam_role.airbyte_execution.arn
  task_role_arn            = aws_iam_role.airbyte_task.arn

  container_definitions = jsonencode([
    {
      name      = "airbyte-server"
      image     = "airbyte/server:latest" # Pin a specific version tag in production (see Upgrading Airbyte)
      essential = true

      portMappings = [
        {
          containerPort = 8001
          protocol      = "tcp"
        }
      ]

      # Non-sensitive settings are passed as plain environment variables
      environment = [
        {
          name  = "DATABASE_URL"
          value = "jdbc:postgresql://${aws_db_instance.airbyte_metadata.endpoint}/airbyte"
        },
        {
          name  = "DATABASE_USER"
          value = "airbyte"
        },
      ]

      # Sensitive values are injected from Secrets Manager at container start
      secrets = [
        {
          name      = "DATABASE_PASSWORD"
          valueFrom = aws_secretsmanager_secret.airbyte_rds_password.arn
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.airbyte.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "server"
        }
      }
    }
  ])
}
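The task definition references `aws_cloudwatch_log_group.airbyte` and `aws_secretsmanager_secret.airbyte_rds_password`, neither of which is shown on this page. The following is a sketch of what they might look like, not the definitive setup. The log group name `/ecs/airbyte` matches the `aws logs tail` command under Monitoring. The 30-day retention and the secret name `airbyte/rds-password` are assumptions; the name was chosen only so that it falls under the `airbyte/*` pattern used in the IAM policy.

```hcl
# terraform/aws/airbyte/main.tf (sketch)
resource "aws_cloudwatch_log_group" "airbyte" {
  name              = "/ecs/airbyte"
  retention_in_days = 30 # assumed retention period
}

# terraform/aws/airbyte/secrets.tf (sketch)
resource "aws_secretsmanager_secret" "airbyte_rds_password" {
  name = "airbyte/rds-password" # hypothetical name under the airbyte/ prefix
}

resource "aws_secretsmanager_secret_version" "airbyte_rds_password" {
  secret_id     = aws_secretsmanager_secret.airbyte_rds_password.id
  secret_string = var.rds_password
}
```

With this approach the password is still written to Terraform state, so the state backend should be encrypted and access-controlled.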
ECS Service
resource "aws_ecs_service" "airbyte_server" {
  name            = "airbyte-server"
  cluster         = aws_ecs_cluster.airbyte.id
  task_definition = aws_ecs_task_definition.airbyte_server.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.airbyte_server.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.airbyte.arn
    container_name   = "airbyte-server"
    container_port   = 8001
  }
}
Application Load Balancer
# terraform/aws/airbyte/alb.tf
resource "aws_lb" "airbyte" {
  name               = "airbyte-alb"
  internal           = true # Internal only, not public-facing
  load_balancer_type = "application"
  security_groups    = [aws_security_group.airbyte_alb.id]
  subnets            = var.private_subnet_ids

  tags = {
    Project   = "modern-data-stack"
    Component = "airbyte"
  }
}

resource "aws_lb_listener" "airbyte_https" {
  load_balancer_arn = aws_lb.airbyte.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.airbyte.arn
  }
}

resource "aws_lb_target_group" "airbyte" {
  name        = "airbyte-server"
  port        = 8001
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path                = "/api/v1/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
  }
}
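The project structure lists an `outputs.tf` file, but its contents are not shown. One plausible use is exposing the load balancer's address so that a DNS record can point at it. A minimal sketch, in which the output names are hypothetical:

```hcl
# terraform/aws/airbyte/outputs.tf (sketch; output names are hypothetical)
output "airbyte_alb_dns_name" {
  description = "Internal DNS name of the Airbyte ALB"
  value       = aws_lb.airbyte.dns_name
}

output "airbyte_alb_zone_id" {
  description = "Hosted zone ID of the ALB, for Route 53 alias records"
  value       = aws_lb.airbyte.zone_id
}
```

Because the ALB is internal, the DNS name resolves to private addresses and is reachable only from inside the VPC or from networks connected to it.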
IAM Roles
# terraform/aws/airbyte/iam.tf

# Execution role: used by ECS to pull images, write logs, and fetch injected secrets
resource "aws_iam_role" "airbyte_execution" {
  name = "airbyte-ecs-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "airbyte_execution" {
  role       = aws_iam_role.airbyte_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task role: used by the running container
resource "aws_iam_role" "airbyte_task" {
  name = "airbyte-ecs-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

# Allow ECS to read secrets at container start.
# Values listed in a task definition's `secrets` block are fetched with the
# execution role, not the task role, so the policy is attached there.
resource "aws_iam_role_policy" "airbyte_secrets" {
  name = "airbyte-secrets-access"
  role = aws_iam_role.airbyte_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue",
          "secretsmanager:DescribeSecret"
        ]
        Resource = [
          "arn:aws:secretsmanager:${var.aws_region}:*:secret:airbyte/*"
        ]
      }
    ]
  })
}
Security Groups
# terraform/aws/airbyte/security_groups.tf
resource "aws_security_group" "airbyte_alb" {
  name_prefix = "airbyte-alb-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.allowed_cidr_blocks # Restrict to your network
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "airbyte_server" {
  name_prefix = "airbyte-server-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 8001
    to_port         = 8001
    protocol        = "tcp"
    security_groups = [aws_security_group.airbyte_alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "airbyte_rds" {
  name_prefix = "airbyte-rds-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.airbyte_server.id]
  }
}
Variables
# terraform/aws/airbyte/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "eu-west-2"
}

variable "vpc_id" {
  description = "VPC ID for Airbyte deployment"
  type        = string
}

variable "private_subnet_ids" {
  description = "Private subnet IDs for ECS and RDS"
  type        = list(string)
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access Airbyte UI"
  type        = list(string)
}

variable "certificate_arn" {
  description = "ACM certificate ARN for HTTPS"
  type        = string
}

variable "rds_password" {
  description = "Password for RDS PostgreSQL"
  type        = string
  sensitive   = true
}
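Apart from `aws_region`, none of these variables has a default, so values must be supplied before planning. One way to do that is a `terraform.tfvars` file, sketched below. Every value is a placeholder to be replaced with identifiers from your own account, and the file name is the standard one Terraform loads automatically.

```hcl
# terraform/aws/airbyte/terraform.tfvars (all values are placeholders)
aws_region          = "eu-west-2"
vpc_id              = "vpc-0123456789abcdef0"
private_subnet_ids  = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
allowed_cidr_blocks = ["10.0.0.0/16"]
certificate_arn     = "arn:aws:acm:eu-west-2:123456789012:certificate/REPLACE_ME"

# rds_password is deliberately omitted; supply it through the
# TF_VAR_rds_password environment variable instead of committing it.
```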
Deploy
cd terraform/aws/airbyte
terraform init
terraform plan
terraform apply
Store Self-Hosted Credentials
For self-hosted Airbyte, store the API URL and login credentials in Secrets Manager:
aws secretsmanager create-secret \
  --name "airbyte/api-credentials" \
  --description "Airbyte self-hosted API credentials" \
  --secret-string '{
    "api_url": "https://airbyte.internal.example.com/api/v1",
    "username": "airbyte",
    "password": "YOUR_PASSWORD_HERE"
  }' \
  --region eu-west-2
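If other Terraform configurations need the Airbyte API URL, they could read this secret back with a data source rather than hard-coding it. A rough sketch, assuming the secret name used in the command above. The data source label and the local names `airbyte_api` and `airbyte_url` are hypothetical:

```hcl
# Sketch: read the stored credentials from another Terraform configuration
data "aws_secretsmanager_secret_version" "airbyte_api" {
  secret_id = "airbyte/api-credentials"
}

locals {
  # The secret string is JSON, so decode it into an object first
  airbyte_api = jsondecode(data.aws_secretsmanager_secret_version.airbyte_api.secret_string)
  airbyte_url = local.airbyte_api["api_url"]
}
```

Values read this way are stored in that configuration's state file, so the same care over state access applies.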
Upgrading Airbyte
Self-hosted Airbyte requires manual upgrades:
- Check release notes at github.com/airbytehq/airbyte/releases
- Update the Docker image tag in your ECS task definition
- Run database migrations if required (check release notes)
- Deploy the update via Terraform
# Update the image tag
image = "airbyte/server:0.64.0" # Update to desired version
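One way to keep the tag out of the task definition body is to lift it into a variable, so an upgrade becomes a change to a single value. This is a sketch of that idea rather than a required pattern. The variable name `airbyte_version` is hypothetical, and the default reuses the version shown above:

```hcl
# terraform/aws/airbyte/variables.tf (sketch; variable name is hypothetical)
variable "airbyte_version" {
  description = "Airbyte image tag to deploy"
  type        = string
  default     = "0.64.0"
}

# In the container definition inside main.tf:
#   image = "airbyte/server:${var.airbyte_version}"
```

Changing the value produces a new task definition revision on the next `terraform apply`, and the ECS service then replaces the running task with one using the new image.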
Breaking Changes
Major version upgrades may require migration steps. Always review release notes before upgrading. Test upgrades in a staging environment first.
Monitoring
CloudWatch Logs
Airbyte server and worker logs are sent to CloudWatch:
aws logs tail /ecs/airbyte --follow
CloudWatch Alarms
Set up alarms for key metrics:
resource "aws_cloudwatch_metric_alarm" "airbyte_cpu" {
  alarm_name          = "airbyte-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Airbyte ECS CPU utilization is high"

  dimensions = {
    ClusterName = aws_ecs_cluster.airbyte.name
    ServiceName = aws_ecs_service.airbyte_server.name
  }
}
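Memory is another service-level metric that ECS publishes in the same namespace. A companion alarm might look like the sketch below, which mirrors the CPU alarm. The resource name and the 80% threshold are assumptions to tune for your workload:

```hcl
resource "aws_cloudwatch_metric_alarm" "airbyte_memory" {
  alarm_name          = "airbyte-high-memory"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MemoryUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 80 # assumed threshold, mirroring the CPU alarm
  alarm_description   = "Airbyte ECS memory utilization is high"

  dimensions = {
    ClusterName = aws_ecs_cluster.airbyte.name
    ServiceName = aws_ecs_service.airbyte_server.name
  }
}
```

As written, neither alarm notifies anyone. Setting `alarm_actions` to the ARN of an SNS topic is the usual way to route state changes to email or chat.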
Summary
You've deployed self-hosted Airbyte:
- Ran Airbyte locally with Docker Compose for testing
- Deployed ECS Fargate infrastructure with Terraform
- Configured RDS PostgreSQL for metadata storage
- Set up ALB with HTTPS for secure access
- Stored API credentials in AWS Secrets Manager
What's Next
With Airbyte deployed, set up the Snowflake infrastructure for Airbyte-loaded data.
Continue to Snowflake Infrastructure →