Add a New Data Source

This skill adds complete Snowflake infrastructure for a new data source, following the pattern used for existing loader databases.

When to Use

Adding a new data loader tool (e.g. Fivetran, Stitch, a custom loader)
Adding a new category of data that needs its own database
Setting up infrastructure before building ingestion pipelines

Before You Start

Gather the following information:

Loader tool name - used for database and service account naming (e.g. FIVETRAN)
Schema names - one per data source (e.g. HUBSPOT, STRIPE, SALESFORCE)
Whether the loader needs a storage integration - S3 access for file-based loading
Whether to use Snowpipe - auto-ingest from S3 vs direct loading
Whether the loader needs an AWS secret - credentials for the service account

Reference: Existing Data Sources

Read snowflake/config/databases.tf and snowflake/config/users.tf for existing patterns:

Database	Service Account	Schemas	Pattern
`DLT`	`SVC_DLT`	OPEN_EXCHANGE_RATES, APPLICATION_DATA, HUBSPOT	Direct load via dlt
`SNOWPIPE`	(none - auto-ingest)	OPEN_EXCHANGE_RATES	S3 auto-ingest
`AIRBYTE`	`SVC_AIRBYTE`	HUBSPOT	Connector load via Airbyte
`STREAMING`	`SVC_KAFKA_CONNECTOR`	ORDER_EVENTS	Kafka sink connector

Steps

1. Create the Database

Add to snowflake/config/databases.tf:

module "database_<loader>" {
  source = "./modules/snowflake_database"

  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }

  database_name    = "<LOADER>"
  database_comment = "Raw data loaded by <loader>."

  grant_reader_to_account_roles = [
    module.role_analytics_sources_reader.role_name,
  ]
  grant_writer_to_account_roles = [
    module.user_svc_<loader>.user_default_role,
  ]
}

Replace <LOADER> with the tool name in UPPER_CASE and <loader> in lowercase.

The database module automatically creates <LOADER>_DB_READER and <LOADER>_DB_WRITER database roles. Granting the reader to ANALYTICS_SOURCES_READER maintains the reader access chain so downstream developers and transformers can query the data.

2. Create the Service Account

Add to snowflake/config/users.tf:

module "user_svc_<loader>" {
  source = "./modules/snowflake_user"

  providers = {
    snowflake.security_admin = snowflake.security_admin
    snowflake.user_admin     = snowflake.user_admin
  }

  user_name                  = "SVC_<LOADER>"
  user_display_name          = "<Loader> Service Account"
  user_comment               = "Service account for <loader> data loading."
  user_is_service_account    = true
  user_create_dedicated_role = true
  user_default_warehouse     = module.warehouse_loading.warehouse_name

  user_additional_roles = []
}

Setting user_create_dedicated_role = true creates a role named SVC_<LOADER> that the database module can reference for writer grants.

3. Create Schemas

Add to snowflake/config/schemas.tf (or the relevant file where schemas are defined):

module "schema_<loader>_<source>" {
  source = "./modules/snowflake_schema"

  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }

  database_name  = module.database_<loader>.database_name
  schema_name    = "<SOURCE>"
  schema_comment = "Data from <source> loaded by <loader>."
}

Repeat for each schema (data source) within the database.

4. Add AWS Secrets Manager Container (if Needed)

If the service account needs credentials stored for CI/CD, add to aws/config/secrets.tf:

resource "aws_secretsmanager_secret" "<loader>_snowflake_credentials" {
  name        = "<loader>/snowflake-credentials"
  description = "Snowflake credentials for SVC_<LOADER>."
}

The actual secret value (account, user, private key) is set manually via the AWS CLI after Terraform creates the container.

5. Add Storage Integration (if Needed)

If the loader reads from S3, add a storage integration in snowflake/config/storage_integrations.tf using the snowflake_storage_integration module. Reference the existing S3 data lake bucket patterns.

6. Add Snowpipe (if Needed)

If using auto-ingest from S3, add a Snowpipe definition using the snowflake_snowpipe module. This creates the stage, file format, pipe, and SQS event notification. See the SNOWPIPE database setup for a working example.

7. Validate

Run from both directories:

cd snowflake/config && terraform plan
cd ../../aws/config && terraform plan

Verify:

Database created with correct name
DB_READER granted to ANALYTICS_SOURCES_READER
DB_WRITER granted to the service account's dedicated role
Service account created with dedicated role and LOADING warehouse
Schema(s) created in the correct database
AWS secret container created (if applicable)
No unexpected changes to existing resources

8. Create Pull Request

Commit, push, and create a PR. CI/CD validates with terraform plan and applies after approval.

Safety Checks

Database names must be UPPER_CASE
Service account names must start with SVC_
Always grant DB_READER to ANALYTICS_SOURCES_READER (maintains the reader access chain)
Never hard-code Snowflake account IDs or ARNs
Never hard-code secret values in Terraform - use containers and set values via CLI
Run terraform plan in both snowflake/config/ and aws/config/ before creating a PR
The service account module call must come before the database module call if the database references user_svc_<loader>.user_default_role - or use depends_on as needed