Add a New Data Source
This skill adds complete Snowflake infrastructure for a new data source, following the pattern used for existing loader databases.
When to Use
- Adding a new data loader tool (e.g. Fivetran, Stitch, a custom loader)
- Adding a new category of data that needs its own database
- Setting up infrastructure before building ingestion pipelines
Before You Start
Gather the following information:
- Loader tool name - used for database and service account naming (e.g. FIVETRAN)
- Schema names - one per data source (e.g. HUBSPOT, STRIPE, SALESFORCE)
- Whether the loader needs a storage integration - S3 access for file-based loading
- Whether to use Snowpipe - auto-ingest from S3 vs direct loading
- Whether the loader needs an AWS secret - credentials for the service account
Reference: Existing Data Sources
Read snowflake/config/databases.tf and snowflake/config/users.tf for existing patterns:
| Database | Service Account | Schemas | Pattern |
|---|---|---|---|
| DLT | SVC_DLT | OPEN_EXCHANGE_RATES, APPLICATION_DATA, HUBSPOT | Direct load via dlt |
| SNOWPIPE | (none - auto-ingest) | OPEN_EXCHANGE_RATES | S3 auto-ingest |
| AIRBYTE | SVC_AIRBYTE | HUBSPOT | Connector load via Airbyte |
| STREAMING | SVC_KAFKA_CONNECTOR | ORDER_EVENTS | Kafka sink connector |
Steps
1. Create the Database
Add to snowflake/config/databases.tf:
module "database_<loader>" {
  source = "./modules/snowflake_database"
  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }
  database_name    = "<LOADER>"
  database_comment = "Raw data loaded by <loader>."
  grant_reader_to_account_roles = [
    module.role_analytics_sources_reader.role_name,
  ]
  grant_writer_to_account_roles = [
    module.user_svc_<loader>.user_default_role,
  ]
}
Replace <LOADER> with the tool name in UPPER_CASE and <loader> in lowercase.
The database module automatically creates <LOADER>_DB_READER and <LOADER>_DB_WRITER database roles. Granting the reader to ANALYTICS_SOURCES_READER maintains the reader access chain so downstream developers and transformers can query the data.
2. Create the Service Account
Add to snowflake/config/users.tf:
module "user_svc_<loader>" {
  source = "./modules/snowflake_user"
  providers = {
    snowflake.security_admin = snowflake.security_admin
    snowflake.user_admin     = snowflake.user_admin
  }
  user_name                  = "SVC_<LOADER>"
  user_display_name          = "<Loader> Service Account"
  user_comment               = "Service account for <loader> data loading."
  user_is_service_account    = true
  user_create_dedicated_role = true
  user_default_warehouse     = module.warehouse_loading.warehouse_name
  user_additional_roles      = []
}
Setting user_create_dedicated_role = true creates a role named SVC_<LOADER> that the database module can reference for writer grants.
3. Create Schemas
Add to snowflake/config/schemas.tf (or the relevant file where schemas are defined):
module "schema_<loader>_<source>" {
  source = "./modules/snowflake_schema"
  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }
  database_name  = module.database_<loader>.database_name
  schema_name    = "<SOURCE>"
  schema_comment = "Data from <source> loaded by <loader>."
}
Repeat for each schema (data source) within the database.
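As an illustration, steps 1 to 3 filled in for a Fivetran loader with HUBSPOT and STRIPE schemas might look like the sketch below. It only substitutes the placeholders from the templates above, using the example names from "Before You Start", and assumes the module inputs are exactly those shown in the templates.

```hcl
# snowflake/config/databases.tf
module "database_fivetran" {
  source = "./modules/snowflake_database"
  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }
  database_name    = "FIVETRAN"
  database_comment = "Raw data loaded by fivetran."
  grant_reader_to_account_roles = [
    module.role_analytics_sources_reader.role_name,
  ]
  # This reference is what ties the database to the service account below.
  grant_writer_to_account_roles = [
    module.user_svc_fivetran.user_default_role,
  ]
}

# snowflake/config/users.tf
module "user_svc_fivetran" {
  source = "./modules/snowflake_user"
  providers = {
    snowflake.security_admin = snowflake.security_admin
    snowflake.user_admin     = snowflake.user_admin
  }
  user_name                  = "SVC_FIVETRAN"
  user_display_name          = "Fivetran Service Account"
  user_comment               = "Service account for fivetran data loading."
  user_is_service_account    = true
  user_create_dedicated_role = true
  user_default_warehouse     = module.warehouse_loading.warehouse_name
  user_additional_roles      = []
}

# snowflake/config/schemas.tf: one module block per data source
module "schema_fivetran_hubspot" {
  source = "./modules/snowflake_schema"
  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }
  database_name  = module.database_fivetran.database_name
  schema_name    = "HUBSPOT"
  schema_comment = "Data from hubspot loaded by fivetran."
}

module "schema_fivetran_stripe" {
  source = "./modules/snowflake_schema"
  providers = {
    snowflake.sys_admin      = snowflake.sys_admin
    snowflake.security_admin = snowflake.security_admin
  }
  database_name  = module.database_fivetran.database_name
  schema_name    = "STRIPE"
  schema_comment = "Data from stripe loaded by fivetran."
}
```

With these names, the database module would be expected to create the FIVETRAN_DB_READER and FIVETRAN_DB_WRITER database roles described in step 1.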
4. Add AWS Secrets Manager Container (if Needed)
If the service account needs credentials stored for CI/CD, add to aws/config/secrets.tf:
resource "aws_secretsmanager_secret" "<loader>_snowflake_credentials" {
  name        = "<loader>/snowflake-credentials"
  description = "Snowflake credentials for SVC_<LOADER>."
}
The actual secret value (account, user, private key) is set manually via the AWS CLI after Terraform creates the container.
5. Add Storage Integration (if Needed)
If the loader reads from S3, add a storage integration in snowflake/config/storage_integrations.tf using the snowflake_storage_integration module. Reference the existing S3 data lake bucket patterns.
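The inputs of the snowflake_storage_integration module are not listed in this document, so the following is only a sketch of the provider-level resource that such a module would typically wrap, using the snowflake_storage_integration resource from the Snowflake Terraform provider. The resource label, the integration name, and both variable names are hypothetical. In this repository, pass the equivalent values to the module instead and mirror an existing call in storage_integrations.tf.

```hcl
# Hypothetical variables: values come from outside the configuration so that
# no ARN or bucket path is hard-coded.
variable "fivetran_storage_aws_role_arn" {
  description = "ARN of the IAM role Snowflake assumes to read the bucket."
  type        = string
}

variable "fivetran_allowed_locations" {
  description = "S3 URLs the integration may access."
  type        = list(string)
}

# Assumption: a raw resource is shown for illustration only. The repo's module
# and its provider aliases should be used in practice.
resource "snowflake_storage_integration" "fivetran" {
  name    = "FIVETRAN_S3" # hypothetical integration name
  type    = "EXTERNAL_STAGE"
  enabled = true

  storage_provider          = "S3"
  storage_aws_role_arn      = var.fivetran_storage_aws_role_arn
  storage_allowed_locations = var.fivetran_allowed_locations
}
```

After apply, the resource exposes the storage_aws_iam_user_arn and storage_aws_external_id attributes, which are the values the IAM role's trust policy needs on the AWS side.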
6. Add Snowpipe (if Needed)
If using auto-ingest from S3, add a Snowpipe definition using the snowflake_snowpipe module. This creates the stage, file format, pipe, and SQS event notification. See the SNOWPIPE database setup for a working example.
7. Validate
Run terraform plan in both directories, starting from the repository root:
cd snowflake/config && terraform plan
cd ../../aws/config && terraform plan
Verify:
- Database created with correct name
- DB_READER granted to ANALYTICS_SOURCES_READER
- DB_WRITER granted to the service account's dedicated role
- Service account created with dedicated role and LOADING warehouse
- Schema(s) created in the correct database
- AWS secret container created (if applicable)
- No unexpected changes to existing resources
8. Create Pull Request
Commit, push, and create a PR. CI/CD validates with terraform plan and applies after approval.
Safety Checks
- Database names must be UPPER_CASE
- Service account names must start with SVC_
- Always grant DB_READER to ANALYTICS_SOURCES_READER (maintains the reader access chain)
- Never hard-code Snowflake account IDs or ARNs
- Never hard-code secret values in Terraform - use containers and set values via CLI
- Run terraform plan in both snowflake/config/ and aws/config/ before creating a PR
- The database module references module.user_svc_<loader>.user_default_role, so the service account module must be defined in the same change. Terraform orders the two from that reference regardless of where the blocks appear in the files; add depends_on only for a dependency that no reference expresses