Working with Resources and Data Sources
Learn how to create infrastructure resources and query existing infrastructure with data sources
TLDR: Resources create and manage infrastructure (servers, databases, networks). Data sources query existing infrastructure without managing it. Use resource dependencies to control creation order. Apply lifecycle rules to prevent accidental deletion or control update behavior.
Resources are the heart of Terraform - they represent the infrastructure components you want to create and manage. Data sources let you reference information about existing infrastructure. Understanding both is key to building real configurations.
Creating Resources
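A resource block declares a piece of infrastructure to create. The first label is the resource type, whose prefix identifies the provider (as in aws_instance), and the second is a local name you choose for referring to the resource elsewhere in your configuration. The general syntax is: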
resource "provider_type" "local_name" {
argument1 = "value1"
argument2 = "value2"
}
Let's create a complete example with a VPC, subnet, and EC2 instance on AWS:
# VPC to isolate our resources
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "main-vpc"
}
}
# Public subnet in the VPC
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
tags = {
Name = "public-subnet"
}
}
# Internet gateway for external access
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "main-igw"
}
}
# Route table for public internet access
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "public-rt"
}
}
# Associate route table with subnet
resource "aws_route_table_association" "public" {
subnet_id = aws_subnet.public.id
route_table_id = aws_route_table.public.id
}
# Security group
resource "aws_security_group" "web" {
name = "web-sg"
description = "Allow web traffic"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "web-sg"
}
}
# EC2 instance
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.web.id]
tags = {
Name = "web-server"
}
}
When you run terraform apply, Terraform creates these resources in the correct order based on their dependencies.
Understanding Resource Dependencies
Terraform automatically determines the order to create resources by analyzing references. In our example:
aws_vpc.main
↓
├── aws_subnet.public
│ ↓
│ aws_instance.web ← aws_security_group.web
│
└── aws_internet_gateway.main
↓
aws_route_table.public
↓
aws_route_table_association.public
The VPC must exist before creating the subnet or internet gateway. The subnet must exist before creating the instance. Terraform handles this automatically.
Sometimes you need explicit dependencies that Terraform can't infer. Use depends_on:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
subnet_id = aws_subnet.public.id
# Explicit dependency - make sure internet gateway exists
depends_on = [aws_internet_gateway.main]
}
Use depends_on sparingly. Terraform usually figures out dependencies correctly. You need it when:
- Resources have a hidden dependency that Terraform can't detect
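- A resource must be created only after another one finishes, even though it uses none of that resource's attributes (note that depends_on only orders operations; it does not add a wait or retry for an external system)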
- You need to control creation order for logical reasons
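A common hidden dependency involves IAM: an instance references its instance profile, but the software it runs at boot needs a policy that the instance never references directly. The following is a minimal sketch of that case. The role, policy, and profile names are hypothetical, the permission is arbitrary, and the AMI ID is the placeholder used elsewhere in this guide.

```hcl
# Hypothetical role that the instance assumes
resource "aws_iam_role" "app" {
  name = "app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Permissions the application needs as soon as it boots
resource "aws_iam_role_policy" "app" {
  name = "app-policy"
  role = aws_iam_role.app.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:ListAllMyBuckets"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "app" {
  name = "app-profile"
  role = aws_iam_role.app.name
}

resource "aws_instance" "app" {
  ami                  = "ami-0c55b159cbfafe1f0"
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app.name

  # The instance references the profile, not the policy, so Terraform
  # cannot infer that the policy should exist before the instance boots.
  depends_on = [aws_iam_role_policy.app]
}
```

Without the depends_on line, Terraform is free to create the instance and the policy in parallel, and a boot script could run before its permissions exist.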
Resource Attributes
After Terraform creates a resource, it exposes attributes you can reference. Some attributes you provide yourself (like cidr_block); others are set by the provider (like id).
Check provider documentation to see available attributes. For an AWS instance:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
subnet_id = aws_subnet.public.id
}
# Reference the instance's attributes
output "instance_id" {
value = aws_instance.web.id # Set by AWS
}
output "private_ip" {
value = aws_instance.web.private_ip # Set by AWS
}
output "public_ip" {
value = aws_instance.web.public_ip # Set by AWS
}
output "instance_type" {
value = aws_instance.web.instance_type # What we set
}
Data Sources: Querying Existing Infrastructure
Data sources let you fetch information about resources that exist outside your Terraform configuration. They don't create or modify anything - they just query.
Syntax for data sources:
data "provider_type" "local_name" {
# Query parameters
}
Common use cases:
Finding AMIs
Instead of hardcoding an AMI ID, find the latest version dynamically:
# Find the latest Ubuntu 22.04 AMI
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical's AWS account ID
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# Use it in a resource
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id # Always the latest
instance_type = "t3.micro"
}
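This picks up the newest matching Ubuntu image on every plan, without manual updates. Be aware that when a newer image is published, the ami value changes, and Terraform will plan to replace the instance on the next apply.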
Referencing Existing VPCs
If your VPC was created outside Terraform:
# Find VPC by tag
data "aws_vpc" "main" {
tags = {
Name = "main-vpc"
}
}
# Find subnets in that VPC
data "aws_subnets" "public" {
filter {
name = "vpc-id"
values = [data.aws_vpc.main.id]
}
tags = {
Tier = "public"
}
}
# Use the VPC and subnets
resource "aws_security_group" "web" {
vpc_id = data.aws_vpc.main.id
# ... rules ...
}
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
subnet_id = data.aws_subnets.public.ids[0]
}
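The aws_subnets data source returns only subnet IDs. If you also need per-subnet details such as CIDR blocks, one possible approach is to look up each ID with the singular aws_subnet data source. This is a sketch that assumes the data.aws_subnets.public lookup shown above; the output name is hypothetical.

```hcl
# Look up full details for every subnet ID found above
data "aws_subnet" "public" {
  for_each = toset(data.aws_subnets.public.ids)
  id       = each.value
}

# Collect the CIDR block of each discovered subnet
output "public_subnet_cidrs" {
  value = [for s in data.aws_subnet.public : s.cidr_block]
}
```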
Getting Account and Region Information
# Get current AWS account ID
data "aws_caller_identity" "current" {}
# Get current region
data "aws_region" "current" {}
# Get available availability zones
data "aws_availability_zones" "available" {
state = "available"
}
# Use them
locals {
account_id = data.aws_caller_identity.current.account_id
region = data.aws_region.current.name # On AWS provider v6 and later, prefer .region (name is deprecated there)
azs = data.aws_availability_zones.available.names
}
output "deployment_info" {
value = "Deploying in account ${local.account_id}, region ${local.region}"
}
Reading External Data
The local_file data source reads local files:
data "local_file" "ssh_key" {
filename = "${path.module}/keys/id_rsa.pub"
}
resource "aws_key_pair" "deployer" {
key_name = "deployer-key"
public_key = data.local_file.ssh_key.content
}
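When the file already exists on disk before Terraform runs, the built-in file function is a possible alternative that avoids the extra data source and the local provider. A sketch using the same key path as above:

```hcl
resource "aws_key_pair" "deployer" {
  key_name = "deployer-key"

  # file() reads the file's contents as a string during evaluation
  public_key = file("${path.module}/keys/id_rsa.pub")
}
```

The data source remains the safer choice when the file is generated by another resource during the same run, because it participates in the dependency graph.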
Resource Lifecycle
Control how Terraform handles resource updates with lifecycle rules.
Prevent Deletion
Protect critical resources from accidental deletion:
resource "aws_db_instance" "production" {
identifier = "prod-db"
engine = "postgres"
instance_class = "db.t3.medium"
allocated_storage = 100
lifecycle {
prevent_destroy = true # Terraform will error if you try to destroy this
}
}
If someone tries to destroy this resource, Terraform refuses:
Error: Instance cannot be destroyed
on main.tf line 10:
10: resource "aws_db_instance" "production" {
Resource aws_db_instance.production has lifecycle.prevent_destroy set, but
the plan calls for this resource to be destroyed.
Create Before Destroy
Some resources can't have downtime during updates. Create the replacement first:
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
lifecycle {
create_before_destroy = true # New instance before deleting old one
}
}
When you change the AMI, Terraform:
- Creates a new instance with the new AMI
- Updates references to point to the new instance
- Deletes the old instance
Without this setting, Terraform would delete first (causing downtime) then create.
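One caveat: the old and new resources briefly exist side by side, so any argument that must be unique, such as a name, would collide. A minimal sketch of one way around this, assuming a security group that uses name_prefix instead of a fixed name. The resource name web_cbd is hypothetical, and aws_vpc.main is the VPC defined earlier in this guide.

```hcl
resource "aws_security_group" "web_cbd" {
  # name_prefix makes the provider generate a unique suffix, so the
  # replacement can be created while the old group still exists
  name_prefix = "web-"
  vpc_id      = aws_vpc.main.id

  lifecycle {
    create_before_destroy = true
  }
}
```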
Ignore Changes
Sometimes external systems modify resources outside Terraform. Prevent Terraform from reverting these changes:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "web-server"
}
lifecycle {
ignore_changes = [
tags["LastModified"], # Ignore changes to specific tag
user_data, # Ignore user_data modifications
]
}
}
Or ignore all changes:
lifecycle {
ignore_changes = all
}
This is useful when:
- External automation adds tags
- Auto-scaling modifies certain properties
- You're gradually adopting Terraform for existing infrastructure
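For the auto-scaling case, a minimal sketch might look like the following. The workers names are hypothetical, aws_subnet.public is the subnet defined earlier in this guide, and the AMI ID is the same placeholder used above.

```hcl
resource "aws_launch_template" "workers" {
  name_prefix   = "workers-"
  image_id      = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "workers" {
  min_size            = 1
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = [aws_subnet.public.id]

  launch_template {
    id      = aws_launch_template.workers.id
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies adjust desired_capacity at runtime; without this,
    # the next apply would reset it to the value written here
    ignore_changes = [desired_capacity]
  }
}
```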
Replace Triggered By
Force replacement when a referenced resource or attribute changes (requires Terraform 1.2 or later):
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
lifecycle {
replace_triggered_by = [
aws_security_group.web.id # Replace instance if the security group's ID changes (that is, the group itself is replaced)
]
}
}
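replace_triggered_by accepts only references to managed resources, so a plain variable cannot be listed directly. One way to trigger on an arbitrary value is to wrap it in the built-in terraform_data resource (Terraform 1.4 or later). This is a sketch: the revision variable and the resource names are hypothetical.

```hcl
variable "revision" {
  type    = number
  default = 1
}

# Changing var.revision updates this resource, which in turn
# triggers replacement of anything that lists it below
resource "terraform_data" "replacement" {
  input = var.revision
}

resource "aws_instance" "app" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  lifecycle {
    replace_triggered_by = [terraform_data.replacement]
  }
}
```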
Count and For_Each: Multiple Instances
Create multiple similar resources using count or for_each.
Using Count
Create a fixed number of resources:
resource "aws_instance" "web" {
count = 3
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
subnet_id = aws_subnet.public.id
tags = {
Name = "web-${count.index}" # web-0, web-1, web-2
}
}
# Access specific instances
output "first_instance_ip" {
value = aws_instance.web[0].public_ip
}
# Access all instances
output "all_instance_ips" {
value = aws_instance.web[*].public_ip
}
count.index gives you the current iteration number (starting from 0).
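count is also commonly used to make a resource optional by setting it to 0 or 1. A minimal sketch, assuming the Ubuntu AMI data source and the public subnet from earlier in this guide; the create_bastion variable and the bastion resource are hypothetical.

```hcl
variable "create_bastion" {
  type    = bool
  default = false
}

resource "aws_instance" "bastion" {
  # One instance when enabled, none otherwise
  count = var.create_bastion ? 1 : 0

  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id
}

output "bastion_ip" {
  # one() returns the single element, or null when the list is empty
  value = one(aws_instance.bastion[*].public_ip)
}
```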
Using For_Each
Create resources from a map or a set of strings, giving each instance a meaningful key:
variable "instances" {
type = map(object({
instance_type = string
ami = string
}))
default = {
web = {
instance_type = "t3.micro"
ami = "ami-0c55b159cbfafe1f0"
}
api = {
instance_type = "t3.small"
ami = "ami-0c55b159cbfafe1f0"
}
worker = {
instance_type = "t3.medium"
ami = "ami-0c55b159cbfafe1f0"
}
}
}
resource "aws_instance" "server" {
for_each = var.instances
ami = each.value.ami
instance_type = each.value.instance_type
subnet_id = aws_subnet.public.id
tags = {
Name = each.key # "web", "api", or "worker"
}
}
# Access specific instance
output "web_server_ip" {
value = aws_instance.server["web"].public_ip
}
# Access all instances
output "all_server_ips" {
value = { for k, v in aws_instance.server : k => v.public_ip }
}
With for_each:
- each.key is the map key or set value
- each.value is the map value (for sets, same as each.key)
for_each is generally better than count because:
- Resources have meaningful names instead of numbers
- Adding/removing items doesn't renumber everything
- The intent is clearer
Use count when you just need N identical resources. Use for_each when each resource is configured differently.
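for_each also works with a set of strings, which is a reasonable fit when the instances differ only by name. A minimal sketch with hypothetical user names:

```hcl
resource "aws_iam_user" "team" {
  # for_each needs a map or a set, so convert the list with toset()
  for_each = toset(["alice", "bob", "carol"])

  # For a set, each.key and each.value are the same string
  name = each.value
}

output "team_arns" {
  value = { for name, user in aws_iam_user.team : name => user.arn }
}
```

Removing "bob" from the set affects only aws_iam_user.team["bob"]. With count over the same list, removing the middle element would shift "carol" from index 2 to index 1, and Terraform would plan changes for resources you did not mean to touch.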
Practical Example: Multi-Tier Application
Here's a complete example combining resources, data sources, and these concepts:
# Find latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
# Get available AZs
data "aws_availability_zones" "available" {
state = "available"
}
# VPC
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
tags = {
Name = "main-vpc"
}
}
# Create subnets in multiple AZs
resource "aws_subnet" "public" {
for_each = toset(slice(data.aws_availability_zones.available.names, 0, 2))
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${index(data.aws_availability_zones.available.names, each.value) + 1}.0/24"
availability_zone = each.value
map_public_ip_on_launch = true
tags = {
Name = "public-${each.value}"
}
}
# Security groups for different tiers
resource "aws_security_group" "web" {
name = "web-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "app" {
name = "app-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.web.id] # Only from web tier
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Launch template for web servers
resource "aws_launch_template" "web" {
name_prefix = "web-"
image_id = data.aws_ami.amazon_linux.id
instance_type = "t3.micro"
vpc_security_group_ids = [aws_security_group.web.id]
user_data = base64encode(<<-EOF
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Web Server</h1>" > /var/www/html/index.html
EOF
)
lifecycle {
create_before_destroy = true
}
}
# Auto-scaling group
resource "aws_autoscaling_group" "web" {
desired_capacity = 2
max_size = 4
min_size = 1
vpc_zone_identifier = [for s in aws_subnet.public : s.id]
launch_template {
id = aws_launch_template.web.id
version = "$Latest"
}
tag {
key = "Name"
value = "web-server"
propagate_at_launch = true
}
}
This example demonstrates:
- Using data sources to find AMIs and availability zones
- Creating multiple subnets with for_each
- Setting up security groups with tier-to-tier access
- Using launch templates for scalable instances
- Lifecycle rules for zero-downtime updates
Understanding resources and data sources gives you the building blocks for any infrastructure. Next, we'll explore state management - how Terraform tracks what it has created and how to handle state safely.