Network Automation

Automating network configuration, deployment, and management using infrastructure as code and configuration management tools.

Manual network configuration doesn't scale. As your infrastructure grows, automating network setup, monitoring, and maintenance becomes essential for consistency, reliability, and efficiency. This section shows you how to treat network configuration as code.

Prerequisites

  • Understanding of networking concepts from previous sections
  • Basic familiarity with YAML, JSON, or similar configuration formats
  • Experience with command-line tools and scripting

Infrastructure as Code for Networking

Network infrastructure should be version-controlled, repeatable, and testable like application code.

Terraform for Network Infrastructure

Terraform lets you define network infrastructure declaratively:

# main.tf - AWS VPC with subnets
provider "aws" {
  region = var.aws_region
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.project_name}-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.project_name}-private-${count.index + 1}"
    Type = "private"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

resource "aws_nat_gateway" "main" {
  count         = length(aws_subnet.public)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-nat-${count.index + 1}"
  }
}

resource "aws_eip" "nat" {
  count  = length(aws_subnet.public)
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-nat-eip-${count.index + 1}"
  }
}

Define variables for flexibility:

# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment (dev, staging, prod)"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets"
  type        = list(string)
  default     = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for private subnets"
  type        = list(string)
  default     = ["10.0.10.0/24", "10.0.20.0/24"]
}

Deploy the infrastructure:

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var="project_name=ecommerce" -var="environment=production"

# Apply the configuration
terraform apply -var="project_name=ecommerce" -var="environment=production"

# View the created resources
terraform show
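
Rather than repeating -var flags on every command, you can keep per-environment values in a tfvars file. A small sketch with hypothetical values:

# Keep environment-specific values out of the command line
cat > production.tfvars <<'EOF'
project_name = "ecommerce"
environment  = "production"
vpc_cidr     = "10.0.0.0/16"
EOF

terraform plan -var-file=production.tfvars
terraform apply -var-file=production.tfvars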

Security Groups as Code

Define security groups with explicit rules. The bastion and app security groups referenced below are assumed to be defined elsewhere in the same configuration:

# security-groups.tf
resource "aws_security_group" "web" {
  name        = "${var.project_name}-web-sg"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description     = "SSH from bastion"
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-web-sg"
  }
}

resource "aws_security_group" "database" {
  name        = "${var.project_name}-db-sg"
  description = "Security group for database servers"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from app servers"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-db-sg"
  }
}

Configuration Management with Ansible

Ansible automates server configuration, including network settings.

Network Configuration Playbook

# network-setup.yml
---
- name: Configure network settings
  hosts: all
  become: yes
  vars:
    dns_servers:
      - 8.8.8.8
      - 8.8.4.4
    ntp_servers:
      - pool.ntp.org
      - time.google.com

  tasks:
    - name: Configure DNS resolution
      template:
        src: resolv.conf.j2
        dest: /etc/resolv.conf
        backup: yes
      notify: restart networking

    - name: Install network monitoring tools
      package:
        name:
          - netstat-nat
          - tcpdump
          - nmap
          - iftop
        state: present

    - name: Configure firewall rules
      ufw:
        rule: '{{ item.rule }}'
        port: '{{ item.port }}'
        proto: '{{ item.proto }}'
        src: '{{ item.src | default(omit) }}'
      loop:
        - { rule: allow, port: 22, proto: tcp, src: '10.0.0.0/8' }
        - { rule: allow, port: 80, proto: tcp }
        - { rule: allow, port: 443, proto: tcp }
        - { rule: deny, port: 22, proto: tcp }
      notify: reload firewall

    - name: Enable firewall
      ufw:
        state: enabled
        policy: deny
        direction: incoming

  handlers:
    - name: restart networking
      service:
        name: networking
        state: restarted

    - name: reload firewall
      ufw:
        state: reloaded

Template for DNS configuration:

# templates/resolv.conf.j2
# Generated by Ansible
{% for server in dns_servers %}
nameserver {{ server }}
{% endfor %}

search {{ ansible_domain | default('local') }}
options timeout:2 attempts:3

Run the playbook:

# Run against all hosts
ansible-playbook -i inventory network-setup.yml

# Run against specific group
ansible-playbook -i inventory network-setup.yml --limit webservers

# Check what would change without applying
ansible-playbook -i inventory network-setup.yml --check --diff
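
These commands assume an inventory file describing your hosts. A minimal static inventory sketch, with hypothetical addresses and groups:

# Create a minimal inventory (hypothetical hosts)
cat > inventory <<'EOF'
[webservers]
web1 ansible_host=10.0.1.10
web2 ansible_host=10.0.1.11

[load_balancers]
lb1 ansible_host=10.0.1.5
EOF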

Load Balancer Configuration

Automate load balancer setup:

# load-balancer.yml
---
- name: Configure nginx load balancer
  hosts: load_balancers
  become: yes
  vars:
    backend_servers:
      - { name: web1, address: '10.0.1.10:8080' }
      - { name: web2, address: '10.0.1.11:8080' }
      - { name: web3, address: '10.0.1.12:8080' }

  tasks:
    - name: Install nginx
      package:
        name: nginx
        state: present

    - name: Generate nginx configuration
      template:
        src: nginx-lb.conf.j2
        dest: /etc/nginx/sites-available/load-balancer
      notify: reload nginx

    - name: Enable load balancer site
      file:
        src: /etc/nginx/sites-available/load-balancer
        dest: /etc/nginx/sites-enabled/load-balancer
        state: link
      notify: reload nginx

    - name: Remove default site
      file:
        path: /etc/nginx/sites-enabled/default
        state: absent
      notify: reload nginx

    - name: Start and enable nginx
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: reload nginx
      service:
        name: nginx
        state: reloaded

nginx configuration template:

# templates/nginx-lb.conf.j2
upstream backend_servers {
{% for server in backend_servers %}
    server {{ server.address }};
{% endfor %}
}

server {
    listen 80;
    server_name {{ inventory_hostname }};

    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Fail fast on unresponsive backends rather than hanging requests
        proxy_connect_timeout 5s;
        proxy_send_timeout 10s;
        proxy_read_timeout 10s;
    }

    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
    }
}
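
A syntax error in the rendered template can take the load balancer down on reload, so validate the configuration first. nginx -t checks the full configuration, and Ansible's ad hoc mode can run it across every load balancer (assuming the inventory group sketched earlier):

# Validate the rendered configuration before reloading
nginx -t

# Or check every load balancer at once with an ad hoc command
ansible load_balancers -i inventory -b -m command -a "nginx -t"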

Container Network Automation

Automate container networking setup and management.

Docker Compose Network Configuration

# docker-compose.yml - Multi-tier application
version: '3.8'

networks:
  frontend:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.1.0/24
  backend:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.2.0/24
  database:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.3.0/24

services:
  nginx:
    image: nginx:alpine
    ports:
      - '80:80'
      - '443:443'
    networks:
      - frontend
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - api

  api:
    build: ./api
    networks:
      - frontend
      - backend
      - database # required so api can reach postgres on the database network
    environment:
      - DATABASE_URL=postgresql://user:pass@database:5432/app
      - REDIS_URL=redis://cache:6379
    depends_on:
      - database
      - cache

  database:
    image: postgres:13
    networks:
      - database
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - postgres_data:/var/lib/postgresql/data

  cache:
    image: redis:6-alpine
    networks:
      - backend
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
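
Bring the stack up and verify that Compose created the expected networks. Compose prefixes network names with the project name, which defaults to the directory name:

# Start the stack and list the generated networks
docker compose up -d
docker network ls

# Inspect one network's subnet and attached containers
# (the "myapp" prefix is hypothetical; use your project name)
docker network inspect myapp_backend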

Kubernetes Network Policies as Code

# k8s-network-policies.yml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 8080

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP # DNS falls back to TCP for large responses
          port: 53

Apply network policies:

# Apply all network policies
kubectl apply -f k8s-network-policies.yml

# Verify policies are active
kubectl get networkpolicies -n production

# Test connectivity between pods
kubectl exec -it web-pod -- curl api-service:8080/health
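
One subtlety: default-deny-all blocks egress as well, so the ingress rules above are not sufficient on their own; web pods still cannot open connections to the api pods. A matching egress policy closes the gap (a sketch, assuming the same labels as above):

# Allow web pods to initiate connections to api pods
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-egress-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 8080
EOF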

Automated Network Monitoring

Set up monitoring that scales with your infrastructure.

Prometheus Network Monitoring Stack

# prometheus-stack.yml
version: '3.8'

networks:
  monitoring:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - '9090:9090'
    networks:
      - monitoring
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./network-alerts.yml:/etc/prometheus/network-alerts.yml # rule file referenced below
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'

  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    networks:
      - monitoring
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - '9100:9100'
    networks:
      - monitoring
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    ports:
      - '9115:9115'
    networks:
      - monitoring
    volumes:
      - ./blackbox.yml:/config/blackbox.yml
    command:
      - '--config.file=/config/blackbox.yml'

volumes:
  prometheus_data:
  grafana_data:
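
The blackbox exporter mounts a ./blackbox.yml that is not shown above. A minimal sketch defining the http_2xx module used by the Prometheus scrape config below:

# Minimal blackbox exporter config: probe HTTP(S) targets, expect a 2xx
cat > blackbox.yml <<'EOF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
EOF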

Prometheus configuration for network monitoring:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'network-alerts.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://api.example.com
          - https://app.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
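
The alerting block above points at an alertmanager host that is not part of the Compose stack. A minimal service to add under services: in prometheus-stack.yml (a sketch, running Alertmanager with its default configuration and no receivers set up):

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - '9093:9093'
    networks:
      - monitoring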

Network alerting rules:

# network-alerts.yml
groups:
  - name: network-alerts
    rules:
      - alert: HighNetworkLatency
        expr: probe_duration_seconds > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: 'High network latency detected'
          description: '{{ $labels.instance }} has latency of {{ $value }}s'

      - alert: ServiceDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: 'Service is down'
          description: '{{ $labels.instance }} is not responding'

      - alert: HighNetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High network error rate'
          description: '{{ $labels.device }} has {{ $value }} errors/sec'
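
Prometheus ships with promtool, which validates rule files and configs before you load them; run it in CI so a malformed rule never reaches production:

# Validate alerting rules and the main config before deploying
promtool check rules network-alerts.yml
promtool check config prometheus.yml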

Automated Network Testing

Include network testing in your CI/CD pipeline.

Network Connectivity Tests

#!/bin/bash
# network-tests.sh - Run in CI pipeline

set -e

echo "Running network connectivity tests..."

# Test internal service connectivity
test_internal_connectivity() {
    local service=$1
    local port=$2

    echo "Testing connectivity to $service:$port"
    if nc -zv "$service" "$port" 2>/dev/null; then
        echo "✓ $service:$port is reachable"
    else
        echo "✗ $service:$port is not reachable"
        exit 1
    fi
}

# Test external service connectivity
test_external_connectivity() {
    local url=$1
    local expected_status=$2

    echo "Testing HTTP connectivity to $url"
    status=$(curl -s -o /dev/null -w "%{http_code}" "$url")

    if [ "$status" -eq "$expected_status" ]; then
        echo "✓ $url returned $status"
    else
        echo "✗ $url returned $status, expected $expected_status"
        exit 1
    fi
}

# Test DNS resolution
test_dns_resolution() {
    local hostname=$1

    echo "Testing DNS resolution for $hostname"
    if nslookup "$hostname" > /dev/null 2>&1; then
        echo "✓ $hostname resolves correctly"
    else
        echo "✗ $hostname DNS resolution failed"
        exit 1
    fi
}

# Run tests
test_internal_connectivity "database" "5432"
test_internal_connectivity "redis" "6379"
test_external_connectivity "https://api.example.com/health" "200"
test_dns_resolution "api.example.com"

echo "All network tests passed!"

Performance Testing

#!/usr/bin/env python3
# network-performance-test.py

import requests
import time
import statistics
import sys

def test_response_time(url, iterations=10):
    """Test HTTP response time"""
    times = []

    for i in range(iterations):
        start_time = time.time()
        try:
            response = requests.get(url, timeout=10)
            end_time = time.time()

            if response.status_code == 200:
                times.append(end_time - start_time)
            else:
                print(f"HTTP {response.status_code} for {url}")

        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")

    if times:
        avg_time = statistics.mean(times)
        max_time = max(times)
        min_time = min(times)

        print(f"URL: {url}")
        print(f"Average response time: {avg_time:.3f}s")
        print(f"Min response time: {min_time:.3f}s")
        print(f"Max response time: {max_time:.3f}s")

        # Alert if average response time is too high
        if avg_time > 1.0:
            print(f"⚠️  High response time: {avg_time:.3f}s")
            return False

        return True
    else:
        print(f"No successful requests to {url}")
        return False

def main():
    urls = [
        "https://api.example.com/health",
        "https://app.example.com",
        "https://cdn.example.com/status"
    ]

    all_passed = True

    for url in urls:
        if not test_response_time(url):
            all_passed = False

    if not all_passed:
        sys.exit(1)

    print("All performance tests passed!")

if __name__ == "__main__":
    main()
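
Run the script as a CI step after deployment; a non-zero exit code fails the pipeline:

# Install the one dependency and run the checks
pip install requests
python3 network-performance-test.py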

GitOps for Network Configuration

Manage network configurations through Git workflows.

Network Configuration Repository Structure

network-configs/
├── environments/
│   ├── dev/
│   │   ├── terraform/
│   │   ├── ansible/
│   │   └── kubernetes/
│   ├── staging/
│   └── production/
├── modules/
│   ├── vpc/
│   ├── security-groups/
│   └── load-balancer/
├── scripts/
│   ├── deploy.sh
│   └── test.sh
└── .github/
    └── workflows/
        ├── plan.yml
        ├── apply.yml
        └── test.yml
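
The scripts/ entries above are placeholders; a minimal deploy.sh sketch that wraps the Terraform steps for one environment (hypothetical layout matching the tree above):

#!/bin/bash
# deploy.sh <environment> - plan and apply the Terraform for one environment
set -e

env_dir="environments/${1:?usage: deploy.sh <environment>}/terraform"
cd "$env_dir"
terraform init
terraform plan -out=tfplan
terraform apply tfplan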

GitHub Actions workflow for network changes:

# .github/workflows/network-deploy.yml
name: Network Infrastructure Deploy

on:
  push:
    branches: [main]
    paths: ['environments/production/**']
  pull_request:
    branches: [main]
    paths: ['environments/production/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.3.0

      - name: Terraform Init
        run: |
          cd environments/production/terraform
          terraform init

      - name: Terraform Plan
        run: |
          cd environments/production/terraform
          terraform plan -out=tfplan

      - name: Save Plan
        uses: actions/upload-artifact@v3
        with:
          name: terraform-plan
          path: environments/production/terraform/tfplan

  test:
    runs-on: ubuntu-latest
    needs: plan
    steps:
      - uses: actions/checkout@v3

      - name: Run Network Tests
        run: |
          chmod +x scripts/test.sh
          ./scripts/test.sh

  apply:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    needs: [plan, test]
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Download Plan
        uses: actions/download-artifact@v3
        with:
          name: terraform-plan
          path: environments/production/terraform/

      - name: Terraform Apply
        run: |
          cd environments/production/terraform
          terraform init
          terraform apply tfplan

Network Automation Best Practices

Version Control Everything

Store all network configurations in version control:

# Network configuration repository
git init network-configs
cd network-configs

# Create directory structure
mkdir -p {environments/{dev,staging,prod},modules/{vpc,security,lb},scripts,docs}

# Track all configuration files
git add .
git commit -m "Initial network configuration structure"

# Use branches for changes
git checkout -b feature/add-monitoring-subnet
# Make changes...
git add .
git commit -m "Add dedicated monitoring subnet"
git push -u origin feature/add-monitoring-subnet

Test Before Applying

Always test network changes:

# Terraform plan before apply
terraform plan -out=network-plan

# Review the plan
terraform show network-plan

# Apply only after review
terraform apply network-plan
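
Terraform's built-in checks catch formatting and syntax problems before you even plan; run them as a cheap first CI stage:

# Fast static checks before planning
terraform fmt -check -recursive
terraform validate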

Implement Rollback Procedures

Prepare for when things go wrong:

#!/bin/bash
# rollback.sh - Emergency rollback script

echo "Rolling back network changes..."

# Save the current (broken) state, then restore a previously saved snapshot.
# previous-state.json must have been captured before the change;
# -force overrides Terraform's lineage/serial safety checks, so use with care.
terraform state pull > current-state.json
terraform state push -force previous-state.json

# Reset the firewall and re-apply the last known-good rules
ufw --force reset
ansible-playbook -i inventory firewall-rollback.yml

echo "Rollback complete. Check connectivity."

In the final section, we'll explore advanced networking concepts including service meshes, network policies, and emerging networking patterns in modern infrastructure.

Network automation reduces human error, improves consistency, and enables rapid, reliable deployments. Start by automating your most common network tasks, then gradually expand to full infrastructure as code.

Happy automating!
