Working with Artifacts and Dependencies
Learn to manage build artifacts, optimize dependency handling, and create efficient pipelines that share data between workflow jobs.
One of the most frustrating aspects of CI/CD is dealing with slow builds caused by repeatedly downloading the same dependencies, or worse, having builds fail because of flaky network connections during dependency installation. You've probably waited impatiently as npm downloads hundreds of packages for the third time today, or watched a deployment fail because a package registry was temporarily unavailable.
GitHub Actions provides powerful tools to eliminate these problems through intelligent caching and artifact management. When you understand how to preserve build outputs and share data efficiently between jobs, your workflows become faster, more reliable, and less prone to external failures.
Understanding the Isolation Challenge
Every GitHub Actions job runs in a completely fresh virtual machine with no memory of previous executions. This isolation provides consistency and prevents workflows from interfering with each other, but it creates a challenge: how do you share data between jobs that need to work together?
Here's how data flows between isolated jobs using artifacts:
Job 1: Build Application
┌─────────────────────────┐
│  Fresh VM Environment   │
│  ┌─────────────────┐    │
│  │ 1. Checkout     │    │
│  │ 2. Install deps │    │
│  │ 3. Run tests    │    │
│  │ 4. Build app    │    │
│  │ 5. Create files │    │
│  └─────────────────┘    │
│           │             │
│           ▼             │
│   ┌─────────────┐       │
│   │ dist/       │       │  ← Build outputs
│   │ reports/    │       │
│   └─────────────┘       │
└─────────────────────────┘
            │
            ▼ Upload Artifact
   ┌─────────────────┐
   │ GitHub Storage  │  ← Persistent storage
   │ "build-123"     │
   └─────────────────┘
            │
            ▼ Download Artifact
Job 2: Deploy Application
┌─────────────────────────┐
│  Fresh VM Environment   │  ← Completely new VM
│  ┌─────────────────┐    │
│  │ 1. Checkout     │    │
│  │ 2. Download     │    │
│  │    artifacts    │    │
│  │ 3. Deploy files │    │
│  └─────────────────┘    │
│   ┌─────────────┐       │
│   │ dist/       │       │  ← Same files as Job 1,
│   │ reports/    │       │    but in a new VM
│   └─────────────┘       │
└─────────────────────────┘
Without artifacts, each job would need to rebuild everything from scratch:
Without Artifacts (Inefficient):
Job 1: Test            Job 2: Build           Job 3: Deploy
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ 1. Checkout  │       │ 1. Checkout  │       │ 1. Checkout  │
│ 2. Install   │       │ 2. Install   │       │ 2. Install   │
│ 3. Test      │       │ 3. Build     │       │ 3. Build     │  ← Rebuild again!
└──────────────┘       └──────────────┘       │ 4. Deploy    │
                                              └──────────────┘
With Artifacts (Efficient):
Job 1: Test & Build        Job 2: Deploy
┌──────────────────┐       ┌──────────────┐
│ 1. Checkout      │       │ 1. Download  │
│ 2. Install       │       │    artifacts │
│ 3. Test          │       │ 2. Deploy    │  ← Use existing build
│ 4. Build         │──────▶│              │
│ 5. Upload        │       └──────────────┘
└──────────────────┘
Think of artifacts as a way to preserve the valuable outputs of your build process - the compiled code, test reports, documentation, or deployment packages that represent the tangible results of your automation.
Creating and Using Build Artifacts
Let's start with a practical example that shows how to create and consume build artifacts in a real workflow:
name: Build and Deploy Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Build application
        run: npm run build
      # Create a deployment package with everything needed
      - name: Create deployment package
        run: |
          mkdir -p deploy-package
          cp -r dist/* deploy-package/
          cp package.json deploy-package/
          cp package-lock.json deploy-package/
          # Add deployment metadata
          echo "{
            \"version\": \"${{ github.sha }}\",
            \"buildTime\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",
            \"branch\": \"${{ github.ref_name }}\"
          }" > deploy-package/build-info.json
      # Upload the complete deployment package
      - name: Upload deployment package
        uses: actions/upload-artifact@v4
        with:
          name: deployment-package
          path: deploy-package/
          retention-days: 30
      # Also upload test results for analysis
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: |
            coverage/
            test-results.xml
          retention-days: 7
  deploy:
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - name: Download deployment package
        uses: actions/download-artifact@v4
        with:
          name: deployment-package
          path: ./package
      - name: Verify package contents
        run: |
          echo "Deployment package contents:"
          ls -la package/
          echo "Build information:"
          cat package/build-info.json
          # Verify required files exist
          if [ ! -f "package/index.html" ]; then
            echo "Error: Missing index.html in deployment package"
            exit 1
          fi
      - name: Deploy to staging
        run: |
          echo "Deploying to staging environment..."
          # Your deployment commands would go here
          # rsync -avz package/ user@staging-server:/var/www/app/
This workflow demonstrates several key concepts:
- The build job creates a complete deployment package with all necessary files
- Metadata is included in the package to track what's being deployed
- Different artifacts have different retention periods based on their purpose
- The deployment job downloads and verifies the package before deploying
Smart Dependency Caching
Dependency caching can dramatically improve build performance, but it needs to be done correctly to avoid cache invalidation issues or stale dependencies. Here's how to implement intelligent caching:
name: Optimized Build with Smart Caching
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          # Built-in npm cache that handles most scenarios
          cache: 'npm'
      # For more complex caching scenarios
      - name: Cache node modules
        uses: actions/cache@v4
        with:
          path: node_modules
          # Cache key includes the package-lock.json hash, so the cache
          # automatically invalidates when dependencies change
          key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
          # Fall back to partial cache matches if the exact key isn't found
          restore-keys: |
            ${{ runner.os }}-node-
      - name: Install dependencies
        run: |
          # Only install if node_modules doesn't exist or is outdated
          if [ ! -d "node_modules" ] || [ "package-lock.json" -nt "node_modules/.package-lock.json" ]; then
            echo "Installing dependencies..."
            npm ci
            cp package-lock.json node_modules/.package-lock.json
          else
            echo "Using cached dependencies"
          fi
      # Cache other build outputs that are expensive to recreate
      - name: Cache Next.js build cache
        uses: actions/cache@v4
        with:
          path: .next/cache
          key: ${{ runner.os }}-nextjs-${{ hashFiles('package-lock.json') }}-${{ hashFiles('**/*.js', '**/*.jsx', '**/*.ts', '**/*.tsx') }}
          restore-keys: |
            ${{ runner.os }}-nextjs-${{ hashFiles('package-lock.json') }}-
            ${{ runner.os }}-nextjs-
The key to effective caching is understanding what changes and when. Package manager caches (like npm's) should be invalidated when lockfiles change. Build caches might need more sophisticated keys that include source code hashes.
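The same key strategy carries over to other ecosystems. Here's a minimal sketch for a Python project, assuming dependencies are pinned in a requirements.txt at the repository root (that path, like the rest of this snippet, is an assumption rather than part of the project above):

- name: Cache pip downloads
  uses: actions/cache@v4
  with:
    # pip's default download cache location on Linux runners
    path: ~/.cache/pip
    # Invalidate whenever the pinned dependency list changes
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
- name: Install dependencies
  run: pip install -r requirements.txt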
Handling Multiple Environments
Real projects often need to create different builds for different environments. Here's how to handle this efficiently:
name: Multi-Environment Build
on:
  push:
    branches: [main, develop]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [staging, production]
        include:
          - environment: staging
            api_url: https://api.staging.example.com
            debug: true
          - environment: production
            api_url: https://api.example.com
            debug: false
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Build for ${{ matrix.environment }}
        run: npm run build
        env:
          NODE_ENV: production
          API_URL: ${{ matrix.api_url }}
          DEBUG: ${{ matrix.debug }}
      - name: Upload ${{ matrix.environment }} build
        uses: actions/upload-artifact@v4
        with:
          name: build-${{ matrix.environment }}
          path: dist/
The matrix strategy creates separate builds for each environment in parallel, with environment-specific configuration applied during the build process.
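A downstream job can then pick up whichever build it needs by artifact name. Here's a rough sketch of a staging deployment job built on that assumption; the pattern input shown in the comment is a download-artifact v4 feature for grabbing every environment's build at once:

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Download staging build
        uses: actions/download-artifact@v4
        with:
          name: build-staging
          path: dist/
      # Alternatively, fetch all environment builds into separate folders:
      # - uses: actions/download-artifact@v4
      #   with:
      #     pattern: build-*
      #     path: builds/
      - name: Deploy
        run: echo "Deploying the staging build..."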
Sharing Data Between Workflows
Sometimes you need to share artifacts between different workflows. For example, a build workflow might create artifacts that a separate deployment workflow needs:
# build.yml workflow
name: Build Application
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build steps ...
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build-${{ github.sha }}
          path: dist/
          retention-days: 90
# deploy.yml workflow
name: Deploy Application
on:
  workflow_dispatch:
    inputs:
      build_sha:
        description: 'Git SHA of the build to deploy'
        required: true
      build_run_id:
        description: 'ID of the workflow run that produced the build'
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Download artifacts from a different workflow run; cross-run downloads
      # need the source run's ID and a token with access to the repository
      - name: Download build artifacts
        uses: actions/download-artifact@v4
        with:
          name: build-${{ github.event.inputs.build_sha }}
          path: dist/
          run-id: ${{ github.event.inputs.build_run_id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # ... deployment steps ...
This pattern separates build and deploy concerns while ensuring you can deploy any previously built version.
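If the person triggering the deploy only knows the commit SHA, the matching run ID can be looked up first with the GitHub CLI that's preinstalled on GitHub-hosted runners. This is a hedged sketch rather than part of the workflows above; the workflow-name filter and the step id are illustrative assumptions:

- name: Resolve run ID for the requested SHA
  id: find-run
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # List workflow runs for the commit and keep the first "Build Application" run ID
    RUN_ID=$(gh api \
      "repos/${{ github.repository }}/actions/runs?head_sha=${{ github.event.inputs.build_sha }}" \
      --jq '[.workflow_runs[] | select(.name == "Build Application")][0].id')
    echo "run_id=$RUN_ID" >> "$GITHUB_OUTPUT"

The resulting steps.find-run.outputs.run_id value could then feed the download step's run-id input instead of asking for it as a separate manual input.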
Optimizing Artifact Storage
Artifacts consume storage space and can impact your GitHub Actions usage limits. Here's how to optimize artifact management:
- name: Upload artifacts with smart retention
  uses: actions/upload-artifact@v4
  with:
    name: build-artifacts
    path: dist/
    # Keep production builds longer than development builds
    retention-days: ${{ github.ref == 'refs/heads/main' && 90 || 7 }}
    # Compress artifacts to save space
    compression-level: 9
# Clean up old artifacts automatically (requires actions: write permission)
- name: Clean up old artifacts
  uses: actions/github-script@v7
  with:
    script: |
      // List artifacts across the repository, not just the current run
      const artifacts = await github.rest.actions.listArtifactsForRepo({
        owner: context.repo.owner,
        repo: context.repo.repo,
        per_page: 100
      });
      // Delete artifacts older than 30 days
      const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
      for (const artifact of artifacts.data.artifacts) {
        if (new Date(artifact.created_at) < thirtyDaysAgo) {
          await github.rest.actions.deleteArtifact({
            owner: context.repo.owner,
            repo: context.repo.repo,
            artifact_id: artifact.id
          });
        }
      }
Handling Large Dependencies
Some projects rely on unusually large dependencies or datasets that are too expensive to download on every run. Here's how to handle them:
- name: Cache large dependencies
  uses: actions/cache@v4
  with:
    path: |
      large-dataset/
      ml-models/
    key: large-deps-${{ hashFiles('data-version.txt') }}
    restore-keys: large-deps-
- name: Download large dependencies if needed
  run: |
    if [ ! -d "large-dataset" ]; then
      echo "Downloading large dataset..."
      wget -O dataset.tar.gz https://example.com/large-dataset.tar.gz
      tar -xzf dataset.tar.gz
      rm dataset.tar.gz
    else
      echo "Using cached large dataset"
    fi
For very large files, consider using Git LFS or external storage services like AWS S3, then downloading only when needed.
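Here's a rough sketch of that external-storage route, pairing the cache step above with a conditional download from S3. The bucket name, object key, region, and credential secrets are assumptions to adapt to your own setup:

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: us-east-1
- name: Fetch dataset from S3 on cache miss
  run: |
    # Only download when the cache step didn't restore the directory
    if [ ! -d "large-dataset" ]; then
      aws s3 cp s3://my-build-assets/large-dataset.tar.gz .
      tar -xzf large-dataset.tar.gz
      rm large-dataset.tar.gz
    fi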
Debugging Artifact Issues
When artifacts don't work as expected, here's how to troubleshoot:
- name: Debug artifact creation
  run: |
    echo "Files to be uploaded:"
    find dist/ -type f -ls
    echo "Total size:"
    du -sh dist/
- name: Upload with debugging
  id: upload
  uses: actions/upload-artifact@v4
  # Let the workflow continue so we can inspect what went wrong
  continue-on-error: true
  with:
    name: debug-build
    path: dist/
- name: Check upload status
  run: |
    if [ "${{ steps.upload.outcome }}" = "success" ]; then
      echo "Artifact upload successful"
    else
      echo "Artifact upload failed"
      ls -la dist/
    fi
Performance Best Practices
Here are practical tips to make your artifact and dependency management faster and more reliable:
Separate Artifacts by Purpose:
# Steps in a job run sequentially, but splitting outputs into small,
# purpose-specific artifacts keeps each upload quick and lets downstream
# jobs download only what they need
- name: Upload logs
  uses: actions/upload-artifact@v4
  with:
    name: logs
    path: logs/
- name: Upload test results
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: test-results/
Smart Cache Keys:
# Good: includes all relevant files in cache key
key: ${{ runner.os }}-deps-${{ hashFiles('package-lock.json', 'requirements.txt') }}
# Better: includes environment variables that affect dependencies
key: ${{ runner.os }}-deps-${{ env.NODE_VERSION }}-${{ hashFiles('package-lock.json') }}
Avoid Cache Thrashing:
# Use restore-keys to provide fallback cache options
restore-keys: |
  ${{ runner.os }}-deps-${{ env.NODE_VERSION }}-
  ${{ runner.os }}-deps-
Understanding artifact and dependency management transforms slow, unreliable workflows into fast, predictable automation. In the next section, we'll explore how to handle environment variables and secrets securely, which is crucial for building workflows that work safely across different environments.