Skip to main content
2026-03-04
12 min read

Build vs Buy in 2026: What Still Makes Sense to Build In-House?

Build vs Buy in 2026: What Still Makes Sense to Build In-House?

TLDR

  • Default to buy unless you have a specific, compelling reason to build
  • Never build: Identity management, secrets management, payment processing
  • Almost always buy: CI/CD, observability, feature flags, load balancing, message queues
  • Consider building only when: You're 50+ engineers, have 2+ FTEs to dedicate, and the platform is critical to your competitive advantage
  • True cost formula: Initial build (3-6 engineer-months) + ongoing maintenance (20-40% of original team capacity) + opportunity cost
  • Break-even timeline: Most custom infrastructure takes 18-36 months to break even, if it ever does

The Problem with "We'll Just Build It"

"How hard can it be to build our own CI/CD platform? Jenkins is open source, and we can customize it exactly how we want."

This statement, or variations of it, has burned millions of dollars and delayed countless product launches. I've watched teams of talented engineers spend 6 months building internal platforms that end up being worse than off-the-shelf solutions while costing 3-5× more to maintain.

The decision to build vs buy infrastructure tooling is rarely about technical capability—you can build almost anything given enough time. The real questions are:

  1. What is this decision actually costing in engineer-months?
  2. What product features are we not building while we build infrastructure?
  3. Can we maintain this 2-3 years from now when the original builders have moved on?

Let's break down the math, examine each major infrastructure category, and establish a decision framework that works in 2026.


The True Cost of Building Infrastructure

When engineering leaders estimate the cost of building internal tools, they typically only count the initial build time. This is wildly optimistic.

Realistic Cost Formula

Total Cost = Initial Build + (Annual Maintenance × Years) + Opportunity Cost

Initial Build:

  • Simple tool (CI pipeline, deployment script): 1-2 engineer-months
  • Medium complexity (internal platform, service mesh): 3-6 engineer-months
  • High complexity (identity platform, observability system): 12-24 engineer-months

Ongoing Maintenance (often underestimated):

  • 20-40% of the original development team's capacity
  • Includes: Bug fixes, security patches, dependency upgrades, documentation, user support, new feature requests

Opportunity Cost:

  • What product features didn't get built?
  • What competitive advantages did you miss?
  • At a typical $200K loaded cost per engineer in 2026, every engineer-month costs ~$16,700

Example: Internal CI/CD Platform

Scenario: 20-person engineering team decides to build their own CI/CD platform instead of using GitHub Actions ($4,000/year) or CircleCI ($15,000/year).

Build costs:

  • Initial development: 4 engineer-months = $66,800
  • Year 1 maintenance (30% of 1 FTE): 3.6 months = $60,100/year
  • Year 2 maintenance (as features grow): 4.8 months = $80,200/year
  • Total 2-year cost: $207,100

Buy costs (CircleCI):

  • Year 1: $15,000
  • Year 2: $15,000
  • Total 2-year cost: $30,000

Net loss from building: $177,100 over 2 years

Opportunity cost: 12+ engineer-months that could have been spent on product features, customer requests, or revenue-generating work.

This is the best case scenario where:

  • The build goes smoothly (no scope creep)
  • Only 1 engineer maintains it (usually 2-3 get pulled in)
  • No major incidents require emergency fixes
  • The original builders stay at the company

Infrastructure Decision Framework

Use this flowchart when considering building vs buying infrastructure:

                           START: Need infrastructure component
                                         |
                                         v
                           Is this identity, secrets, or payments?
                                    /         \
                                 YES           NO
                                  |             |
                            ALWAYS BUY          v
                              (Stop)    Is this your competitive advantage?
                                              /         \
                                           YES           NO
                                            |             |
                                            v             v
                              Do you have 50+ engineers?   STRONGLY BUY
                                      /         \             (Stop)
                                   YES           NO
                                    |             |
                                    v             v
                    Can you dedicate 2+ FTEs?   BUY
                          /         \           (Stop)
                       YES           NO
                        |             |
                        v             v
              CONSIDER BUILDING      BUY
               (Proceed to TCO)    (Stop)
                        |
                        v
              Calculate full TCO
              (Initial + 3yr maintenance)
                        |
                        v
              Is TCO < 3× buy cost?
                    /         \
                 YES           NO
                  |             |
                  v             v
         PROCEED WITH BUILD    BUY
         (Document decision)  (Stop)

Category-by-Category Analysis

1. Identity & Access Management

Verdict: ALWAYS BUY

Why never build:

  • Security is too critical
  • Compliance requirements (SOC2, GDPR, HIPAA) are brutal
  • OAuth flows, MFA, SSO, password reset flows have dozens of edge cases
  • One vulnerability can destroy your company

Best options:

  • Auth0: $240-$2,400/year (10K-100K MAU)
  • Auth0: $500-$5K/year (10K-25K MAU)
  • Okta: $2-$6/user/month for B2B
  • AWS Cognito: $0.0055/MAU (first 50K free)
  • Clerk: $25-$400/month (2.5K-50K MAU)

Cost to build:

  • Initial: 6-12 engineer-months ($100K-$200K)
  • Annual maintenance: $80K-$150K
  • Security audits: $50K-$100K/year
  • Total 3-year cost: $400K-$650K

Break-even math: You'd need 100K+ monthly active users to justify the cost, and even then, you're taking on massive security risk.


2. CI/CD Pipeline

Verdict: ALMOST ALWAYS BUY

Why buying makes sense:

  • Mature products with years of edge case handling
  • Integrations with every tool you'll ever need
  • Security scanning, compliance features built-in
  • Zero maintenance burden

Best options:

  • GitHub Actions: $0.008/minute (free for public repos, ~$4K/year for 30-person team)
  • CircleCI: $15K-$40K/year for 20-50 engineers
  • GitLab CI: Included with GitLab ($19-$99/user/month)
  • Buildkite: $15-$40/seat/month for self-hosted agents

When to consider building:

  • You have extremely specialized build requirements (embedded systems, custom hardware)
  • You're running 500+ engineers and spending $200K+/year on CI
  • Your build artifacts have extreme security/compliance requirements

Cost to build:

  • Initial: 4-6 engineer-months ($67K-$100K)
  • Annual maintenance: $60K-$100K (bug fixes, runner maintenance, integrations)
  • Total 3-year cost: $250K-$400K

Real-world example: A Series C startup with 80 engineers spent 6 months building a Jenkins-based platform. After 18 months, they migrated to GitHub Actions. Total waste: ~$300K in engineer time + 6 months of opportunity cost.


3. Observability & Monitoring

Verdict: ALMOST ALWAYS BUY

Why buying makes sense:

  • Data retention, querying, and visualization are solved problems
  • Enterprise features (alerting, on-call, incident management) are complex
  • Scale is expensive to build (time-series databases at scale are hard)

Best options:

  • Datadog: $15-$40/host/month + metered usage ($30K-$100K/year for 30-person team)
  • New Relic: $25-$99/user/month
  • Honeycomb: $20K-$60K/year for 20-50 engineers
  • Grafana Cloud: $0-$299/month for basic usage

When to consider building:

  • You're spending $300K+/year on observability and growing fast
  • You have 100+ engineers and need custom workflows
  • You have specialized compliance requirements (data sovereignty)

Cost to build (metrics + logs + traces):

  • Initial: 12-18 engineer-months ($200K-$300K)
  • Annual maintenance: $150K-$250K (storage costs, query optimization, dashboard maintenance)
  • Total 3-year cost: $650K-$1M

Reality check: Uber, Netflix, and Shopify built their own observability platforms. They each have 500-2,000+ engineers. If you're not at that scale, buy.


4. Secrets Management

Verdict: ALWAYS BUY

Why never build:

  • Security is critical (one leak can be catastrophic)
  • Rotation, audit logs, access control are complex
  • Compliance requirements are strict

Best options:

  • HashiCorp Vault: $0.30-$2.50/hour per cluster (~$15K-$40K/year for multi-cluster setup)
  • AWS Secrets Manager: $0.40/secret/month + API calls
  • Doppler: $0-$249/month (5-50 users)
  • 1Password Secrets Automation: $7.99/user/month

Cost to build:

  • Initial: 4-8 engineer-months ($67K-$133K)
  • Annual maintenance: $50K-$80K
  • Security audits: $30K-$50K/year
  • Total 3-year cost: $250K-$400K

Break-even math: Never. Even at 100+ engineers, managed solutions cost $30K-$60K/year. You're spending 5-10× more to build and taking on massive security risk.


5. Feature Flags / Feature Management

Verdict: BUY (unless you're Netflix)

Why buying makes sense:

  • Looks simple, gets complex fast (targeting rules, gradual rollouts, kill switches)
  • SDKs for every language take time to build and maintain
  • Analytics, audit logs, permissions are table stakes

Best options:

  • LaunchDarkly: $10-$20/seat/month ($3K-$15K/year for 20-50 engineers)
  • Split: $33-$167/seat/month
  • Unleash: Open source + self-hosted or $80-$300/month hosted
  • Flagsmith: Open source or $45-$450/month hosted

When to consider building:

  • You're spending $50K+/year on feature flags
  • You need millisecond latency for flag evaluation at massive scale
  • Feature flagging is core to your product (you sell to engineers)

Cost to build:

  • Initial: 2-4 engineer-months ($33K-$67K)
  • Annual maintenance: $30K-$50K (SDK updates, UI improvements, analytics)
  • Total 3-year cost: $125K-$217K

Break-even: At ~$15K/year for a managed service, you'd break even around year 8-14. By then, LaunchDarkly will have added dozens of features you'll need to rebuild.


6. Load Balancing / API Gateway

Verdict: USUALLY BUY

Why buying makes sense:

  • Cloud providers have mature, tested solutions
  • DDoS protection, TLS termination, health checks are complex
  • Global distribution requires infrastructure you don't have

Best options:

  • AWS ALB/NLB: $0.0225/hour + data processed (~$200-$1,000/month)
  • Cloudflare Load Balancing: $5/month base + $0.50/month per origin
  • Kong Gateway: Open source or $250-$1,500/month hosted
  • Traefik: Open source (self-hosted)

When to consider building:

  • You have very specific routing logic (multi-tenant with complex rules)
  • You're spending $30K+/year and have in-house networking expertise
  • You need sub-millisecond latency for routing decisions

Cost to build (custom proxy/gateway):

  • Initial: 3-5 engineer-months ($50K-$83K)
  • Annual maintenance: $40K-$70K (performance tuning, security patches)
  • Total 3-year cost: $170K-$293K

Compromise: Use open source (Traefik, nginx, HAProxy) with minimal customization. Only build if you're Cloudflare or Fastly.


7. Message Queue / Event Bus

Verdict: USUALLY BUY

Why buying makes sense:

  • Data loss is catastrophic
  • Scaling, replication, failover are complex
  • Operational burden is high (disk management, monitoring, upgrades)

Best options:

  • AWS SQS/SNS: $0.40-$0.50 per million requests (~$50-$500/month)
  • Confluent Cloud (Kafka): $1-$10K/month depending on throughput
  • AWS EventBridge: $1/million events
  • RabbitMQ Cloud (CloudAMQP): $19-$3,999/month

When to consider self-hosting:

  • You're spending $50K+/year on managed Kafka
  • You have 5+ services producing high-volume events
  • You have dedicated infrastructure/SRE team

Cost to build (custom message bus):

  • Don't. Seriously, don't.
  • If you must: Initial 8-16 engineer-months ($133K-$267K)
  • Annual maintenance: $100K-$200K
  • Total 3-year cost: $433K-$867K

Compromise: Self-host open source (Kafka, RabbitMQ, NATS) when you hit $30K-$50K/year in managed costs. Don't build from scratch.


8. Internal Developer Platform (IDP)

Verdict: BUILD ONLY AT 50+ ENGINEERS

Why this might make sense:

  • Standardizes deployment, reduces cognitive load
  • Can be competitive advantage for engineering velocity
  • Off-the-shelf IDPs often don't fit your workflow

When to build:

  • You have 50+ engineers and growing
  • You can dedicate 2-3 full-time engineers to build/maintain it
  • Your deployment workflow is complex enough to justify it
  • Leadership is committed for 18+ months

When to buy/use off-the-shelf:

  • Heroku/Render: $0-$7K/month for small teams
  • Platform.sh: $50-$2,500/month
  • Northflank: $20-$1,000/month
  • Railway: $5-$500/month

Cost to build:

  • Initial: 6-12 engineer-months ($100K-$200K)
  • Annual maintenance: $120K-$250K (2-3 FTEs maintaining, improving, supporting)
  • Total 3-year cost: $460K-$950K

ROI calculation:

  • If your IDP saves each engineer 2 hours/week
  • At 50 engineers: 100 hours/week = 5,200 hours/year
  • At $100/hour loaded cost: $520K/year in productivity gains
  • Break-even: ~1-2 years (if the productivity claims hold)

Reality check: Most teams overestimate productivity gains by 2-3×. Budget for 3-4 year break-even.

Real-world example: A Series B company with 60 engineers built an IDP. After 2 years:

  • Engineers saved ~45 minutes/week (not 2 hours)
  • Maintenance took 2.5 FTEs (not 1.5)
  • Net productivity gain: ~$180K/year
  • Total cost: $250K/year
  • Net loss: $70K/year (but they claim it's "worth it for developer experience")

The Five Common Mistakes

1. Only Counting Initial Build Time

Most teams estimate build time but forget:

  • Ongoing maintenance (20-40% of original team)
  • Security patches and dependency updates
  • Documentation and onboarding new engineers
  • Feature requests from internal users
  • Migration costs when you inevitably replace it

Fix: Multiply your build estimate by 3× for a 3-year TCO.


2. Underestimating "Done"

"Done" means:

  • Production-ready with error handling
  • Monitored with alerts
  • Documented (architecture, runbooks, user guides)
  • Tested (unit, integration, load tests)
  • Secure (penetration tested, security reviewed)
  • Compliant (audit logs, access controls)

Most POCs are 20-30% of "done."

Fix: If your POC took 3 weeks, budget 10-15 weeks total.


3. Ignoring Opportunity Cost

Every hour spent building infrastructure is an hour not spent on:

  • Customer-facing features
  • Bug fixes that impact revenue
  • Performance improvements
  • Technical debt reduction

Fix: Ask "What product work are we NOT doing?" for every infrastructure project.


4. Building for Imagined Future Scale

"We'll need to support 1,000 requests/second eventually, so let's build for that now."

This leads to:

  • Overengineered solutions
  • Longer build times
  • Higher maintenance costs
  • Building for problems you might never have

Fix: Build for 3× current scale, not 100×.


5. Not Planning for the Original Builders Leaving

What happens when:

  • The engineer who built it gets promoted/leaves?
  • The team that maintains it disbands?
  • No one remembers why certain decisions were made?

Fix:

  • Document architecture decisions
  • Rotate 2-3 engineers through maintenance
  • Have a "replace with SaaS" exit plan

When Building Actually Makes Sense

Despite everything above, there are legitimate reasons to build infrastructure in-house:

1. Core Competitive Advantage

If the infrastructure is your product or a key differentiator:

  • Stripe builds payment processing (they sell payment infrastructure)
  • Vercel builds deployment platforms (they sell deployment infrastructure)
  • DataDog builds observability (they sell observability)

For everyone else: If your competitive advantage is your product/service, buy infrastructure.


2. Extreme Scale

When you're spending $300K+/year on a single tool and have:

  • 100+ engineers
  • Dedicated platform/infrastructure team
  • Leadership buy-in for multi-year investment

Examples:

  • Uber built their own observability platform (they have 2,000+ engineers)
  • Netflix built Chaos Engineering tools (they pioneered the space)
  • Spotify built their own deployment platform (Backstage, which they then open-sourced)

Key difference: These companies had 500-2,000+ engineers when they built these systems.


3. Regulatory/Compliance Requirements

When you have:

  • Data sovereignty requirements (data can't leave certain geographic regions)
  • Compliance needs that off-the-shelf tools can't meet
  • Security requirements beyond what vendors offer

Even then, check if vendors have compliant offerings before building. Most major SaaS tools now have SOC2, ISO 27001, HIPAA, and regional data centers.


4. Integration Complexity

When:

  • You have extremely specific workflows
  • Off-the-shelf tools require so much customization that you're essentially rebuilding them anyway
  • The integration tax of using multiple tools is higher than building one unified system

Warning: This is the most abused justification. Most "unique workflows" aren't as unique as you think.


Real-World Scenarios

Scenario A: Series A Startup (15 Engineers)

Current state:

  • Using Heroku ($2K/month), GitHub Actions ($400/month), Datadog ($3K/month)
  • CTO wants to "save money" by moving to Kubernetes + self-hosted tools

Build path costs:

  • Kubernetes setup: 2-3 months for 2 engineers = $67K-$100K
  • Annual maintenance: 30% of 1 engineer = $60K/year
  • Migration risk: 4-8 weeks of reduced velocity

Buy path costs:

  • Heroku + GitHub Actions + Datadog: ~$65K/year

Verdict: BUY

At 15 engineers, every engineer-month counts. The "savings" from self-hosting won't materialize for 2-3 years, and you'll sacrifice velocity when you can least afford it.

Better move: Optimize Datadog usage, consider Heroku alternatives (Render, Railway), but stay on managed platforms.


Scenario B: Series B Startup (60 Engineers, $20M ARR)

Current state:

  • Spending $120K/year on infrastructure tools
  • Growing 50% year-over-year
  • CTO wants to build internal developer platform

Build path costs:

  • IDP build: 8-12 months for 2-3 engineers = $267K-$500K
  • Annual maintenance: 2-3 FTEs = $400K-$600K/year
  • Total 3-year cost: $1.5M-$2.3M

Buy path costs:

  • Continue with current tools: ~$400K/year (accounting for growth)
  • Total 3-year cost: $1.2M

Productivity gains needed to break even:

  • Need to save 8-12 engineer-months/year (13-20% of team capacity)

Verdict: MAYBE

At 60 engineers, you're in the gray zone. If:

  • You have 2-3 engineers excited to build/own this long-term
  • Your deployment process is genuinely complex and slowing teams down
  • Leadership is committed for 3+ years

Then it might make sense. But be honest about the productivity gains—most teams overestimate by 2-3×.


Scenario C: Series C Startup (200 Engineers, $100M ARR)

Current state:

  • Spending $500K/year on infrastructure tools
  • Have dedicated platform team (5 engineers)
  • Complex microservices architecture

Build path costs:

  • Custom observability platform: 18-24 months for 3-4 engineers = $900K-$1.6M
  • Annual maintenance: 4-5 FTEs = $800K-$1M/year
  • Total 3-year cost: $3.3M-$4.6M

Buy path costs:

  • Datadog/New Relic at scale: ~$300K-$500K/year
  • Total 3-year cost: $900K-$1.5M

Verdict: STILL PROBABLY BUY

Even at 200 engineers and $500K/year spend, building custom observability costs 2-4× more over 3 years.

When to build: If you're spending $1M+/year on a single tool category AND have specific needs that vendors can't meet.


Scenario D: Enterprise (500+ Engineers)

Current state:

  • Spending $2M+/year on infrastructure
  • Have dedicated platform org (20-30 engineers)
  • Complex compliance requirements

Verdict: BUILD SELECTIVELY

At this scale:

  • Building custom internal platforms makes sense
  • You have the resources to maintain them long-term
  • The cost savings and customization justify the investment

Still buy:

  • Identity (Auth0, Okta)
  • Secrets management (Vault, AWS Secrets Manager)
  • Payment processing (Stripe, Adyen)

Consider building:

  • Internal developer platforms
  • Custom observability pipelines (not the full stack)
  • Deployment orchestration
  • Service mesh configuration management

The Decision Template

Use this template when evaluating build vs buy:

## [Tool/Platform Name] Build vs Buy Decision

### Problem Statement
- What problem are we solving?
- Who is impacted? (How many engineers/teams?)
- What is the current pain point? (Be specific with metrics)

### Build Option
- Initial build time: ___ engineer-months
- Initial cost: $___
- Annual maintenance: ___ engineer-months/year = $___/year
- Total 3-year cost: $___
- Key risks:
  1. ...
  2. ...

### Buy Option
- Tool: ___
- Annual cost: $___/year
- Total 3-year cost: $___
- Limitations:
  1. ...
  2. ...

### Productivity Impact
- Build: Saves ___ hours/engineer/week (be conservative)
- Buy: Saves ___ hours/engineer/week
- Net difference: ___ hours/week for ___ engineers = $___/year

### Decision
- [ ] Build (justify why)
- [ ] Buy (which vendor?)
- [ ] Defer (not critical now)

### Success Metrics (if building)
- Adoption: ___% of engineers using it by month 6
- Time savings: ___ hours/week measured after 3 months
- Maintenance cost: <___% of original build team's capacity
- Exit criteria: If we don't hit X metric by month 12, we migrate to [SaaS option]

Key Takeaways

  1. Default to buy unless you have a compelling, specific reason to build
  2. Never build: Identity, secrets management, payment processing
  3. Almost always buy: CI/CD, observability, feature flags, load balancing, message queues
  4. Consider building at scale: Internal developer platforms (50+ engineers), custom observability pipelines (200+ engineers)
  5. Calculate full TCO: Initial build + 3 years of maintenance (20-40% of original team)
  6. Be honest about productivity gains: Most teams overestimate by 2-3×
  7. Plan for the builders leaving: Documentation, rotation, and SaaS exit plans are critical
  8. Opportunity cost matters: Every hour on infrastructure is an hour not on product

The Bottom Line

Building infrastructure in-house is expensive. The true cost is almost always 3-5× higher than initial estimates, and the opportunity cost is rarely accounted for.

In 2026, the SaaS ecosystem is mature enough that 95% of engineering teams should buy rather than build infrastructure. The 5% that should build are:

  1. Infrastructure companies (your product IS infrastructure)
  2. Massive scale (500+ engineers, $1M+/year spend on a single tool)
  3. Unique compliance (data sovereignty, extreme security requirements)

For everyone else: Buy the infrastructure, build the product. Your customers don't care if you built your own CI/CD platform—they care about the product you're selling them.

The best infrastructure is the infrastructure you don't have to think about.

Published: 2026-03-04

Found an issue?