Platform API and Orchestration Layer

senior
advanced
Platform Engineering
Question

You are designing the orchestration layer for your IDP. A developer clicks 'Create New Service' and behind the scenes it needs to create a GitHub repo, provision a database, set up a CI/CD pipeline, configure monitoring, and register the service in the catalog. How do you architect this?

Answer

The orchestration layer is the engine of your IDP. It takes a high-level intent ('I want a new service') and breaks it into steps that talk to different systems.

The architecture I would use:

1. Platform API: A single API that accepts requests from the UI, CLI, or Backstage. It validates the request, checks permissions, and kicks off a workflow. This should be a proper API with versioning, not a collection of scripts.

2. Workflow engine: Use something like Argo Workflows, Temporal, or even a simple state machine to orchestrate the steps. Each step (create repo, provision DB, set up CI) is an independent action. The workflow engine handles ordering, retries, and partial failures.

3. Resource providers: Each backend system (GitHub, AWS, ArgoCD) gets its own provider with a standard interface. This is where Crossplane or Terraform do the heavy lifting. The orchestrator does not talk to AWS directly: it tells the Terraform provider 'create this database' and the provider handles it.

4. State tracking: Every request gets a status page. The developer can see 'repo created, database provisioning, CI pipeline pending.' If something fails at step 3, you need to be able to retry from that point, not start over.

The hard part is partial failures. What happens when the repo is created but the database fails? You need compensating actions (delete the repo) or the ability to resume from the failed step. Most teams get this wrong on v1 and have to rebuild it.

For the Kubernetes-native approach, Crossplane with compositions works well. You define a composite resource that represents 'a service' and Crossplane reconciles all the child resources. The Kubernetes reconciliation loop handles retries for free.
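
To make the workflow-engine and state-tracking points concrete, here is a minimal Python sketch of a workflow that records a status per step and, when re-run, resumes from the failed step instead of starting over. It is an illustration only, not how Temporal, Argo Workflows, or Crossplane work internally. The names `Workflow`, `Step`, `Status`, and the three step functions are hypothetical, and the status map is kept in memory here where a real system would persist it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    name: str
    # Hypothetical provider call. It should be idempotent so a retry is safe.
    action: Callable[[dict], None]


@dataclass
class Workflow:
    steps: List[Step]
    # Per-step status: this is the data a status page would render.
    status: Dict[str, Status] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for step in self.steps:
            self.status.setdefault(step.name, Status.PENDING)

    def run(self, request: dict) -> bool:
        """Run steps in order, skipping steps already DONE. Stop at the first failure."""
        for step in self.steps:
            if self.status[step.name] is Status.DONE:
                continue  # resume: completed work is not repeated
            try:
                step.action(request)
            except Exception:
                self.status[step.name] = Status.FAILED
                return False
            self.status[step.name] = Status.DONE
        return True


# Demo with stand-in steps: the database step fails once, then succeeds.
calls: List[str] = []
db_healthy = {"ok": False}


def create_repo(request: dict) -> None:
    calls.append("repo")


def provision_db(request: dict) -> None:
    if not db_healthy["ok"]:
        raise RuntimeError("database provider unavailable")
    calls.append("db")


def setup_ci(request: dict) -> None:
    calls.append("ci")


workflow = Workflow([
    Step("repo", create_repo),
    Step("database", provision_db),
    Step("ci", setup_ci),
])

first_attempt = workflow.run({"service": "payments"})
status_after_failure = {name: s.value for name, s in workflow.status.items()}

db_healthy["ok"] = True  # the provider recovers
second_attempt = workflow.run({"service": "payments"})
```

On the second run the repo step is skipped because it is already marked done, so the repo is created only once. A dedicated workflow engine gives you this bookkeeping, plus durable storage and retry policies, without writing it by hand.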

Why This Matters

This is a system design question that separates people who have built platforms from people who have read about them. The orchestration layer is where most IDP projects struggle. Weak candidates describe a linear script. Strong candidates immediately talk about partial failures, idempotency, and state tracking. The best candidates have battle scars from a workflow that failed halfway through and left orphaned resources everywhere.

Code Examples

Crossplane Composition defining what 'a service' means in your platform

yaml

Developer creates a full service with one YAML file

yaml
Common Mistakes
  • Building a linear script that breaks halfway through and leaves orphaned resources with no way to clean up
  • Not tracking the state of each step, so when something fails you have to manually figure out what was already created
  • Trying to build the orchestration from scratch instead of using existing workflow engines like Temporal or Argo Workflows
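
The first mistake, orphaned resources after a mid-run failure, is usually addressed with compensating actions. The sketch below is one hedged way to express that in Python: each step carries an undo function, and on failure the completed steps are undone in reverse order. The function `run_with_compensation` and the stand-in steps are hypothetical, and a real undo would need to tolerate a resource that is already gone.

```python
from typing import Callable, List, Set, Tuple

# (name, do, undo): all hypothetical stand-ins for provider calls.
SagaStep = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[SagaStep]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order.

    Returns an event log so the outcome is easy to inspect.
    """
    log: List[str] = []
    completed: List[SagaStep] = []
    for step in steps:
        name, do, _undo = step
        try:
            do()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate newest-first so dependents are removed before dependencies.
            for done_name, _do, undo in reversed(completed):
                undo()
                log.append(f"undone:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log


# Demo: the repo is created, the database step fails, the repo is cleaned up.
resources: Set[str] = set()


def fail_db() -> None:
    raise RuntimeError("quota exceeded")


events = run_with_compensation([
    ("repo", lambda: resources.add("repo"), lambda: resources.discard("repo")),
    ("database", fail_db, lambda: resources.discard("database")),
    ("ci", lambda: resources.add("ci"), lambda: resources.discard("ci")),
])
```

Rolling back is one policy; resuming from the failed step is the other. Which one fits depends on whether a half-created service is useful to the developer while the failure is investigated.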
Follow-up Questions
Interviewers often ask these as follow-up questions
  • The database provisioning step fails after the repo is already created. What happens next?
  • How do you make this orchestration idempotent so retrying a failed request does not create duplicate resources?
  • Why would you choose Crossplane over Terraform for this? When would Terraform be the better choice?
  • How do you handle day-2 operations like scaling the database or rotating credentials through this same layer?
Tags
platform-engineering
idp
system-design
crossplane
orchestration