Platform API and Orchestration Layer
You are designing the orchestration layer for your IDP. A developer clicks 'Create New Service', and behind the scenes the platform needs to create a GitHub repo, provision a database, set up a CI/CD pipeline, configure monitoring, and register the service in the catalog. How do you architect this?
The orchestration layer is the engine of your IDP. It takes a high-level intent ('I want a new service') and breaks it into steps that talk to different systems. The architecture I would use:

1. Platform API: A single API that accepts requests from the UI, CLI, or Backstage. It validates the request, checks permissions, and kicks off a workflow. This should be a proper API with versioning, not a collection of scripts.
2. Workflow engine: Use something like Argo Workflows, Temporal, or even a simple state machine to orchestrate the steps. Each step (create repo, provision DB, set up CI) is an independent action. The workflow engine handles ordering, retries, and partial failures.
3. Resource providers: Each backend system (GitHub, AWS, ArgoCD) gets its own provider with a standard interface. This is where Crossplane or Terraform do the heavy lifting. The orchestrator does not talk to AWS directly; it tells the Terraform provider 'create this database' and the provider handles it.
4. State tracking: Every request gets a status page. The developer can see 'repo created, database provisioning, CI pipeline pending.' If something fails at step 3, you need to be able to retry from that point, not start over.

The hard part is partial failures. What happens when the repo is created but the database fails? You need compensating actions (delete the repo) or the ability to resume from the failed step. Most teams get this wrong on v1 and have to rebuild it.

For the Kubernetes-native approach, Crossplane with compositions works well. You define a composite resource that represents 'a service' and Crossplane reconciles all the child resources. The Kubernetes reconciliation loop handles retries for free.
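To make the partial-failure handling concrete, here is a minimal sketch of a simple state machine that tracks per-step status, resumes from the failed step, and runs compensating actions in reverse order. It is an illustration under stated assumptions, not how Temporal or Argo Workflows work internally: the names `Step`, `Workflow`, and `StepStatus` are hypothetical, state is kept in memory only, and a real engine would persist it and add retries with backoff.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class StepStatus(str, Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"
    ROLLED_BACK = "rolled_back"


@dataclass
class Step:
    """One unit of work, e.g. 'create repo' (hypothetical type)."""
    name: str
    action: Callable[[dict], None]                        # performs the step
    compensate: Optional[Callable[[dict], None]] = None   # undoes it, if possible


@dataclass
class Workflow:
    """In-memory state machine; a real system would persist `status`."""
    steps: List[Step]
    status: Dict[str, StepStatus] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for step in self.steps:
            self.status.setdefault(step.name, StepStatus.PENDING)

    def run(self, ctx: dict) -> bool:
        """Run steps in order, skipping completed ones; stop at the first failure."""
        for step in self.steps:
            if self.status[step.name] is StepStatus.DONE:
                continue  # resume: do not redo work that already succeeded
            try:
                step.action(ctx)
            except Exception as exc:
                self.status[step.name] = StepStatus.FAILED
                ctx["last_error"] = f"{step.name}: {exc}"
                return False
            self.status[step.name] = StepStatus.DONE
        return True

    def rollback(self, ctx: dict) -> None:
        """Compensate completed steps in reverse order.

        Steps without a compensating action are left marked as done.
        """
        for step in reversed(self.steps):
            if self.status[step.name] is StepStatus.DONE and step.compensate:
                step.compensate(ctx)
                self.status[step.name] = StepStatus.ROLLED_BACK
```

Calling `run` a second time after a failure picks up at the failed step, which corresponds to the 'retry from that point' behavior; calling `rollback` instead corresponds to the compensating-action path. The `status` dictionary is the data a status page would render.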
This is a system design question that separates people who have built platforms from people who have read about them. The orchestration layer is where most IDP projects struggle. Weak candidates describe a linear script. Strong candidates immediately talk about partial failures, idempotency, and state tracking. The best candidates have battle scars from a workflow that failed halfway through and left orphaned resources everywhere.
Crossplane Composition defining what 'a service' means in your platform
Developer creates a full service with one YAML file
- Building a linear script that breaks halfway through and leaves orphaned resources with no way to clean up
- Not tracking the state of each step, so when something fails you have to manually figure out what was already created
- Trying to build the orchestration from scratch instead of using existing workflow engines like Temporal or Argo Workflows
- The database provisioning step fails after the repo is already created. What happens next?
- How do you make this orchestration idempotent so retrying a failed request does not create duplicate resources?
- Why would you choose Crossplane over Terraform for this? When would Terraform be the better choice?
- How do you handle day-2 operations like scaling the database or rotating credentials through this same layer?
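As a starting point for the idempotency question above, one common pattern is check-before-create against a deterministic resource name, so that a retried request finds the existing resource instead of creating a second one. The sketch below assumes a hypothetical in-memory provider standing in for a real backend such as GitHub; `InMemoryRepoProvider` and `ensure_repo` are illustrative names, not part of any real SDK.

```python
from typing import Dict, Optional


class InMemoryRepoProvider:
    """Hypothetical stand-in for a real backend such as GitHub."""

    def __init__(self) -> None:
        self.repos: Dict[str, dict] = {}
        self.create_calls = 0  # counts real creations, for illustration

    def get(self, name: str) -> Optional[dict]:
        return self.repos.get(name)

    def create(self, name: str, owner: str) -> dict:
        if name in self.repos:
            raise ValueError(f"repo {name!r} already exists")
        self.create_calls += 1
        self.repos[name] = {"name": name, "owner": owner}
        return self.repos[name]


def ensure_repo(provider: InMemoryRepoProvider, name: str, owner: str) -> dict:
    """Create the repo only if it is missing, so retrying is safe."""
    existing = provider.get(name)
    if existing is not None:
        return existing
    return provider.create(name, owner)
```

Calling `ensure_repo` twice with the same name results in a single creation. Against a real backend the check and the create are not atomic, so a production version would also need to treat an 'already exists' error from the create call as success.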