
Site Reliability Engineer

Infrastructure · Remote (Global) · Full-time · Senior

Ready to apply? Send us your details and a short note.


You'll define reliability standards across client environments — SLOs, alerting, capacity, and incident response — and automate the toil away so small teams can run serious infrastructure.

What you'll do

  • Define and track SLOs, error budgets, and alerting
  • Build observability stacks (Prometheus, Grafana, tracing)
  • Lead incident response and blameless postmortems
  • Automate operational workflows and runbooks

What we're looking for

  • 4+ years in SRE or production operations
  • Strong with Kubernetes, observability tooling, and on-call practices
  • Scripting/automation in Python, Go, or Bash

Nice to have

  • Multi-tenant or multi-cloud operations
  • Chaos/resilience testing experience
