Site Reliability Engineer

Infrastructure | Remote (Global) | Full-time | Senior

Ready to apply? Send us your details and a short note.

You'll define reliability standards across client environments — SLOs, alerting, capacity, and incident response — and automate the toil away so small teams can run serious infrastructure.

What you'll do

Define and track SLOs, error budgets, and alerting
Build observability stacks (Prometheus, Grafana, tracing)
Lead incident response and blameless postmortems
Automate operational workflows and runbooks

What we're looking for

4+ years in SRE or production operations
Strong hands-on experience with Kubernetes, observability tooling, and on-call practices
Scripting/automation in Python, Go, or Bash

Nice to have

Multi-tenant or multi-cloud operations
Chaos/resilience testing experience