SVP of Site Reliability

SVP, Site Reliability Engineering (AI-Native SaaS Platform)
Remote

We’re hiring an experienced, hands-on SVP of Site Reliability Engineering to lead reliability, incident response, and AI-driven operations for a fast-scaling enterprise SaaS platform. This is a high-impact leadership role focused on building and evolving an AI-first SRE function where automation, agentic workflows, and intelligent remediation are central to the operating model.

You’ll lead a small team of senior engineers responsible for platform uptime, customer experience, observability, incident management, and auto-remediation systems across a large-scale AWS environment. This role requires a technical leader who remains close to the code, drives critical incident response, and partners directly with executive leadership and enterprise customers.

What We’re Looking For

10 years of experience in SaaS infrastructure, SRE, DevOps, or platform engineering
Proven leadership experience at the VP, SVP, or Head of Engineering level
Deep expertise in AWS, cloud-native infrastructure, distributed systems, and multi-region production environments
Strong background in AIOps, agentic automation, auto-remediation, and AI-driven incident response
Hands-on experience with observability and incident management platforms such as Grafana, Prometheus, Datadog, PagerDuty, Loki, or similar
Strong coding and automation skills with a passion for operational excellence
Experience improving uptime, MTTR, reliability, and customer satisfaction at scale
Executive-level communication skills and experience working directly with enterprise customers
Comfortable operating in a fast-paced, high-ownership, fully remote environment

Please apply for more information