Senior SRE – Scale, Reliability & AI | Remote
There is a difference between managing infrastructure and engineering reliability. This role is firmly in the second category.
A high-traffic global platform on Google Cloud needs an SRE who thinks in systems: someone who looks at a complex, distributed architecture and sees not just what is running, but how it will fail, how to prevent that failure, and how to recover faster when it happens. Someone who measures their impact in error budgets and toil eliminated, not tickets closed.
The platform is operating at genuine scale. The SRE function is not peripheral; it is central to how the engineering organisation grows without degrading. You will own the reliability standards that give product teams the confidence to ship fast, build the observability infrastructure that turns noise into signal, and lead the incident process that turns failure into institutional knowledge.
What makes this role distinct is the AI dimension. The team is actively building machine learning into how it detects anomalies, predicts degradation, and responds to incidents. This is not a future roadmap item; it is happening now, and the person coming into this role will have a real hand in shaping it.
If you are an SRE who wants to move beyond keeping things running and start defining how a scaled platform is built to last, this is worth a conversation.
What you will own:
What you will:
What is on offer:
For more information, contact Samer Jaffer in confidence on +353 1 649 8502 or sa[email protected]