About this role

Role Overview

We're hiring a Senior Site Reliability Engineer to own and scale the infrastructure behind our courtroom transcription platform. This is not a routine ops role: you'll work on high-availability Kubernetes clusters, manage complex deployments with ArgoCD, and ensure reliability for a system processing sensitive, real-time data. You'll collaborate with a small team of elite builders and be the go-to expert for keeping our platform robust, secure, and fast.

Key Responsibilities

- Deploy, manage, and optimize Kubernetes clusters in production environments.
- Operate and maintain ArgoCD for GitOps-based deployments.
- Troubleshoot and resolve performance, reliability, and scaling issues across our clusters.
- Build and maintain observability (metrics, logging, alerting) to catch and resolve issues proactively.
- Collaborate with backend and product teams to ensure smooth, reliable releases.
- Define and enforce infrastructure best practices, focusing on security, scalability, and resilience.

Qualifications

- 10+ years of experience in production infrastructure, reliability, or DevOps roles.
- Proven experience deploying and managing Kubernetes clusters at scale.
- Experience maintaining CI/CD with GitHub Actions.
- Hands-on expertise with ArgoCD (setup, tuning, troubleshooting).
- Solid foundation in Linux systems, networking, and container internals.
- Experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, etc.).
- Comfortable diving into complex problems and quickly stabilizing systems.

Bonus

- Experience with GCP.
- Contributions to open-source infrastructure or reliability tooling.

Senior Site Reliability Engineer
