Site Reliability Engineer responsible for managing Kubernetes clusters in production and empowering development teams through observability insights and DevOps culture.
• At least 3 years of experience at a production SaaS company, preferably with event-driven systems.
• Self-starter capable of taking strategic direction and owning end-to-end delivery of technical solutions.
• Expertise in Site Reliability Engineering (SRE) principles and best practices.
• Strong experience with observability tools such as Prometheus, Grafana, OpenTelemetry (OTel), and CloudWatch.
• Skilled in Python and/or Go; knowledge of Java and Spring Boot is an advantage.
• Proficient in AWS services such as SQS, EKS, RDS, VPC, EC2, and CloudWatch.
• Deep understanding of Linux systems, network protocols (TCP, DNS, TLS, HTTP), and Bash scripting.
• Familiar with DevOps tools such as Terraform, GitHub Actions, and Jenkins.
• Experience with stream processing technologies such as Kafka, and with ITSM systems such as Jira Service Management (JSM), Zendesk, or ServiceNow.
• Strong communication and collaboration skills, with experience in Agile development methodologies.
• Implementing and refining observability metrics, logging, and dashboards to monitor platform health.
• Developing data-driven KPIs for availability and reliability in production environments.
• Ensuring services run optimally through proactive monitoring and performance tracking.
• Supporting development teams by embedding SRE principles and streamlining incident response.
• Building and maintaining scalable monitoring tools for logging, metrics, and tracing.
• Writing maintainable code to improve operations, scalability, and observability.
• Troubleshooting and mitigating production incidents to keep the platform reliable.
• Maintaining detailed runbooks and automating manual tasks where possible.
• Supporting junior engineers in adopting best practices for observability and reliability.
• Participating in a 24x7 on-call rotation, incident escalation, and post-mortems.
• 30 days of holiday plus bank holidays.
• A generous pension scheme to support financial well-being.
• Private medical insurance coverage.
• Life assurance and other employee benefits.
• A collaborative and flexible work culture promoting innovation and teamwork.
• GSS is committed to fostering an inclusive and diverse workplace where all employees feel valued and respected.
• GSS embraces equal opportunity hiring practices and encourages applicants from all backgrounds.
• The company actively supports career growth and development for underrepresented groups in technology.
• Team members are encouraged to bring their authentic selves to work, contributing to a culture of belonging.
• GSS promotes a workplace culture where different perspectives are valued and innovation thrives.