Senior Site Reliability Engineer / Platform Engineer
Hybrid role with 1 day per week in the Galway office.
Our client builds SaaS platforms that power operational management, payments, digital engagement, and background screening for organisations worldwide.
Their Galway R&D Centre is a key hub for building the next generation of their global SaaS platform.
You will join a growing R&D team in Galway, working across AWS and Azure to design, build, and operate resilient, secure, and highly automated cloud platforms. The focus is on production systems, developer enablement, and continuous improvement: not just designing systems, but making sure they run flawlessly.
You will partner closely with engineering, product, and infrastructure teams to evolve a cloud-native, multi-tenant SaaS platform, embedding SRE principles across the organisation.
What You’ll Do
- Build and operate highly available, scalable cloud infrastructure across AWS and Azure
- Drive automation-first infrastructure using Infrastructure as Code and CI/CD pipelines
- Lead platform evolution efforts including Kubernetes adoption, containerisation, and service reliability improvements
- Improve system observability (metrics, logging, tracing) to enable proactive operations
- Partner with development teams to improve deployment pipelines, resilience, and performance
- Own and continuously improve incident management, root cause analysis, and postmortems
- Contribute to security, compliance (PCI DSS), and reliability standards across the platform
- Mentor engineers and help embed a DevOps/SRE culture across teams
What You Bring
- Strong experience in SRE, Platform Engineering, or DevOps roles supporting production SaaS systems
- Deep hands-on expertise with AWS and/or Azure in a multi-cloud environment
- Experience running containerised workloads (Kubernetes) in production
- Solid scripting/programming skills (e.g. Python, Bash, PowerShell)
- Strong understanding of distributed systems, networking, and cloud-native architectures
- Proven track record in infrastructure as code, automation, and CI/CD
- Familiarity with databases, storage systems, and performance tuning
- Experience operating in regulated environments (PCI DSS or similar) is a plus
- Ability to troubleshoot complex systems and drive long-term reliability improvements