Want to build the reliability backbone for healthcare software used by care teams every day?
CompuStaff is recruiting for a Senior Site Reliability Engineer on behalf of our client, a leading healthcare software company based in Galway. This is a brilliant opportunity to join a growing SaaS business at an important stage of its journey, as it modernises its platform, strengthens observability, and moves towards a more unified, scalable cloud architecture.
You’ll be joining a collaborative engineering team where your work will directly improve platform reliability, deployment confidence, incident response, and the overall developer experience.
About the company
Our client provides a software platform that helps care providers improve safety, compliance, workflows, and outcomes for service users.
Following strong growth and recent investment, the company is now expanding its engineering capability and investing heavily in platform modernisation, automation, observability, and AI-enabled care management.
The role
This is a hands-on SRE role with plenty of room to shape how things are done.
You’ll work across cloud infrastructure, monitoring, automation, deployment pipelines, and reliability engineering. The environment includes a mix of established systems and newer platform work, so you’ll get the best of both worlds: real problems to solve now, and the chance to influence the future architecture.
You’ll help modernise observability using tools such as Prometheus and Grafana, improve CI/CD pipelines, support the company’s move towards an AWS-first architecture, and contribute to the early stages of its containerisation journey.
What you’ll be doing
- You’ll build automation to reduce manual operational work and improve platform reliability.
- You’ll improve monitoring, alerting, and observability across multiple products.
- You’ll help instrument existing services to provide better telemetry, visibility, and service-level insights.
- You’ll contribute to infrastructure modernisation using Terraform and cloud-native tooling.
- You’ll work with engineering, QA, support, and product teams to define and improve reliability measures and targets such as SLIs, SLOs, and SLAs.
- You’ll enhance CI/CD pipelines using tools such as GitHub Actions and Jenkins.
- You’ll support incident response, root cause analysis, and long-term reliability improvements.
- You’ll also have a voice in future architecture decisions, including platform consolidation, AWS, Docker, and deployment strategy.
What we’re looking for
We’re looking for someone who enjoys solving reliability problems properly, rather than just restarting things and hoping the logs stop shouting.
You’ll likely bring strong experience in some of the following:
- Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure
- Monitoring and observability, ideally with tools like Prometheus and Grafana
- AWS or cloud-native infrastructure
- Terraform or infrastructure as code
- CI/CD pipelines, ideally with GitHub Actions, Jenkins, or similar
- Scripting or development experience, ideally in Python, PHP/Laravel, or C#/.NET
- Docker or containerisation principles
You don’t need to tick every box. The important thing is that you are hands-on, curious, automation-minded, and comfortable working across engineering and operations.
Why this role is worth a look
This is not a “keep the lights on” SRE role.
You’ll be joining at a point where the company is actively investing in better engineering practices, better tooling, and a more scalable platform. There is real scope to influence how reliability, observability, cloud infrastructure, and deployment automation are built for the next stage of growth.
You’ll also be working on software with a genuine purpose: systems that help care providers deliver safer, smarter, and more connected care.