Senior Site Reliability Engineer

  • Indefinite
  • Full time
  • 1700-036, Lisbon, Lisbon, Portugal
  • Cloud Tribe

Imagine having the opportunity to be part of an international project where you will help develop a whole new category of tools that support the next generation of the internet: the web of trust.

Ritain.io (R.io) is Readiness IT’s Cloud, Data & Business Agility Center of Excellence, with a global team of 400+ consultants, offices on three continents, and clients around the globe.

Right now, we are looking for a Senior Site Reliability Engineer to join our #KickAssTeam!

You will play a pivotal role in supporting and maintaining our production and development environments, working with our engineering teams to build cloud services that are secure, reliable, scalable, and observable.

What do we expect you to do?

  • Collaborate with engineering teams to design and build highly automated cloud services that are reliable, scalable, and observable;
  • Propose, design, and implement strategies to improve the security of our cloud systems;
  • Design, build, maintain, and support cloud infrastructure in our AWS environments, using Terraform to define our Infrastructure as Code;
  • Work with other SREs to identify components that can be shared across engineering teams to improve productivity, such as developer tooling, build automation, provisioning, logging, monitoring, alerting, and incident processes;
  • Produce clean, consistent, and well-organized code to automate infrastructure, builds, deployments, and configuration running on the production stack;
  • Lead the way in how we design, manage, and improve our infrastructure;
  • Install and configure services in our environments using SRE principles;
  • Run blameless post-mortems with the teams to identify and implement improvements that make our products more reliable;
  • Work in a “you build it, you run it” environment where engineering teams build, deploy, monitor, and support the components that they own;
  • Consult with engineering teams in a true SRE way;
  • Define and implement how we measure service operations, and support engineering teams in implementing monitoring where relevant;
  • Keep track of industry trends and contribute to our technical roadmap.

What do you need to bring us?

  • 5+ years of experience deploying, configuring, monitoring, and supporting distributed production and non-production systems in cloud environments on AWS (or another relevant cloud infrastructure);
  • Strong understanding of security, reliability, scalability, and platform management topics;
  • Knowledge of at least one (scripting) language;
  • Proven expertise in cloud network architecture design and implementation;
  • Experience managing applications running on Kubernetes clusters on Linux;
  • Experience with Terraform, Vault, Prometheus, EKS, and a wide range of cloud-first tools;
  • Experience working in teams that define production infrastructure in code and use automation, continuous integration, and continuous delivery to manage their environments;
  • Experience with GitHub Actions, Helm & Flux;
  • Ability to identify manual tasks and design automated tooling to expedite their execution;
  • Enjoyment of working in open-source/developer communities;
  • Strong organizational skills and enjoyment of a dynamic, agile working environment.

What kind of mindset are we looking for?

  • Team player
  • “Automate Everything” attitude
  • Passionate technology enthusiast
  • Great communicator
  • Goal-oriented
  • Problem-solving focused

Join our Tribe of Tribes and get ready to #MakeEpicStuffHappen!

Visit our website at https://ritain.io to learn more about our company and the services we offer.