Site Reliability Engineer (SRE) - AWS - Lisboa


Senior DevOps Engineer (AWS)

This is a growing company in the finance space, building a fintech platform that helps wealth management and asset management organisations provide solutions and strategies for their clients.

The Site Reliability Engineering team plays a crucial role in ensuring platform stability and delivering a seamless experience for clients. In this leadership position, you’ll bridge the gap between software development and operations, applying engineering best practices to infrastructure challenges.

What are we looking for?

5+ years of experience in Site Reliability Engineering or a related field, with at least 3 years working in AWS environments.
Expertise in container orchestration, particularly Kubernetes, along with related ecosystem tools.
Familiarity with databases such as MongoDB, PostgreSQL, and DynamoDB.
Strong understanding of reliability engineering principles and distributed system behavior.
Experience defining and implementing SLOs/SLIs to enhance system performance and reliability.
Proven ability to design observability solutions that generate meaningful insights while minimizing unnecessary alerts.
Proficiency in at least one infrastructure-as-code language (Terraform preferred) and one programming language such as Python, Ruby, or Java, with a focus on writing maintainable, testable code.
Deep understanding of modern monitoring and observability tools, including Prometheus, Grafana, Splunk, New Relic, CloudWatch, and ELK in cloud environments.
Strong incident response skills, including leading post-incident reviews and implementing long-term improvements.
Excellent troubleshooting skills, with experience debugging distributed systems.
A track record of automating workflows to reduce operational overhead.
Strong communication skills, with the ability to convey complex technical concepts to different audiences.
Passion for collaboration, mentorship, and continuous learning.

What will you do?

Define, implement, and manage service level objectives (SLOs) that align with business priorities and user expectations.
Develop observability strategies, leveraging key performance metrics to drive actionable improvements.
Design and deploy scalable infrastructure solutions using cloud-native technologies and infrastructure-as-code principles.
Lead automation efforts to reduce manual workload and enhance system reliability.
Advocate for reliability best practices, providing guidance and tools to development teams.
Oversee the design and operation of Kubernetes environments for efficient container orchestration.
Manage incident response processes, conduct in-depth postmortems, and drive continuous system improvements.
Participate in on-call rotations with a focus on proactive service enhancements.

What's on offer?

Competitive Salary
Personal Performance Bonus
Equity
Pension Fund
Health Insurance (for you and immediate family)
Food Allowance

Interested?

Hit Apply!

Job details

Company: UMATR
Location: Lisboa, Lisboa, Portugal
Published: 15. 3. 2025 (current job opening)
