Senior Site Reliability Engineer
Resideo
Description
As a Senior Site Reliability Engineer, you will apply advanced expertise to evolve and manage our cloud infrastructure, ensuring high levels of availability, scalability, and resilience. This role focuses on leading complex initiatives to drive the adoption of Infrastructure-as-Code and to integrate advanced DevOps practices across global engineering teams.
Your contributions will directly influence technical architecture, operational frameworks, and the seamless delivery of cutting-edge IoT and big data platforms, requiring autonomous decision-making and cross-functional collaboration to achieve significant organizational impact.
JOB DUTIES:
- Lead the strategic design, implementation, and optimization of public cloud infrastructure across Azure, AWS, or GCP, ensuring solutions align with organizational resilience and scalability objectives
- Drive the adoption and continuous enhancement of Infrastructure-as-Code (IaC) principles and tools (e.g., Terraform, ARM Templates) to automate cloud resource provisioning and management
- Develop and integrate advanced IT automation solutions using tools such as Ansible, Chef, or Helm for Kubernetes, focusing on efficiency and system reliability
- Oversee the end-to-end CI/CD pipeline, implementing modern practices with Git, GitHub Actions, Jenkins, Docker, and Kubernetes to streamline software delivery and deployment
- Architect and maintain comprehensive observability and monitoring frameworks utilizing platforms like Grafana, Prometheus, or Elastic, providing strategic insights into system performance and health
- Manage critical incidents, conduct Root Cause Analysis (RCA) for complex outages, and lead major infrastructure upgrades to minimize downtime and ensure service continuity
- Influence and mentor engineering teams on best practices for cloud infrastructure, reliability engineering, and operational excellence, fostering a culture of continuous improvement
YOU MUST HAVE:
- Minimum of 6 years of progressive experience in Site Reliability Engineering or a closely related cloud infrastructure role
- At least 3 years of hands-on experience with a major public cloud platform (Azure, AWS, or GCP), with demonstrated ability to architect and manage cloud-native solutions
- Proven track record of designing and implementing Infrastructure-as-Code solutions for complex environments, including a minimum of 2 years with Terraform or similar IaC platforms
- Demonstrated expertise with containerization and orchestration platforms (Docker, Kubernetes) and their supporting ecosystem
WE VALUE:
- Advanced scripting capabilities in PowerShell, Bash, Python, or similar languages for automation and system management
- Experience with distributed systems, large-scale data platforms, or IoT infrastructure
- Background in a global team environment, providing strategic technical guidance and cross-functional leadership
- Extensive experience in administering and optimizing enterprise-grade Windows and Linux environments (5+ years)
- Strong leadership in incident response, root cause analysis, and problem resolution for complex production issues
WHAT'S IN IT FOR YOU:
- Flexible hybrid working arrangement to support work-life balance
- Meal ticket for each day worked
- Medical coverage to support your health and wellbeing
- 26 days of vacation
About Resideo
Resideo Technologies, Inc. is an American multinational company that provides room temperature, air quality, and humidity controls, as well as security systems, primarily for residential dwellings in the U.S. and internationally. It manufactures and distributes smart-home and software products, including temperature and lighting control, security, and water and air monitoring.