Benefits

Remote Work
Health Insurance

Job Type

Full-time

Description

As a Senior Site Reliability Engineer, you will apply advanced expertise to evolve and manage our cloud infrastructure, ensuring high availability, scalability, and resilience. The role centers on leading complex initiatives that drive the adoption of Infrastructure-as-Code and integrate advanced DevOps practices across global engineering teams.

Your contributions will directly shape technical architecture, operational frameworks, and the delivery of our IoT and big data platforms. The role requires autonomous decision-making and cross-functional collaboration to achieve significant organizational impact.

JOB DUTIES:

Lead the strategic design, implementation, and optimization of public cloud infrastructure on Azure, AWS, or GCP, ensuring solutions align with organizational resilience and scalability objectives
Drive the adoption and continuous enhancement of Infrastructure-as-Code (IaC) principles and tools (e.g., Terraform, ARM templates) to automate cloud resource provisioning and management
Develop and integrate advanced IT automation solutions using tools such as Ansible, Chef, or Helm for Kubernetes, focusing on efficiency and system reliability
Oversee the end-to-end CI/CD pipeline, implementing modern practices with Git, GitHub Actions, Jenkins, Docker, and Kubernetes to streamline software delivery and deployment
Architect and maintain comprehensive observability and monitoring frameworks utilizing platforms like Grafana, Prometheus, or Elastic, providing strategic insights into system performance and health
Manage critical incidents, conduct Root Cause Analysis (RCA) for complex outages, and lead major infrastructure upgrades to minimize downtime and ensure service continuity
Influence and mentor engineering teams on best practices for cloud infrastructure, reliability engineering, and operational excellence, fostering a culture of continuous improvement

YOU MUST HAVE:

Minimum of 6 years of progressive experience in Site Reliability Engineering or a closely related cloud infrastructure role
At least 3 years of hands-on experience with a major public cloud platform (Azure, AWS, or GCP), with demonstrated ability to architect and manage cloud-native solutions
Proven track record of designing and implementing Infrastructure-as-Code solutions for complex environments, including a minimum of 2 years with Terraform or similar IaC platforms
Demonstrated expertise with containerization and container orchestration platforms (Docker, Kubernetes) and their supporting ecosystem

WE VALUE:

Advanced scripting capabilities in PowerShell, Bash, Python, or similar languages for automation and system management
Experience with distributed systems, large-scale data platforms, or IoT infrastructure
Background working in global teams, providing strategic technical guidance and cross-functional leadership
Extensive experience in administering and optimizing enterprise-grade Windows and Linux environments (5+ years)
Strong leadership in incident response, root cause analysis, and problem resolution for complex production issues

WHAT'S IN IT FOR YOU:

Flexible hybrid working arrangement to support work-life balance
Meal ticket for each day worked
Medical coverage to support your health and wellbeing
26 days of vacation

This job is found at InterviewStack.io

Skills

scalability, Azure, AWS, GCP, infrastructure as code, Terraform, automation, Ansible, Helm, Kubernetes, CI/CD, Git, Jenkins, Docker, observability, monitoring, Grafana, Prometheus, PowerShell, Bash, Python, distributed systems, Windows, Linux, root cause analysis, site reliability engineering, incident response

About Resideo

Resideo Technologies, Inc. is an American multinational company that provides room temperature, air quality, and humidity control systems, as well as security systems, primarily for residential dwellings in the U.S. and internationally. It manufactures and distributes smart-home and software products, including temperature and lighting control, security, and water and air monitoring.

manufacturing, software

Senior Site Reliability Engineer
