Job Type

Full-time

Description

Role and Responsibilities:

Support and maintain Kubernetes-based infrastructure primarily on AWS EKS
Build and enhance automation for provisioning, configuration, monitoring, and scaling of cloud-native environments
Collaborate closely with engineering teams to ensure platform reliability, performance, and operational excellence
Implement and manage secure processes for data and secret rotation across environments
Develop tools and practices to improve observability, reliability, and incident response
Provide technical leadership and mentorship, and promote best practices in Kubernetes, automation, and cloud operations
Manage project priorities, milestones, and deliverables in a fast-paced environment

Qualifications:

Deep expertise with Kubernetes (EKS preferred) in production environments
Strong hands-on experience with AWS services, including IAM, EKS, EC2, and S3
Proficiency in data and secret rotation strategies and tooling
Proficiency in scripting and automation with Python and Bash
Solid understanding of Linux fundamentals, including OS-level troubleshooting and performance tuning
Experience with infrastructure-as-code tools such as Terraform, Helm, or ArgoCD
Familiarity with container networking, observability tooling, and CI/CD best practices
Proven ability to architect, develop, and troubleshoot distributed systems
Strong problem-solving mindset, ownership, and communication skills
Experience in high-scale, low-latency, or mission-critical environments is a plus

This job is found at InterviewStack.io

Skills

AWS, EKS, automation, monitoring, observability, Kubernetes, IAM, EC2, S3, Python, Bash, Linux, infrastructure as code, Terraform, Helm, ArgoCD, CI/CD, distributed systems, incident response

Site Reliability Engineer
