InterviewStack.io

Site Reliability Engineer

Ambergroup

Hong Kong, Hong Kong, Hong Kong | Remote | 10 months ago


Job Type

Full-time

Description

Role and Responsibilities:

  • Support and maintain Kubernetes-based infrastructure primarily on AWS EKS
  • Build and enhance automation for provisioning, configuration, monitoring, and scaling of cloud-native environments
  • Collaborate closely with engineering teams to ensure platform reliability, performance, and operational excellence
  • Implement and manage secure processes for data and secret rotation across environments
  • Develop tools and practices to improve observability, reliability, and incident response
  • Provide technical leadership and mentorship, and promote best practices in Kubernetes, automation, and cloud operations
  • Manage project priorities, milestones, and deliverables in a fast-paced environment

Qualifications:

  • Deep expertise with Kubernetes (EKS preferred) in production environments
  • Strong hands-on experience with AWS services, including IAM, EKS, EC2, S3
  • Proficiency in data and secret rotation strategies and tooling
  • Proficient in scripting and automation with Python and Bash
  • Solid understanding of Linux fundamentals, including OS-level troubleshooting and performance tuning
  • Experience with infrastructure-as-code and deployment tools such as Terraform, Helm, or ArgoCD
  • Familiarity with container networking, observability tooling, and CI/CD best practices
  • Proven ability to architect, develop, and troubleshoot distributed systems
  • Strong problem-solving mindset, ownership, and communication skills
  • Experience in high-scale, low-latency, or mission-critical environments is a plus

This job is found at InterviewStack.io

Skills

aws, eks, automation, monitoring, observability, kubernetes, iam, ec2, s3, python, bash, linux, infrastructure as code, terraform, helm, argocd, ci/cd, distributed systems, incident response