Site Reliability Engineer (OpenSearch)

NetApp

Bangalore, India Office (BANGALORE)
Posted 1 month ago

Job Type

full time

Description

Job Summary

NetApp is seeking a Technical Operations Engineer (OpenSearch) to join our growing Instaclustr team in Bangalore, India. In this role, you will be part of a frontline Site Reliability Engineering (SRE) team responsible for ensuring the availability, performance, and reliability of large-scale, cloud-hosted OpenSearch clusters.
You will work in a highly automated environment managing distributed open-source systems at scale, collaborating with global customers across industries such as banking, telecom, gaming, and technology. This role requires strong operational expertise, problem-solving skills, and a passion for learning and working with modern cloud-native and open-source technologies.

Job Requirements

  • Provide end-to-end operational support for OpenSearch clusters deployed across public cloud platforms (AWS, Azure, GCP).
  • Monitor, troubleshoot, and resolve complex production issues, ensuring high availability and performance.
  • Perform cluster lifecycle operations, including upgrades, migrations, maintenance, and scaling activities.
  • Participate in L2 on-call rotations, ensuring timely incident response and resolution.
  • Collaborate with customer engineering teams to diagnose and resolve issues related to OpenSearch and other supported technologies.
  • Work closely with internal teams to enhance reliability, automation, and operational efficiency.
  • Develop and improve automation tools, scripts, and operational processes.
  • Analyse system behaviour and proactively identify opportunities for performance optimisation and reliability improvements.
  • Contribute to knowledge sharing, documentation, and continuous improvement initiatives.

Required Skills & Experience

  • Hands-on experience with OpenSearch (including troubleshooting, upgrades, and migrations) or strong willingness to develop deep expertise.
  • Experience with public cloud platforms such as AWS, Azure, or GCP.
  • Strong Linux system administration skills and comfort with command-line environments.
  • Solid understanding of distributed systems, networking, and OS internals.
  • Experience with containerisation technologies (e.g., Docker).
  • Strong problem-solving skills with the ability to debug complex production issues.
  • Excellent communication skills (written and verbal) with a customer-focused mindset.
  • Ability to work effectively in a collaborative, fast-paced environment and take ownership of tasks.

Preferred Skills

  • Experience working with other distributed systems such as Cassandra or Kafka.
  • Familiarity with source code debugging and issue investigation (e.g., Jira, codebase review).
  • Programming/scripting skills in Python, Java, or Bash.
  • Experience with Git or version control systems.
  • Prior experience in customer support or technical operations roles.

Education

  • Typically requires a minimum of 4-8 years of related experience with a Bachelor’s degree; or 6 years with a Master’s degree; or 3 years with a PhD; or equivalent experience.

Skills

aws, azure, gcp, automation, linux, distributed systems, docker, cassandra, debugging, jira, python, java, git, customer support, site reliability engineering, high availability, system administration, incident response