InterviewStack.io

Lead Observability Engineer

InvestCloud

Bengaluru, KA, IND · 10 months ago


Benefits

Remote Work

Job Type

full time

Description

Key Responsibilities

· Own the design, deployment, and lifecycle management of the Splunk Enterprise platform, including indexer and search head clustering, forwarders, and knowledge objects.

· Define and implement best practices for data onboarding, parsing, enrichment, and storage to support observability use cases.

· Collaborate with infrastructure, DevOps, security, and application teams to build reliable, scalable observability solutions.

· Develop advanced SPL searches, correlation rules, alerts, and performance dashboards.

· Improve alert quality and reduce noise through smarter event correlation and visualization.

· Drive observability maturity initiatives including logging standardization, automation, and self-service access to telemetry data.

· Evaluate and integrate additional observability and monitoring tools (e.g., Prometheus, Grafana, LogicMonitor, AppDynamics, Dynatrace) to complement existing capabilities.

· Lead troubleshooting and incident response efforts where visibility and telemetry data are required.

· Mentor junior engineers and influence platform and observability architecture decisions.

Qualifications

· 8–12 years of progressive experience in observability, infrastructure monitoring, or SRE roles.

· Minimum 7 years of direct hands-on experience with Splunk Enterprise at enterprise scale.

· Deep knowledge of Splunk architecture, including clustering, ingestion pipelines, search performance tuning, and index lifecycle policies.

· Advanced proficiency with SPL (Search Processing Language) and dashboarding.

· Experience building and scaling log pipelines using technologies such as syslog, Fluentd, Logstash, and Cribl.

· Familiarity with cloud platforms (AWS, Azure, or GCP) and hybrid infrastructure environments.

· Experience working with configuration management and infrastructure-as-code tools (e.g., Terraform, Ansible).

· Excellent collaboration, problem-solving, and communication skills.

Must Have

· Splunk certifications (e.g., Certified Architect, Consultant, or Admin).

· Experience with APM and tracing tools (e.g., OpenTelemetry, Jaeger, New Relic).

This job is found at InterviewStack.io

Skills

Splunk, observability, automation, monitoring, Prometheus, Grafana, Fluentd, Logstash, AWS, Azure, GCP, Terraform, Ansible, OpenTelemetry, Jaeger, New Relic, configuration management, incident response

About InvestCloud

InvestCloud is a global leader in wealth technology, driving the digital transformation of the wealth management industry. The company serves a broad array of clients globally, including Wealth and Asset Managers, Wirehouses, Banks, RIAs, and Insurers.

financial services, software