Site Reliability Engineer (SRE)
Singapore Exchange
SRE
3 weeks ago
Job Type
Full-time
Description
Job Summary
SGX is hiring Site Reliability Engineers who treat operations as a software problem. You'll keep production healthy, but more importantly you'll build the automation, tooling, and agentic workflows that make running our systems boring and predictable. This is an engineering role: if your instinct on a recurring issue is to write code that removes it, you'll fit in well.
We operate in a regulated capital-markets environment, so the bar for reliability, security, and operational rigour is high.
Job Responsibilities
- Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.
- Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.
- Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.
- Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.
- Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.
- Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.
Job Requirements
- 5+ years in SRE, platform, or infrastructure engineering, with a clear track record of replacing manual work with code
- Strong programming ability in at least one modern language (e.g. Go, Python, Kotlin, TypeScript, Rust); you write production code, not just glue scripts
- AI-native ways of working: real experience orchestrating agents for ops workflows, not just using AI for autocomplete
- Deep hands-on experience with Kubernetes, IaC (Terraform or equivalent), CI/CD, and modern observability (metrics, logs, traces)
- Production experience on a major cloud: GCP preferred, AWS acceptable
- Solid foundations in distributed systems and the failure modes that matter in production
- Incident-response maturity: calm under pressure, sharp on root cause, disciplined about follow-through
- Comfort in complex, regulated environments
Nice to Have
- Familiarity with the FIX protocol or capital-markets domain
- Experience building internal developer platforms or self-service tooling consumed by other engineers
Skills
Automation, CI/CD, observability, Kafka, Kubernetes, Python, Kotlin, TypeScript, Rust, infrastructure as code, Terraform, GCP, AWS, distributed systems, root cause analysis, incident response
About Singapore Exchange
Asia’s most international, multi-asset exchange, operating securities, fixed income and derivatives markets to the highest regulatory standards.
Financial services, capital markets