Job Summary

SGX is hiring Site Reliability Engineers who treat operations as a software problem. You'll keep production healthy, but more importantly you'll build the automation, tooling, and agentic workflows that make running our systems boring and predictable. This is an engineering role: if your instinct on a recurring issue is to write code that removes it, you'll fit in well.

We operate in a regulated capital-markets environment, so the bar for reliability, security, and operational rigour is high.

Job Responsibilities

Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.
Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.
Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.
Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.
Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.
Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.

Job Requirements

5+ years in SRE, platform, or infrastructure engineering, with a clear track record of replacing manual work with code
Strong programming ability in at least one modern language (e.g. Go, Python, Kotlin, TypeScript, Rust); you write production code, not just glue scripts
AI-native ways of working: real experience orchestrating agents for ops workflows, not just using AI for autocomplete
Deep hands-on experience with Kubernetes, IaC (Terraform or equivalent), CI/CD, and modern observability (metrics, logs, traces)
Production experience on a major cloud: GCP preferred, AWS acceptable
Solid foundations in distributed systems and the failure modes that matter in production
Incident-response maturity: calm under pressure, sharp on root cause, disciplined about follow-through
Comfort in complex, regulated environments

Nice to Have

Familiarity with the FIX protocol or capital-markets domain
Experience building internal developer platforms or self-service tooling consumed by other engineers

Skills

automation, CI/CD, observability, Kafka, Kubernetes, Python, Kotlin, TypeScript, Rust, infrastructure as code, Terraform, GCP, AWS, distributed systems, root cause analysis, incident response

About Singapore Exchange

Asia’s most international, multi-asset exchange, operating securities, fixed income and derivatives markets to the highest regulatory standards.

financial services, capital markets

Site Reliability Engineer (SRE)