Staff Software Engineer - Agent Runtime & Infrastructure

Siza Buso Consulting

New York, New York, United States
Posted 4 weeks ago

Job Type

Full time

Description

The Role

You'll own two critical workstreams — the agent runtime and backend infrastructure powering every trade in our fleet, and the migration of model hosting and agent deployment to fully in-house infrastructure. This is staff-level ownership from architecture through to 3am incident response.

What You'll Build

Agent Runtime & Backend (50%)

  • Plugin runtime — per-agent position tracking, trailing stop execution, and DSL state management
  • Scanner gateway and rules engine — YAML-configurable evaluation layer between signals and execution
  • Centralised profit-trailing service — protecting open positions even when agents are offline
  • Execution layer — the MCP server bridging agents to 48+ platform tools, including position creation, market data, and exchange state
  • Real-time data pipelines — enriched intelligence flowing through Redis, Postgres, and ClickHouse

Model & Agent Hosting Migration (30%)

  • Migrate agent deployment to fully owned infrastructure — isolated workspaces, cron scheduling, state persistence, and one-command skill deployment
  • Lead the move from external LLM APIs to self-hosted inference — own the decision and the execution
  • Build agent telemetry to capture every trade decision, scanner evaluation, and signal score across the fleet
  • Zero-downtime CI/CD pipelines for shipping updates to 50+ live agents without exposing open positions

Infrastructure & Operations (20%)

  • Monitoring and alerting for agent failures, orphaned positions, and state corruption
  • Cloud infrastructure management on AWS/EKS with infrastructure-as-code
  • Own incident response — in a live trading system, every minute of downtime is real capital at risk

What You Bring

Must-haves:

  • Strong production backend engineering experience in at least two of Go, Python, and Node.js/TypeScript (Go preferred)
  • Experience building backend services from scratch — APIs, job scheduling, state management, distributed systems
  • Solid understanding of real-time, low-latency systems: WebSockets, sub-second evaluation, condition-based triggers
  • Production experience with Postgres, Redis, and an analytics DB such as ClickHouse or BigQuery
  • Kubernetes experience — deploying, scaling, and debugging on AWS EKS
  • You have owned a system end-to-end — designed, built, deployed, operated, and fixed it under pressure

Strong plus:

  • Experience with LLM infrastructure — model serving, inference optimisation, vLLM, TGI, or managed endpoints
  • Background in trading systems, exchange APIs, or fintech where uptime has direct financial consequences
  • Onchain infrastructure experience — wallet operations, RPC nodes, DEX integration
  • Experience building multi-agent platforms or CI/CD pipelines for live trading systems

This is not a DevOps role. You'll spend 80% of your time writing code that ships to production — because at our stage, the best person to operate a system is the person who built it. If you are a backend engineer who wants to build the foundational infrastructure for a new category of autonomous financial software, this is your role.

This job listing was found on InterviewStack.io.

Skills

Data pipelines, Redis, PostgreSQL, ClickHouse, LLM, APIs, CI/CD, Monitoring, AWS, EKS, Python, Node.js, TypeScript, Distributed systems, Analytics, BigQuery, Kubernetes, Debugging, Infrastructure management, Incident response