SRE/DevOps Engineer - Palo Alto, US

Kody

Palo Alto, California, United States (posted 3 days ago)

Job Type

full time

Description

We are seeking a high-caliber Senior Site Reliability Engineer (SRE) based in California to ensure the scalability, reliability, and runtime efficiency of our next-generation platform. In this role, you will bridge the gap between development and operations, working closely with our global engineering teams.

We are looking for a unique engineering mindset: someone who brings a positive, collaborative energy to the daily grind, but can instantly pivot into a hyper-focused, high-ownership responder when an incident strikes.

Key Responsibilities

  • Production Reliability & Guardrails: Partner with the Platform Engineering team to implement reliability guardrails, ensuring applications running on AWS meet strict uptime and SLA requirements.
  • CI/CD & Repository Management: Own the deployment pipelines and code management practices, making extensive use of GitHub.
  • Incident Management: Lead rapid-response troubleshooting during production incidents; conduct thorough blameless post-mortems to continuously harden our systems.
  • Observability & Performance: Implement advanced monitoring, logging, and alerting systems to proactively detect and mitigate system anomalies.
  • Cross-Border Collaboration: Act as a key technical bridge between our US operations and international engineering hubs, leveraging bilingual communication to streamline complex technical alignment.

Requirements

1. Technical Focus

    • Ecosystem Expertise (Must-Haves): Deep, practical experience managing application deployment and runtime environments on AWS, along with expert-level knowledge of advanced Git workflows and GitHub Actions.
    • Core Toolkit: Strong proficiency in monitoring tools, log management, and scripting for quick triaging and troubleshooting.

2. Soft Skills & Characteristics

    • Ownership & Transparency: You are radically open, highly responsive, and communicative. You don't just clear tickets; you own the production environment's health end-to-end.
    • Resilience Under Pressure: You have high psychological resilience. You maintain a positive attitude during smooth operations, yet bring a healthy sense of urgency and sharp focus to high-stakes incidents.
    • Bilingual Capability: Absolute fluency in Mandarin and English (verbal and written) is mandatory for effective technical alignment across our global teams.

Benefits

  • Competitive packages aligned with California market standards
  • Lead a dynamic and innovative team in a rapidly growing company
  • Collaborative, inclusive environment where your contributions are recognized and valued

This job was found at InterviewStack.io.

Skills

scalability, aws, ci/cd, observability, monitoring, git, incident management