1937 Sr DevOps Engineer - Production Support

In All Media

Brazil · Remote · 3 weeks ago

Job Type

Full-time

Description

Senior DevOps Engineer - Production Support (Azure/AKS)

  • Location: Remote from LATAM (100% Remote)
  • Contract Type: Full-time vendor (Contracted directly by Inallmedia.com)
  • Time Zone Alignment: Central Time (CT) ±2 hours

About Inallmedia.com

Inallmedia.com is a global technology and design firm focused on building impactful digital solutions through remote, distributed teams across LATAM. We partner with international clients across industries, providing long-term technical expertise, product innovation, and team augmentation.

For this specific role, you will be contracted directly by Inallmedia.com to support a leading, high-growth sustainable energy and clean technology enterprise based in the USA.

Project Overview

You will join a dynamic engineering squad dedicated to maintaining and optimizing the cloud infrastructure that powers critical clean energy and solar storage solutions across North America. As a Senior DevOps Engineer, you will focus heavily on production reliability, monitoring high-availability cloud systems, and driving incident response.

This role bridges the gap between infrastructure operations and engineering, ensuring the scalability and resilience of next-generation green tech platforms.

Key Responsibilities

  • System Monitoring: Monitor critical production systems—including Azure Kubernetes Service (AKS), microservices, and CI/CD pipelines—using advanced dashboards and proactive alerting.
  • Incident Response: Act as the primary technical responder for live production incidents and Slack escalations, ensuring rapid triage, root-cause identification, and swift resolution.
  • Operational Excellence: Maintain, refine, and improve internal runbooks and standard operating procedures (SOPs) to ensure operational predictability.
  • Deployment Support: Oversee and support deployment activities across both production and non-production environments while strictly adhering to SLAs and corporate response times.
  • Reliability Engineering: Collaborate deeply with core DevOps and software engineering teams to root out recurring systemic issues and elevate overall platform reliability.
  • Automation: Help design and implement smart automation scripts for recurring operational tasks to reduce manual toil.

Must-Have Skills

  • Cloud & Support Experience: 6+ years of proven experience in DevOps, Cloud Infrastructure, or high-stakes Production Support roles.
  • Azure Mastery: A solid, comprehensive understanding of Microsoft Azure fundamentals (specifically Compute, Networking, and Azure Monitoring ecosystems).
  • Kubernetes Expertise: Hands-on operational experience with Kubernetes, specifically Azure Kubernetes Service (AKS) operations, log analysis, and cluster scaling.
  • Observability Tooling: Strong familiarity with modern monitoring and observability tools (such as Azure Monitor, Grafana, Prometheus, or similar).
  • Incident Management: Well-versed in structured incident management, escalation workflows, and working under strict SLA guidelines.
  • Scripting Fluency: Intermediate-to-advanced scripting capabilities using Bash, PowerShell, or Python for task automation.
  • Remote & Agile Mindset: Extensive experience working autonomously in Agile teams within 100% remote environments.
  • Communication: Exceptional verbal and written English skills for seamless daily technical collaboration.

Nice-to-Have Skills

  • Hands-on exposure to building and optimizing CI/CD pipelines (GitHub Actions, Jenkins, etc.).
  • Practical exposure to Infrastructure as Code (IaC) concepts and tools (Terraform, Bicep).
  • Prior experience operating in 24/7 mission-critical or high-availability (HA) infrastructure environments.
  • Familiarity with ITIL frameworks or highly structured enterprise incident management ecosystems.

Time Zone & Collaboration

The role requires close collaboration with teams aligned to Central Time (CT). Full integration with the US-based team during core operational hours is expected, allowing for real-time collaboration and agile synchronization.

Language

All interviews, technical documentation, and daily communication will be conducted exclusively in English.

This job is found at InterviewStack.io

Skills

azure, monitoring, scalability, kubernetes, microservices, ci/cd, dashboards, swift, automation, observability, grafana, prometheus, bash, powershell, python, agile, github actions, jenkins, infrastructure as code, terraform, incident management, technical documentation, high availability, incident response, log analysis