The Technical Operations Engineer is responsible for supporting the performance, reliability, and visibility of the Medlytix Production System. This role serves as a hybrid technical function combining production monitoring, telemetry analysis, workflow orchestration support, and automation engineering.

Working under the direction of the Director, this individual plays a critical role in maintaining operational health across distributed systems, data pipelines, and workflow orchestration environments. The position requires strong hands-on expertise with monitoring tools, telemetry platforms, cloud technologies, and data processing systems.

The Technical Operations Engineer evaluates system behavior, investigates production issues, maintains production monitoring, and drives automation and reliability improvements. The ideal candidate is highly proficient in monitoring, telemetry, cloud, and data processing tools and can apply them to monitor, analyze, and improve production systems.

This role also requires strong critical thinking and problem-solving skills to evaluate complex system behaviors, identify root causes, and implement effective solutions across interconnected systems.

Responsibilities:

Monitor systems, workflows, and data pipelines to ensure optimal performance, high data quality, and system reliability
Build and maintain monitoring dashboards, alerts, and observability frameworks using telemetry tools
Analyze workflow performance metrics (e.g., latency, failure rates) and identify trends or anomalies
Support workflow orchestration platforms (e.g., Airflow) to ensure successful job execution and dependency management
Troubleshoot workflow failures, data pipeline issues, and system disruptions across distributed environments
Perform root cause analysis using logs, telemetry data, and execution history, and provide actionable recommendations
Manage and respond to production incidents, including triage, escalation, and coordination with cross-functional teams
Ensure data quality and integrity by implementing validation checks and identifying anomalies early
Develop automation scripts and tools to reduce manual operational effort and improve efficiency
Identify opportunities to improve system reliability, fault tolerance, and operational scalability
Collaborate with Engineering, Product, and Data teams to resolve issues and enhance system performance
Communicate technical findings clearly and contribute to operational reporting and dashboards

Requirements:

Bachelor's degree in Computer Science, Information Systems, Engineering, Data Science, or a related field, with 3+ years of experience in technical operations, data engineering, or business intelligence
Strong proficiency in SQL and experience with Python or scripting for troubleshooting, analysis, and automation
Hands-on experience with workflow orchestration tools (e.g., Airflow) and data pipelines
Familiarity with cloud platforms (AWS preferred) and monitoring/observability tools (e.g., Datadog, CloudWatch)
Proven ability to perform root cause analysis and troubleshoot complex issues across distributed systems
Strong critical thinking and problem-solving skills with the ability to quickly learn and apply new tools and technologies
Effective communication skills with the ability to translate technical findings into actionable insights
Exposure to ML/AI concepts, tools, or operational use cases is a plus
