Sr. Manager Technology Ops & Reliability (100% Remote)

ClearCaptions

Remote · $140,000 - $170,000 · 11 months ago


Benefits

Remote Work

Job Type

Full-time

Description

Who we are:

Since our founding in 2011, our mission has been to improve the lives of seniors and their caregivers. We are deeply passionate about communication and committed to becoming the foremost provider of services and solutions that enable seniors to lead more meaningful and independent lives. We also understand the power of connection and the profound impact it has on the lives of individuals who are hard-of-hearing. By utilizing enhanced automatic speech recognition, human captioning, and innovative product development, we deliver easy-to-use, cutting-edge technology to our primarily senior customer base. Our near real-time phone captioning technology allows individuals with hearing loss to see what callers are saying, enabling them to regain their connection to the world.


ClearCaptions is a Federal Communications Commission (FCC)-certified telephone captioning provider, adhering to the highest industry standards of privacy, security, and professionalism. We recognize the importance of maintaining the trust and confidence of our customers, and we continually strive to exceed their expectations.

For more information about our services please visit clearcaptions.com.

Position Summary:

The Senior Manager, Technology Operations & Reliability will ensure our mission-critical calling platform remains highly available, reliable, and resilient. This role is pivotal in ensuring rapid incident resolution and leading efforts to maintain platform stability. As Incident Commander, the Senior Manager takes charge in high-pressure situations, quickly diagnosing and addressing issues while driving long-term improvements to operational processes.

This is a Remote/Work from Home position reporting to the Director of Technology Operations.

What you will do:

  • Proactively monitor and maintain the stability, performance, and availability of the ClearCaptions calling platform and other mission-critical systems.
  • Lead incident response and resolution efforts, ensuring minimal disruption and effective post-incident analysis to prevent recurrence.
  • Develop and implement automation, monitoring, and alerting to reduce downtime and enhance system observability.
  • Collaborate with engineering, infrastructure, and support teams to continuously improve system resilience and performance.
  • Act as the Incident Commander, coordinating rapid response and resolution for major outages, performance issues, and security incidents.
  • Ensure clear communication and coordination across teams, including engineering, customer support, and executive leadership, during incidents.
  • Drive root cause analysis (RCA) and implement corrective actions to mitigate future risks.
  • Establish and refine incident response processes, including runbooks, escalation protocols, and best practices.
  • Leverage DevOps principles to improve CI/CD pipelines, automate deployment processes, and enhance platform reliability.
  • Work with software engineers to develop self-healing infrastructure, improve fault tolerance, and optimize system performance.
  • Implement and manage observability tools (e.g., Prometheus, Grafana, Splunk, New Relic, Datadog) for proactive monitoring and diagnostics.
  • Partner with engineering and product teams to identify reliability gaps and implement long-term solutions.
  • Advocate for site reliability best practices, including chaos engineering, load testing, and capacity planning.
  • Stay ahead of industry trends and emerging technologies to improve system architecture and operational excellence.
  • Take an active talent management approach to onboard, mentor and inspire a diverse, highly engaged, and skilled team. Continually upgrade talent through timely talent management, development, succession planning and recruitment.

The kind of people we look for:

  • Versatile people who thrive on variety and challenge
  • Excited about working in a fast-paced environment
  • Innate problem solvers who want to grow in a flexible, collaborative culture
  • People who take initiative, push boundaries, and are motivated to innovate
  • Talented individuals with a growth mindset who want to use their learning and relationship-building skills
  • People who align with our company core values: Integrity, Accountability, Collaboration, Service and Quality

Qualifications:

  • 8+ years of experience in Technology Operations, DevOps, or Site Reliability Engineering (SRE) roles.
  • Proven track record of leading incident response efforts and managing high-impact system outages.
  • Strong coding skills in Python, Go, Bash, or similar for scripting and automation.
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure) and Kubernetes-based environments.
  • Deep understanding of networking, system performance tuning, and troubleshooting complex distributed systems.
  • Expertise in monitoring, logging, and alerting tools (e.g., Splunk, Prometheus, Grafana, ELK, Datadog, New Relic).
  • Familiarity with ITIL, SRE principles, and DevOps methodologies.
  • Respectful and outstanding leadership skills that motivate colleagues to focus their energy on achieving business goals.
  • Ability to plan and manage at both strategic and tactical operational levels. Works to achieve goals while overcoming obstacles and/or planning for contingencies.
  • Strong analytical, planning and budgeting skills. Ability to influence others.
  • Excellent verbal and written communication, presentation, and problem-solving skills.
  • Self-starter with strong organizational and time management skills, self-directed and able to handle multiple priorities with demanding timeframes.
  • Ability to work collaboratively with colleagues and staff to create a high-quality, results-driven, team-oriented environment.
  • Demonstrated ability to use discretion, make sound decisions, and maintain confidentiality.
  • Proficient in MS Office and modern communication tools for virtual teams (e.g., MS Teams).

Physical Demands:

Employees may experience the following physical demands for extended periods of time:

  • Sitting, standing, and walking (95-100%)
  • Keyboarding (40-60%)
  • Viewing computer monitor, tablet and cell phone screen requiring close vision (95-100%)

Work Environment:

100% Remote: Work environment is at home.


Compensation:

$140,000 to $170,000/yr, prospectively, plus a 10% bonus, as determined by competitive market analysis and internal equity considerations. Final compensation will be based on the candidate’s qualifications, experience, and business needs. For details on our comprehensive benefits program, visit www.clearcaptions.com/careers to explore our total rewards package.

Intrigued to learn more?

When you apply for this role, your information will be personally reviewed by our talent acquisition team (not by a robot). You can expect to hear back from us if we think there could be a fit and what next steps look like.

ClearCaptions is an equal opportunity employer committed to inclusion and diversity. All employment decisions are based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Disclaimer:

The information in this description is intended to indicate the general nature and level of work performed by employees within this classification. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities, and qualifications required of employees to do this job.

CC does not offer sponsorship for work authorization. Candidates must be authorized to work for any employer in the US without a current or future need for visa sponsorship.

This job is found at InterviewStack.io

Skills

Automation, Monitoring, CI/CD, Observability, Prometheus, Grafana, Splunk, New Relic, Datadog, Python, Bash, AWS, GCP, Azure, ELK, Budgeting, Talent Management, Succession Planning, Customer Support, Root Cause Analysis, Site Reliability Engineering, Capacity Planning, Incident Response, Load Testing