DevOps/SRE
Io Tech Solutions Limited
Hong Kong Island, Hong Kong
1 month ago
Job Type
Full-time
Description
About the Role:
We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and high-availability infrastructure. As a DevOps/SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.
This is a hands-on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem-solving mindset.
Key Responsibilities:
- Design, implement, and maintain CI/CD pipelines for reliable code deployment
- Monitor application performance and system reliability using tools like Prometheus, Grafana, or Datadog
- Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
- Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
- Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
- Automate repetitive operational tasks and improve development workflows
- Implement and enforce security, backup, and disaster recovery strategies
- Participate in the on-call rotation, respond to incidents, and conduct root cause analysis and postmortem reviews
- Work closely with development teams to ensure applications are designed for performance, availability, and scalability
- Optimize resource usage and costs across cloud environments
Qualifications:
Required:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 2+ years of experience in a DevOps, SRE, or Systems Engineering role
- Hands-on experience with Linux/Unix system administration
- Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
- Working knowledge of cloud platforms (AWS, GCP, Azure)
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code tools like Terraform, Ansible, or similar
- Proficiency in at least one scripting or programming language (e.g., Bash, Python, Go)
- Strong understanding of monitoring, logging, and alerting systems
- Experience with version control using Git
Preferred:
- Experience with Kubernetes administration in production environments
- Familiarity with security best practices and compliance standards
- Understanding of networking, load balancing, and DNS configurations
- Exposure to incident management and SLA/SLO/SLI concepts
- Experience working in Agile environments
This job is found at InterviewStack.io
Skills
ci/cd, prometheus, grafana, datadog, aws, gcp, azure, infrastructure as code, terraform, ansible, cloudformation, scalability, linux, jenkins, github actions, circleci, gitlab, containerization, docker, kubernetes, bash, python, monitoring, git, dns, agile, incident management, root cause analysis, systems engineering, disaster recovery, high availability, load balancing, system administration