Network Reliability Engineer
MARGO
Warsaw | Remote | PLN 200 - PLN 250/hr | 2 weeks ago
Job Type
contract
Description
#HPC #AI #GPU #CLUSTERS
YOUR DAILY ROUTINE
- Build large-scale AI infrastructure, covering monitoring, diagnosis, and remediation of production incidents
- Troubleshoot high-impact production issues in collaboration with other engineering teams
- Participate in an on-call rotation to handle incidents and ensure service continuity
- Implement and maintain observability solutions to monitor AI infrastructure and application health
- Contribute to AI infrastructure lifecycle management across different environments and countries
- Promote and apply best practices for stability, resiliency, scalability, and security
- Maintain clear technical documentation for tools and procedures
- Contribute to system and tool evolution based on production feedback
- Collaborate closely with development teams to ensure infrastructure readiness
- Participate in team rituals and knowledge-sharing initiatives
ABOUT YOU
🎯 SOFT SKILLS:
- Proactive and solution-oriented mindset
- Passion for automation and continuous improvement
- Strong collaboration and communication skills
- Ability to work independently and in a team
- Willingness to mentor and share knowledge
💻 HARD SKILLS:
- Experience with Go or Python
- Strong scripting skills (Bash, Python)
- Hands-on experience with Linux systems (Ubuntu/Debian)
- Hands-on experience with GPU and HPC infrastructure (preferred)
- Knowledge of networking (TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
- Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
- Experience managing relational databases (MariaDB)
- Understanding of CI/CD pipelines (GitLab)
- Comfortable with English (written and spoken)
Skills
monitoring, observability, scalability, automation, python, bash, linux, dns, bgp, prometheus, grafana, ansible, ci/cd, gitlab, relational databases, technical documentation, load balancing
About MARGO
MARGO is a tech-native IT consulting firm specializing in complex, high-value projects for the financial services, energy, industry, and technology sectors. Founded in 2005, MARGO has over 400 employees across offices in France (Paris headquarters), the UK (London), Poland (Warsaw), and other European locations. The firm excels in data engineering, artificial intelligence, software engineering, capital market technologies, and cloud transformation.