Senior Cloud Engineer
Merck & Co., Inc.
Job Description
The Position
The Advanced Scientific Compute (ASC) team builds and operates the cloud HPC platform that underpins computational research at our company. AI/ML workloads are rapidly migrating to the cloud, and this role exists to accelerate that transition by bringing dedicated AI/ML expertise into the HPC engineering team to strengthen integrations with AI pipelines, enable agentic AI capabilities for self-service, and drive better insights from scientific compute data.
What will you do?
- Design, deploy, and operate cloud HPC infrastructure on AWS (ParallelCluster, Slurm, S3, networking) supporting scientific research workloads.
- Lead integration of HPC environments with AI/ML pipelines — enabling training, inference, and orchestration workloads to run efficiently alongside traditional HPC jobs.
- Architect and implement agentic AI capabilities (e.g., LLM-based self-service tooling, automated job management) to improve researcher experience and reduce manual support overhead.
- Contribute to platform observability, cost optimisation, and capacity planning for cloud HPC environments.
- Partner with HPC Application Support and Client Support Engineering to translate researcher needs into platform improvements.
- Support onboarding and enablement of research teams adopting cloud-native and AI-augmented workflows.
- Participate in on-call rotation and incident response for HPC platform availability.
Qualifications, Skills & Experience Required
- 5+ years of cloud engineering experience, with hands-on AWS expertise (EC2, S3, VPC, IAM, EFS/FSx).
- Demonstrated experience with HPC environments — job schedulers (Slurm, PBS, or equivalent), parallel filesystems, MPI workloads.
- Proficiency in infrastructure-as-code (Terraform or CloudFormation) and scripting (Python, Bash).
- Experience integrating or operating ML workloads in cloud environments (training pipelines, model serving, batch inference).
- Strong systems thinking — able to diagnose performance, networking, and storage bottlenecks in distributed compute environments.
- Comfortable working in a regulated, enterprise environment with change management processes.
Nice to have
- Hands-on experience with agentic AI frameworks (e.g., LangChain, CrewAI, or similar) or LLM-based tooling.
- Familiarity with AWS ParallelCluster or equivalent cloud HPC orchestration platforms.
- Experience in pharma, biotech, or other scientific computing domains.
- Knowledge of container-based HPC workflows (Docker, Singularity/Apptainer).
What we offer
- Exciting work in a great team, global projects, international environment.
- Opportunity to learn and grow professionally within the company globally.
- Hybrid working model and flexible role pattern (e.g., working 80% of full-time hours is possible in justified cases).
- Pension and health insurance contributions.
- Internal reward system plus referral programme.
- 5 weeks annual leave, 5 sick days, 15 days of certified sick leave paid above statutory requirements annually, 40 paid hours annually for volunteering activities, 12 weeks of parental contribution.
- Cafeteria programme with tax-free benefits of your choice (meal vouchers, sport, culture, health, travel, etc.), plus a Multisport Card.
- Vodafone, Raiffeisen Bank and Foodora discount programmes.
- Up-to-date laptop and iPhone.
- Parking in the garage, showers, refreshments, massage chairs, library, music corner.
- Competitive salary, incentive pay, and much more.
Ready to take up the challenge? Apply now!
Know anybody who might be interested? Refer this job!
Required Skills:
Amazon Web Services (AWS), Availability Management, Biodesign, Bioinformatics, Biopharmaceutical Industry, Biopharmaceuticals, Biopharmaceutics, Capacity Management, Change Controls, Change Management, Cloud Engineering, Design Applications, Drug Discovery Process, High Performance Computing (HPC), Incident Management, Information Management, Information Technology (IT) Infrastructure, Infrastructure As Code (IaC), Intelligent Agents, IT Service Management (ITSM), Pharmaceutical Biology, Release Management, Software Development, Software Development Life Cycle (SDLC) {+ 2 more}
Preferred Skills:
Current Employees apply HERE
Current Contingent Workers apply HERE
Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.
Employee Status: Regular
Relocation: No relocation
VISA Sponsorship: No
Travel Requirements: No Travel Required
Flexible Work Arrangements: Hybrid
Shift: Not Indicated
Valid Driving License: No
Hazardous Material(s): n/a
Job Posting End Date: 07/15/2026
*A job posting is effective until 11:59:59PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.
About Merck & Co., Inc.
Merck & Co., Inc. is a global healthcare company known for its innovative pharmaceutical products and vaccines. It operates worldwide, focusing on research and development in the pharmaceutical and biotechnology sectors.