Cloud & Infrastructure Topics
Cloud platform services, infrastructure architecture, Infrastructure as Code, environment provisioning, and infrastructure operations. Covers cloud service selection, infrastructure provisioning patterns, container orchestration (Kubernetes), multi-cloud and hybrid architectures, infrastructure cost optimization, and cloud platform operations. For CI/CD pipeline and deployment automation, see DevOps & Release Engineering. For cloud security implementation, see Security Engineering & Operations. For data infrastructure design, see Data Engineering & Analytics Infrastructure.
Cloud Platform Experience
A personal account of hands-on experience with public cloud providers and the concrete results delivered. Candidates should describe the specific services and patterns they used for compute, storage, networking, managed databases, serverless, and eventing, and explain their role in architecture decisions, deployments, automation and infrastructure as code practices, continuous integration and continuous delivery pipelines, container orchestration, scaling and performance tuning, monitoring and incident response, and cost management. Interviewees should quantify outcomes where possible, using metrics such as latency reduction, cost savings, availability improvements, or deployment frequency, and note any formal training or certifications. This topic evaluates depth of practical experience, ownership, and the ability to operate and improve cloud systems in production.
Load Balancing and Horizontal Scaling
Covers principles and mechanisms for distributing traffic and scaling services horizontally. Includes load balancing algorithms such as round robin, least connections, and consistent hashing; health checks, connection draining, and sticky sessions; and session management strategies for stateless and stateful services. Explains when to scale horizontally versus vertically, capacity planning, and the trade-offs of each approach. Also includes infrastructure-level autoscaling concepts such as auto scaling groups, launch templates, and target tracking and step scaling policies, and how load balancers and autoscaling interact to absorb traffic spikes. Reviews load balancer types and selection criteria, integration with service discovery, and operational concerns for maintaining availability and performance at scale.
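The three algorithms named above can be sketched in a few lines each. This is a minimal illustration, not a production load balancer: the class names, the backend labels, and the choice of 100 virtual nodes per backend are all assumptions made for the example, and health checks and connection draining are left out.

```python
import bisect
import hashlib
import itertools


class RoundRobin:
    """Cycle through backends in a fixed order, ignoring their load."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each new request to the backend with the fewest in-flight connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class ConsistentHashRing:
    """Map keys to backends so that removing a backend only remaps the keys it owned."""

    def __init__(self, backends, vnodes=100):
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Round robin suits uniform, short requests; least connections adapts when request durations vary; consistent hashing is the usual choice when the same key (for example a session or cache key) should keep landing on the same backend as the pool changes size.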
Technical Vision and Infrastructure Roadmap
This topic assesses a candidate's ability to define a multi-year technical vision for infrastructure, platform, and systems and to translate that vision into a practical execution roadmap. Core skills include evaluating technology choices and architecture evolution, planning migration and modernization paths, anticipating scalability and capacity needs, and balancing cost and performance against resilience and operational reliability. Candidates should demonstrate approaches to managing technical debt, sequencing investments across quarters and releases, estimating resources and timelines, establishing measurable infrastructure goals and key performance indicators, and implementing governance and standards. Discussion may also cover reliability and observability, security and compliance considerations, trade-offs between short-term stability and long-term rearchitecture, prioritization to enable business outcomes, and communicating technical trade-offs to both technical and non-technical stakeholders.
Analyzing Requirements and Service Selection
Given a business requirement (e.g., 'store real-time game data with sub-millisecond latency'), systematically identify appropriate cloud services and justify your choice based on performance, cost, and operational considerations. Articulate trade-offs explicitly.
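One way to make such a justification explicit is a weighted decision matrix. The sketch below is illustrative only: the function name `rank_options`, the criteria, the weights, and the 1 to 5 scores are placeholder assumptions, not measurements of any real service.

```python
def rank_options(weights, scores):
    """Return (option, weighted score) pairs, best first.

    weights: criterion -> relative importance
    scores:  option -> {criterion -> score}, higher is better
    """
    total_weight = sum(weights.values())
    ranked = []
    for option, option_scores in scores.items():
        missing = set(weights) - set(option_scores)
        if missing:
            raise ValueError(f"{option} has no score for: {sorted(missing)}")
        total = sum(weights[c] * option_scores[c] for c in weights) / total_weight
        ranked.append((option, round(total, 3)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical inputs for the real-time game data requirement:
# latency dominates, so it carries the largest weight.
weights = {"latency": 5, "cost": 3, "operations": 2}
scores = {
    "in-memory cache": {"latency": 5, "cost": 2, "operations": 3},
    "managed key-value store": {"latency": 3, "cost": 4, "operations": 4},
    "managed relational database": {"latency": 2, "cost": 3, "operations": 4},
}

for option, total in rank_options(weights, scores):
    print(f"{option}: {total}")
```

The ranking is only as good as the weights, so the useful part of the exercise is stating them out loud: changing the latency weight shows exactly when a cheaper or simpler option would overtake the fastest one.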
Build vs. Buy vs. Cloud vs. On Premise Trade Offs
Key trade-offs in technology decision-making: (1) Build vs. Buy: the flexibility of custom development vs. the speed and cost of packaged software; (2) Cloud vs. On-Premises: operational burden, control, scalability, security, and cost; (3) SaaS vs. Licensed: flexibility, upgrade frequency, and customization options. Candidates should understand the implications of each choice for cost, time-to-value, flexibility, control, and ongoing support.
Capacity Planning and Resource Optimization
Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives. Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage I/O, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to measure whether capacity decisions meet both performance and cost objectives.
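The forecasting and safety-margin steps can be illustrated with a small calculation. This sketch assumes demand grows roughly linearly and that per-instance capacity is known from load testing; the function names, the 60% default target utilization, and the sample numbers are hypothetical.

```python
import math


def linear_forecast(history, periods_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate it."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


def instances_needed(peak_demand, per_instance_capacity, target_utilization=0.6, spare=1):
    """Size a fleet so each instance runs at or below the target utilization.

    `spare` adds whole instances on top, e.g. to survive the loss of one node.
    """
    usable = per_instance_capacity * target_utilization
    return math.ceil(peak_demand / usable) + spare


# Hypothetical monthly peak requests per second, projected two months ahead.
monthly_peaks = [1000, 1100, 1200, 1300]
projected = linear_forecast(monthly_peaks, periods_ahead=2)
print(projected, instances_needed(projected, per_instance_capacity=500))
```

Real workloads are often seasonal or bursty, so a straight line is only a starting point; the same structure holds if the forecast function is swapped for a more suitable model.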
Enterprise Cloud Architecture and Migration Strategy
Focuses on enterprise-scale cloud architecture and migration planning, including multi-cloud and hybrid cloud strategies, governance, cost optimization, compliance, security, and disaster recovery. Covers cloud migration patterns such as lift and shift, refactoring, replatforming, and full rearchitecting, plus data migration strategies, cutover and rollback plans, network and identity architecture, and workload placement decisions. Candidates should demonstrate an understanding of the differences between major cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, and describe how to evaluate provider strengths, select migration approaches, and design resilient, cost-effective enterprise cloud solutions.
Cloud Basics AWS Fundamentals
Basic AWS concepts relevant to DevOps: EC2 instances, S3 storage, IAM for access control, basic networking (VPCs, security groups), and understanding how cloud services differ from on-premises infrastructure.
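As a concrete illustration of IAM access control, the sketch below builds a least-privilege, read-only policy document for a single S3 bucket and serializes it to JSON. The bucket name `example-game-assets` and the statement IDs are hypothetical; the document structure follows the standard IAM JSON policy format.

```python
import json

BUCKET = "example-game-assets"  # hypothetical bucket name

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing applies to the bucket itself.
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Reading applies to the objects inside the bucket.
            "Sid": "ReadObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

policy_json = json.dumps(read_only_policy, indent=2)
print(policy_json)
```

Anything not explicitly allowed is denied by default, which is why the policy lists only the two read actions and scopes each to the narrowest resource it needs.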
Real World Scenario Based Decision Making
Applying infrastructure knowledge to realistic business scenarios: handling traffic spikes, migrations from on-premises to cloud, optimizing costs during resource constraints, responding to security incidents, and managing infrastructure during rapid growth. Making trade-off decisions when constraints conflict.