Cloud & Infrastructure Topics
Cloud platform services, infrastructure architecture, Infrastructure as Code, environment provisioning, and infrastructure operations. Covers cloud service selection, infrastructure provisioning patterns, container orchestration (Kubernetes), multi-cloud and hybrid architectures, infrastructure cost optimization, and cloud platform operations. For CI/CD pipeline and deployment automation, see DevOps & Release Engineering. For cloud security implementation, see Security Engineering & Operations. For data infrastructure design, see Data Engineering & Analytics Infrastructure.
Infrastructure and Systems Architecture
Comprehensive knowledge of enterprise infrastructure and systems architecture principles, trade-offs, and design patterns. Candidates should be able to compare on-premises and cloud deployments and describe hybrid cloud models, virtualization and container strategies, compute and storage choices, and networking fundamentals including segmentation and connectivity. Topics include high availability and redundancy approaches, disaster recovery and business continuity planning, capacity and performance planning, systems integration and messaging patterns, and how infrastructure decisions affect security, compliance, cost, and operational complexity. Be prepared to discuss automation, infrastructure as code, monitoring and observability, and the rationale behind architecture choices for scalability and resilience.
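As a worked illustration of the high availability and redundancy reasoning above, the following minimal Python sketch composes availability figures for components in series and for redundant replicas. It assumes component failures are independent, which real systems often violate (shared power, network, or deployment pipelines); the function names and the example figures are hypothetical.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def serial_availability(components):
    """Availability of a chain in which every component must be up.

    Assumes independent failures: the result is the product of the inputs.
    """
    return prod(components)


def redundant_availability(replicas):
    """Availability of a redundant group in which any one replica suffices.

    Assumes independent failures: the group is down only when all replicas
    are down at the same time.
    """
    return 1 - prod(1 - a for a in replicas)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: two 99% components in series versus in parallel.
    print(serial_availability([0.99, 0.99]))
    print(redundant_availability([0.99, 0.99]))
    print(downtime_minutes_per_year(0.999))
```

Under these assumptions, chaining components lowers availability while adding replicas raises it, which is one way to frame the cost versus resilience trade-off in an architecture discussion.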
Cloud Strategy and Migration Planning
Fundamentals and planning practices for adopting cloud computing and migrating workloads to cloud environments. Coverage includes understanding cloud delivery models such as Infrastructure as a Service, Platform as a Service, and Software as a Service, and hybrid deployment options. Candidates should be able to evaluate migration strategies including lift and shift, refactor, replatform, and rebuild, and assess trade-offs across cost, performance, security, compliance, and organizational readiness. Planning topics include workload assessment, suitability analysis, evaluation of vendors such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform, migration sequencing and runbooks, data migration and networking considerations, identity and access patterns, testing and rollback strategies, monitoring and observability, cost optimization and governance, and stakeholder and change management during migration.
Cloud Platform Experience
Personal account of hands-on experience using public cloud providers and the concrete results delivered. Candidates should describe the specific services and patterns they used for compute, storage, networking, managed databases, serverless, and eventing, and explain their role in architecture decisions, deployments, automation and infrastructure as code practices, continuous integration and continuous delivery pipelines, container orchestration, scaling and performance tuning, monitoring and incident response, and cost management. Interviewees should quantify outcomes when possible with metrics such as latency reduction, cost savings, availability improvements, or deployment frequency, and note any formal training or certifications. This topic evaluates depth of practical experience, ownership, and the ability to operate and improve cloud systems in production.
System Administration Tools and Monitoring
Knowledge of monitoring, logging, and operational tooling used to observe and manage infrastructure. Includes monitoring platforms such as Nagios, Zabbix, and Prometheus; log aggregation and search with ELK Stack or similar; metrics collection, time series databases, alerting rules, dashboards and visualization, instrumenting systems and services for observability, common system utilities such as top, iostat, sar, and vmstat, integration with incident management and ticketing systems, and determining which metrics matter for capacity and performance monitoring.
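To make the idea of an alerting rule concrete, here is a minimal Python sketch of a threshold rule that fires only after the metric has stayed above the threshold for a sustained period, similar in spirit to the "for" duration found in many alerting systems. It is not the API of Nagios, Zabbix, or Prometheus; the `AlertRule` class, the `is_firing` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str            # hypothetical rule name, e.g. "HighCPU"
    threshold: float     # fire when the metric exceeds this value
    for_seconds: int     # the breach must last at least this long


def is_firing(rule, samples):
    """Return True if the latest run of breaching samples is long enough.

    samples: list of (timestamp_seconds, value) pairs sorted by timestamp.
    Any sample at or below the threshold resets the breach window, so a
    brief spike does not page anyone.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = timestamp
        else:
            breach_start = None
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= rule.for_seconds


if __name__ == "__main__":
    rule = AlertRule(name="HighCPU", threshold=0.90, for_seconds=120)
    sustained = [(0, 0.95), (60, 0.96), (120, 0.97)]
    spike = [(0, 0.95), (60, 0.50), (120, 0.97)]
    print(is_firing(rule, sustained))
    print(is_firing(rule, spike))
```

Requiring a sustained breach trades a little detection latency for fewer false pages, which is the kind of design choice a candidate may be asked to justify.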
Enterprise Platform Management
Covers management of enterprise software platforms such as Microsoft 365, Google Workspace, and other software as a service and platform as a service offerings. Topics include user and lifecycle management, identity and access management, single sign-on and authentication patterns, integration with internal systems and directories, licensing and cost management, automation and provisioning, monitoring and availability, security and compliance controls, backup and recovery, and vendor relationship and contract management.
Technical Vision and Infrastructure Roadmap
This topic assesses a candidate's ability to define a multi-year technical vision for infrastructure, platform, and systems and to translate that vision into a practical execution roadmap. Core skills include evaluating technology choices and architecture evolution, planning migration and modernization paths, anticipating scalability and capacity needs, and balancing cost and performance with resilience and operational reliability. Candidates should demonstrate approaches to managing technical debt, sequencing investments across quarters and releases, estimating resources and timelines, establishing measurable infrastructure goals and key performance indicators, and implementing governance and standards. Discussion may also cover reliability and observability, security and compliance considerations, trade-offs between short-term stability and long-term rearchitecture, prioritization to enable business outcomes, and communicating technical trade-offs to both technical and non-technical stakeholders.
Cloud Infrastructure Knowledge (AWS/GCP/Azure)
Candidates should have working knowledge of at least one major cloud platform, including its common services (EC2/Compute Engine, RDS/Cloud SQL, S3/Cloud Storage, load balancers, VPCs, and networking), typical failure modes, and how to troubleshoot within that platform. They should also understand concepts such as availability zones, regions, and cross-region failover.
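The cross-region failover concept can be sketched as a simple priority-ordered selection: traffic goes to the first region whose health check passes. The Python below is only an illustration of that decision logic, not any provider's API; the function name, the region identifiers, and the health map are assumptions made for the example.

```python
def pick_active_region(priority, healthy):
    """Return the first healthy region in priority order, or None.

    priority: region names ordered from most to least preferred.
    healthy:  mapping of region name to the latest health-check result.
    A region missing from the map is treated as unhealthy.
    """
    for region in priority:
        if healthy.get(region, False):
            return region
    return None


if __name__ == "__main__":
    # Illustrative region names and health results.
    priority = ["us-east-1", "us-west-2", "eu-west-1"]
    health = {"us-east-1": False, "us-west-2": True, "eu-west-1": True}
    print(pick_active_region(priority, health))
```

In practice this decision is usually delegated to DNS or a global load balancer, and a production design would also need to handle data replication lag and failback, which this sketch leaves out.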
Capacity Planning and Resource Optimization
Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives. Key skills include monitoring resource utilization metrics such as central processing unit usage, memory consumption, storage input/output, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to measure whether capacity decisions meet both performance and cost objectives.
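The forecasting and safety-margin steps described above can be sketched in a few lines of Python. This is a minimal example assuming demand grows roughly linearly and that each instance has a known capacity; the function names, the 70% target utilization, and the sample numbers are hypothetical, and real workloads with seasonality would need a richer model.

```python
import math


def linear_forecast(history, periods_ahead):
    """Extrapolate demand with an ordinary least-squares line.

    history: demand observed at evenly spaced periods, oldest first
             (at least two points).
    Returns the projected demand periods_ahead after the last observation.
    """
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


def instances_needed(peak_demand, per_instance_capacity, target_utilization=0.7):
    """Instances required so each one runs at or below the target utilization.

    The gap between target_utilization and 1.0 is the safety margin that
    absorbs forecast error and short bursts.
    """
    return math.ceil(peak_demand / (per_instance_capacity * target_utilization))


if __name__ == "__main__":
    # Hypothetical monthly peak requests per second.
    history = [100, 120, 140, 160]
    projected = linear_forecast(history, periods_ahead=2)
    print(projected)
    print(instances_needed(projected, per_instance_capacity=50))
```

Running each instance below full utilization costs more, so the target utilization is the knob that links the performance objective to the cost objective.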
Enterprise Cloud Architecture and Migration Strategy
Focuses on enterprise-scale cloud architecture and migration planning, including multi-cloud and hybrid cloud strategies, governance, cost optimization, compliance, security, and disaster recovery. Covers cloud migration patterns such as lift and shift, refactoring, replatforming, and full rearchitecting, plus data migration strategies, cutover and rollback plans, network and identity architecture, and workload placement decisions. Candidates should demonstrate an understanding of the differences between major cloud providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, and describe how to evaluate provider strengths, select migration approaches, and design resilient, cost-effective enterprise cloud solutions.