Database Engineering & Data Systems Topics
Database design patterns, optimization, scaling strategies, storage technologies, data warehousing, and operational database management. Covers database selection criteria, query optimization, replication strategies, distributed databases, backup and recovery, and performance tuning at database layer. Distinct from Systems Architecture (which addresses service-level distribution) and Data Science (which addresses analytical approaches).
Relational Databases and SQL
Focuses on relational database fundamentals and practical SQL skills. Candidates should be able to write and reason about SELECT queries, JOINs, aggregations, grouping, filtering, common table expressions, and window functions. They should understand schema design trade offs including normalization and denormalization, indexing strategies and index types, query performance considerations and basic optimization techniques, how to read an execution plan, and transaction semantics including isolation levels and ACID guarantees. Interviewers may test writing efficient queries, designing normalized schemas for given requirements, suggesting appropriate indexes, and explaining how to diagnose and improve slow queries.
Database Scalability and High Availability
Architectural approaches and operational practices for scaling and maintaining database availability. Topics include vertical versus horizontal scaling trade offs; replication topologies, leader and follower roles, read replicas and replica lag; read write splitting and connection pooling; sharding and partitioning strategies including range based, hash based, and consistent hashing approaches; handling hot partitions and data skew; federation and multi database federation patterns; cache layers and cache invalidation; rebalancing and resharding strategies; distributed concurrency control and transactional guarantees across shards; multi region deployment strategies, cross region failover and disaster recovery; monitoring, capacity planning, automation for failover and backups, and cost optimization at scale. Candidates should be able to pick scaling approaches based on read and write patterns and explain operational complexity and trade offs introduced by distributed data.
Data Model Design and Access Patterns
Discuss how you'd design data models based on access patterns. Understand relational vs. NoSQL trade-offs. Know when to denormalize, how to handle distributed transactions, and strategies for scaling databases (sharding, partitioning). Discuss read vs. write optimization.
Data Partitioning and Sharding
Techniques and operational practices for horizontally partitioning data across multiple database instances or storage nodes to achieve scale, improve performance, and manage growth. Includes selection and design of partition and shard keys to evenly distribute load and avoid hotspots, with range based, hash based, and directory based approaches and consistent hashing mechanisms. Covers handling uneven distribution and data skew, hotspot detection and mitigation, and the impact of partitioning on query patterns such as joins and cross shard queries. Explains implications for transactions and consistency, including transactional boundaries that span partitions and approaches to distributed transactions and compensation. Describes resharding and online data migration strategies, rolling rebalances, and methods to minimize downtime and data movement. Emphasizes operational concerns including shard management, automation, monitoring and alerting, failure recovery, and performance tuning. Discusses trade offs between simplicity, latency, throughput, and operational complexity and highlights considerations for both transactional and analytical workloads, including routing, caching, and coordination patterns.
Database Selection and Trade Offs
How to evaluate and choose data storage systems and architectures based on workload characteristics and business constraints. Coverage includes differences between relational and nonrelational families such as document stores, key value stores, wide column stores, graph databases, time series databases, and search engines; mapping query patterns and latency requirements to storage options; trade offs between strong consistency and eventual consistency and their impact on availability and complexity; partition key design, replication strategies, and high availability considerations; operational concerns including backups, monitoring, vendor and cost trade offs, migration or hybrid strategies, and when to adopt polyglot persistence. Senior level discussion includes selecting specific managed services and reasoning about expected load patterns, failure modes, and operational burden.
Database and Caching Strategies
Understanding when and how to use persistent databases and in memory caching to improve application performance and scalability. Candidates should be able to explain read and write optimization patterns, basic indexing concepts, trade offs between consistency and availability, replication and sharding approaches, cache types and placement, cache invalidation strategies and time to live policies, and when an in memory cache such as Redis or Memcached is appropriate. Discussion should include common bottlenecks, monitoring and operational considerations, and how caching interacts with application correctness and latency.
Relational Schema Design and Normalization
Designing schemas for relational databases and applying normalization principles to reduce redundancy and maintain data integrity. Candidates should understand the normal forms including first normal form, second normal form, third normal form, and Boyce Codd normal form; primary keys, foreign keys, referential integrity, and how to model relationships such as one to one, one to many, and many to many using junction tables. Coverage includes entity relationship modeling, data modeling techniques, handling hierarchical or recursive data, choosing appropriate data types, and recognizing normalization violations in poorly designed schemas. Also discuss practical denormalization trade offs for performance, when and how to intentionally denormalize, designing schemas for maintainability and common query patterns, and considerations for analytics schemas such as star schemas and slowly changing dimensions.
SQL Fundamentals and Query Writing
Comprehensive query writing skills from basic to intermediate level. Topics include SELECT and WHERE, joining tables with inner and outer joins, grouping with GROUP BY and filtering groups with HAVING, common aggregation functions such as COUNT SUM AVG MIN and MAX, ORDER BY and DISTINCT, subqueries and common table expressions, basic window functions such as ROW_NUMBER and RANK, union operations, and principles of readable and maintainable query composition. Also covers basic query execution awareness and common performance pitfalls and how to write correct, efficient queries for combining and summarizing relational data.
Consistency Models and Transactions
Comprehensive knowledge of data consistency models and transactional guarantees in databases and distributed systems. This includes understanding transaction properties such as Atomicity, Consistency, Isolation, and Durability (ACID) and alternative design philosophies such as Basically Available, Soft state, Eventually consistent (BASE). Candidates should be able to choose appropriate isolation levels including read uncommitted, read committed, repeatable read, serializable, and snapshot isolation and explain performance versus correctness tradeoffs and common anomalies such as dirty reads, non repeatable reads, phantom reads, lost updates, and write skew. Understand consistency models including strong consistency, strict serializability, serializability, snapshot isolation, causal consistency, eventual consistency, monotonic reads, and read your writes, and when each model is acceptable based on latency, availability, and business correctness requirements. Discuss replication strategies and their impact on guarantees, including synchronous versus asynchronous replication, multi region replication, replication lag, and replica divergence. Evaluate distributed transaction and coordination approaches such as two phase commit and consensus based protocols and weigh their performance and failure modes. Propose conflict detection and resolution strategies such as last write wins, version vectors and vector clocks, conflict free replicated data types, application level reconciliation, idempotent operations, retries, and saga or compensation patterns for long running workflows. Consider practical engineering concerns including consistency service level objectives, monitoring and alerting for staleness and replication lag, testing strategies for consistency, implications for caching and sharding, and the tradeoffs between developer complexity and user facing correctness.