InterviewStack.io

Database Engineering & Data Systems Topics

Database design patterns, optimization, scaling strategies, storage technologies, data warehousing, and operational database management. Covers database selection criteria, query optimization, replication strategies, distributed databases, backup and recovery, and performance tuning at the database layer. Distinct from Systems Architecture (which addresses service-level distribution) and Data Science (which addresses analytical approaches).

Handling Large Scale Data and Time Series Data

Design for efficient storage and querying of massive datasets. Understand time-series data patterns (metrics, logs), specialized solutions like InfluxDB or TimescaleDB, and archiving strategies for historical data.

42 questions
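
To make the archiving idea in the time-series topic above concrete, here is a minimal sketch of downsampling: raw points are averaged into fixed-width time buckets so that older data can be kept at coarser resolution. The function name `downsample` and the `(timestamp, value)` tuple layout are assumptions for illustration, not the API of InfluxDB or TimescaleDB.

```python
from collections import defaultdict
from datetime import datetime, timezone


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    `points` is an iterable of (timezone-aware datetime, float) pairs.
    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket start epoch -> [total, count]
    for ts, value in points:
        epoch = int(ts.timestamp())
        bucket_start = epoch - (epoch % bucket_seconds)  # floor to bucket boundary
        acc = sums[bucket_start]
        acc[0] += value
        acc[1] += 1
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), total / count)
        for start, (total, count) in sorted(sums.items())
    ]
```

A real deployment would typically run this kind of rollup inside the database (for example as a scheduled aggregate) rather than in application code; the sketch only shows the bucketing arithmetic.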

Database Selection and Trade Offs

How to evaluate and choose data storage systems and architectures based on workload characteristics and business constraints. Coverage includes differences between relational and non-relational families such as document stores, key-value stores, wide-column stores, graph databases, time-series databases, and search engines; mapping query patterns and latency requirements to storage options; trade-offs between strong consistency and eventual consistency and their impact on availability and complexity; partition key design, replication strategies, and high availability considerations; operational concerns including backups, monitoring, vendor and cost trade-offs, and migration or hybrid strategies; and when to adopt polyglot persistence. Senior-level discussion includes selecting specific managed services and reasoning about expected load patterns, failure modes, and operational burden.

40 questions

Distributed Database Architecture

Covers principles and patterns for designing databases that span nodes and regions. Subjects include synchronous and asynchronous replication strategies, partitioning and sharding approaches, leader-follower and multi-leader architectures, consensus mechanisms and their trade-offs, consistency models including eventual consistency and strong consistency, cross-region failover and disaster recovery, indexing and query routing in partitioned systems, transactional semantics and distributed transactions, and operational concerns such as backup, schema evolution, and performance tuning for distributed data stores.

0 questions

Infrastructure and Database Systems

Fundamental infrastructure and database engineering concepts relevant to analytics platforms and general backend systems. Topics include relational and non-relational database architecture, indexing strategies, query optimization, replication and consistency trade-offs, sharding and partitioning approaches, caching system design, message queues and event streaming systems, and how these components integrate to meet performance, reliability, and cost objectives. Candidates should be able to reason about capacity planning, high availability, disaster recovery, backup strategies, and operational concerns such as monitoring, alerting, and graceful degradation under load.

48 questions

Data Consistency and Recovery

Covers the spectrum of data consistency models used in distributed systems and the operational practices for detecting and recovering from inconsistency. Topics include the strong consistency guarantees provided by ACID-style transactions (atomicity, consistency, isolation, durability) and synchronous replication, and weaker models such as eventual consistency and causal consistency along with their read guarantees, such as read-your-writes and monotonic reads. Explain the trade-offs between consistency, availability, and latency and how those trade-offs influence architecture decisions, user experience, and cost. Discuss replication strategies including synchronous replication, asynchronous replication, and read replicas, and how replication modes affect staleness and failure behavior. Include coordination and consensus mechanisms for achieving stronger guarantees, for example leader-based replication and consensus protocols, and distributed transaction approaches such as two-phase commit. Cover operational concerns: how consistency choices change testing, deployment, monitoring, and incident response. Describe detection and recovery techniques for inconsistency such as validation checks, reconciliation and anti-entropy processes, tombstones and conflict resolution strategies, use of vector clocks or conflict-free replicated data types (CRDTs) to resolve concurrent updates, point-in-time recovery and backups, and procedures for partial repairs, rollbacks, and replays. At senior levels, also address how consistency decisions shape runbooks, alerting, and post-incident analysis.

36 questions
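
The consistency topic above mentions vector clocks for detecting concurrent updates. The following is a minimal sketch under the assumption that a clock is a plain dict mapping a node identifier to a counter; the function names `compare` and `merge` are hypothetical and not taken from any specific database.

```python
def compare(a, b):
    """Compare two vector clocks.

    Returns 'equal', 'before' (a happened before b), 'after'
    (a happened after b), or 'concurrent' (neither dominates,
    so the two updates conflict and need resolution).
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge(a, b):
    """Element-wise maximum: the clock that has seen everything a and b have seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

When `compare` reports `'concurrent'`, the clock alone cannot pick a winner; the application still has to reconcile the two values (for example by keeping both as siblings or applying a CRDT merge) and then store the result under the merged clock.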

Database Scalability and High Availability

Architectural approaches and operational practices for scaling and maintaining database availability. Topics include vertical versus horizontal scaling trade-offs; replication topologies, leader and follower roles, read replicas and replica lag; read-write splitting and connection pooling; sharding and partitioning strategies including range-based, hash-based, and consistent hashing approaches; handling hot partitions and data skew; federation patterns across multiple databases; cache layers and cache invalidation; rebalancing and resharding strategies; distributed concurrency control and transactional guarantees across shards; multi-region deployment strategies, cross-region failover and disaster recovery; monitoring, capacity planning, automation for failover and backups, and cost optimization at scale. Candidates should be able to pick scaling approaches based on read and write patterns and explain the operational complexity and trade-offs introduced by distributed data.

0 questions
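
Consistent hashing, listed among the sharding strategies above, can be sketched as follows. This is an illustrative ring with virtual nodes, assuming SHA-256 as the hash function and a hypothetical `HashRing` class; production systems differ in hash choice, replication, and rebalancing.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring that maps keys to nodes via virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring `vnodes` times
        # to spread its share of the key space more evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 interpreted as an unsigned integer.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Walk clockwise to the first ring point past the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for resharding is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, which is what distinguishes this from a plain `hash(key) % node_count` scheme.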

Technology Selection & Deep Technical Knowledge

Deep understanding of specific technologies relevant to complex system design. Know in depth databases (PostgreSQL, Cassandra, DynamoDB, Elasticsearch), message queues (Kafka, RabbitMQ), caching systems (Redis), search engines, and frameworks. Understand their strengths, weaknesses, trade-offs, operational characteristics, scaling patterns, and common pitfalls. Be able to justify technology choices based on specific system requirements.

40 questions

String and Date Manipulation

Covers practical skills for manipulating textual and temporal data. Typical expectations include string operations such as concatenation, substring extraction, case transformation, pattern replacement, and trimming, as well as date and time operations such as truncation, extracting date parts, computing differences, adding intervals, formatting, and handling time zones and daylight saving time edge cases. Candidates may be asked to write or explain queries and small code snippets, reason about correctness and performance, and discuss pitfalls such as locale formats, leap seconds, and ambiguous input.

0 questions
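
As an example of the date operations named above (truncation, adding intervals, computing differences), here is a small sketch using only the Python standard library. The helper names `add_months` and `truncate_to_month` are assumptions for illustration, and clamping to the last day of the month is one possible convention; some SQL dialects behave differently.

```python
import calendar
from datetime import date


def truncate_to_month(d):
    """Return the first day of the month containing d."""
    return d.replace(day=1)


def add_months(d, months):
    """Add calendar months, clamping the day to the target month's last day.

    Example edge case: January 31 plus one month has no February 31,
    so the result is the last day of February.
    """
    index = d.year * 12 + (d.month - 1) + months  # months since year 0
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))
```

Subtracting two `date` values yields a `timedelta`, so a difference in days is `(later - earlier).days`. Time zone and daylight saving handling is deliberately left out of this sketch because it requires time zone data rather than calendar arithmetic alone.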

Consistency Models and Transactions

Comprehensive knowledge of data consistency models and transactional guarantees in databases and distributed systems. This includes understanding transaction properties such as Atomicity, Consistency, Isolation, and Durability (ACID) and alternative design philosophies such as Basically Available, Soft state, Eventually consistent (BASE). Candidates should be able to choose appropriate isolation levels, including read uncommitted, read committed, repeatable read, serializable, and snapshot isolation, and explain performance versus correctness trade-offs and common anomalies such as dirty reads, non-repeatable reads, phantom reads, lost updates, and write skew. Understand consistency models including strong consistency, strict serializability, serializability, snapshot isolation, causal consistency, eventual consistency, monotonic reads, and read-your-writes, and when each model is acceptable based on latency, availability, and business correctness requirements. Discuss replication strategies and their impact on guarantees, including synchronous versus asynchronous replication, multi-region replication, replication lag, and replica divergence. Evaluate distributed transaction and coordination approaches such as two-phase commit and consensus-based protocols and weigh their performance and failure modes. Propose conflict detection and resolution strategies such as last-write-wins, version vectors and vector clocks, conflict-free replicated data types, application-level reconciliation, idempotent operations, retries, and saga or compensation patterns for long-running workflows. Consider practical engineering concerns including consistency service-level objectives, monitoring and alerting for staleness and replication lag, testing strategies for consistency, implications for caching and sharding, and the trade-offs between developer complexity and user-facing correctness.

0 questions
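
The saga or compensation pattern mentioned in the topic above can be illustrated with a minimal in-process sketch. The `run_saga` function and the `(action, compensate)` pair layout are hypothetical; a real saga would persist its progress and make each step idempotent so that it can be retried after a crash.

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order.

    If an action raises, the compensations of all previously completed
    steps run in reverse order and the function returns False.
    Returns True when every action succeeds.
    """
    completed = []  # compensations for steps that finished successfully
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

Unlike two-phase commit, nothing here holds locks across steps: intermediate states are visible to other readers until the compensations run, which is the correctness cost a saga trades for availability.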