Top 30 Cassandra Interview Questions and Answers for All Experience Levels

Prepare for your Cassandra interview with these 30 essential questions covering basic, intermediate, and advanced topics. This guide is designed for freshers, candidates with 1-3 years of experience, and professionals with 3-6 years, helping you master Cassandra concepts, practical scenarios, and advanced configurations.

Basic Cassandra Interview Questions (1-10)

1. What is Apache Cassandra?

Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers. It features a decentralized architecture with no single point of failure, providing high availability.[1]

2. What are the key features of Cassandra?

Cassandra offers high read and write throughput, fault tolerance, near-linear horizontal scaling, and a masterless distributed architecture. It uses a partitioned wide-column data model with a flexible schema and excels at absorbing high incoming data volumes.[2]

3. What is a Keyspace in Cassandra?

A Keyspace is the top-level namespace in Cassandra, similar to a database in relational systems. It groups tables (historically called column families) and defines the replication strategy and replication factor for the data they hold.[3]

4. What is a Column Family in Cassandra?

Column Family is the older name for what CQL now calls a table: a collection of rows where each row is identified by a row key and consists of multiple columns. Unlike a relational table, it stores data sparsely, so a row only takes space for the columns it actually sets.[6]

5. Explain the Cassandra data model components.

The Cassandra data model includes Cluster (multiple nodes and keyspaces), Keyspace (groups column families), Column Family (multiple columns with row key reference), and Column (name, value, timestamp).[3]

6. What is Memtable in Cassandra?

Memtable is an in-memory data structure where Cassandra temporarily stores writes before flushing them to disk. It acts as a write buffer for fast data ingestion.[6]

7. What is SSTable in Cassandra?

SSTable (Sorted String Table) is an immutable file on disk, sorted by key, to which Memtable contents are flushed. Unlike relational tables, SSTables are never modified after creation: updates and deletes land in newer SSTables, and compaction later merges them.[3]

8. What is CQL in Cassandra?

CQL (Cassandra Query Language) is a SQL-like language used to interact with Cassandra databases. It allows performing CRUD operations through the cqlsh shell.[2]

9. What are the common ports used by Cassandra?

Cassandra uses port 7000 for inter-node communication, 7001 for SSL inter-node communication, 9042 for CQL client connections, and 7199 for JMX monitoring.[6]

10. Does Cassandra support ACID transactions?

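Cassandra does not provide full ACID transactions. Writes are atomic and isolated within a single partition and durable through the CommitLog; beyond that, it offers tunable consistency, lightweight transactions (compare-and-set operations built on Paxos), and eventual consistency in exchange for high availability.[1]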

Intermediate Cassandra Interview Questions (11-20)

11. How does Cassandra handle data replication?

Cassandra uses a peer-to-peer architecture for replication where data is copied across multiple nodes. Key replication strategies include SimpleStrategy for single datacenters and NetworkTopologyStrategy for multi-datacenter setups.[1]

12. What is the Replication Factor in Cassandra?

Replication Factor (RF) specifies the number of nodes where data is replicated. For example, RF=3 means each data piece is stored on three nodes for fault tolerance.[6]
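To make RF concrete, here is a minimal sketch of replica selection on a token ring, in the spirit of SimpleStrategy: hash the key to a token, then walk clockwise and take the first RF distinct nodes. The ring layout, node names, and the `token_for` and `replicas_for` helpers are assumptions for illustration; Cassandra's default partitioner uses Murmur3 rather than the MD5 stand-in below.

```python
import bisect
import hashlib


def token_for(key):
    # Stand-in hash for illustration only; not Cassandra's Murmur3 partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def replicas_for(key, ring, rf):
    """Walk the ring clockwise from the key's token; take the first rf distinct nodes."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    chosen = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Hypothetical four-node ring with one evenly spaced token per node.
ring = {i * 2**62: f"node{i + 1}" for i in range(4)}
owners = replicas_for("user:42", ring, rf=3)
print(owners)  # three distinct node names
```

With RF=3 on this ring, any single node can fail and two copies of every key remain available.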

13. Explain tunable consistency in Cassandra.

Tunable consistency allows developers to choose consistency levels for reads and writes, such as ONE, QUORUM, or ALL. This balances availability and consistency per operation.[3]
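The trade-off can be worked out with simple arithmetic. The sketch below assumes the usual rule of thumb: a quorum is a majority of replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor. The function names are illustrative only.

```python
def quorum(rf):
    """Majority of replicas for a given replication factor."""
    return rf // 2 + 1


def read_sees_latest_write(rf, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > rf


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(read_sees_latest_write(rf, q, q))   # True: QUORUM writes + QUORUM reads
print(read_sees_latest_write(rf, 1, 1))   # False: ONE + ONE may miss a write
```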

14. How does Cassandra write data?

Cassandra writes data first to a CommitLog for durability, then to Memtable. When Memtable reaches a threshold, it flushes to an SSTable on disk.[6]
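The write path can be sketched as a toy in-memory model. This is a simplification under stated assumptions: the `Node` class, its entry-count flush threshold, and the newest-SSTable-first read are illustrative; real Cassandra flushes based on memory usage and resolves conflicting versions by write timestamp.

```python
class Node:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory write buffer: key -> value
        self.sstables = []     # immutable, key-sorted tuples standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory buffer
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the Memtable out as a sorted, immutable SSTable
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = Node(flush_threshold=2)
node.write("b", 1)
node.write("a", 2)   # second entry triggers a flush
node.write("a", 3)   # newer value stays in the Memtable
print(node.sstables)                   # [(('a', 2), ('b', 1))]
print(node.read("a"), node.read("b"))  # 3 1
```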

15. What is a CommitLog in Cassandra?

CommitLog is a crash-recovery mechanism that records all writes sequentially to disk before they reach Memtable, ensuring data durability even if the node crashes.[6]

16. What is a Bloom Filter in Cassandra?

Bloom Filter is a probabilistic data structure that lets Cassandra skip unnecessary SSTable reads during queries. For each SSTable it answers either "definitely not here" or "possibly here", so false positives are possible but false negatives are not, which reduces I/O operations.[3]
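A minimal Bloom filter sketch shows why a negative answer is reliable and a positive one is not. The bit-array size, the number of hashes, and the SHA-256 based hashing are arbitrary choices for illustration, not what Cassandra uses internally.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
empty_answer = bf.might_contain("user:1")   # False: nothing has been added
bf.add("user:1")
added_answer = bf.might_contain("user:1")   # True: added keys are never missed
```

On a read, an SSTable whose filter returns False can be skipped without touching disk.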

17. How does Cassandra handle deletes?

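Cassandra handles deletes by writing tombstone markers instead of removing data immediately. Tombstones are replicated like any other write and suppress the deleted data during reads. Compaction purges them only after gc_grace_seconds has elapsed, which gives repair time to propagate the delete to every replica so the data cannot reappear.[6]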

18. What is a Tombstone in Cassandra?

A Tombstone is a special marker written during delete operations to suppress old data versions. Excessive tombstones can impact performance, requiring proper compaction strategies.[6]
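The behavior of tombstones can be sketched with last-write-wins reconciliation. The `TOMBSTONE` sentinel and both helper functions are hypothetical names for illustration; the 864000-second figure is the default gc_grace_seconds of ten days.

```python
TOMBSTONE = object()  # sentinel standing in for a delete marker


def reconcile(versions):
    """Last write wins: versions is a list of (timestamp, value) pairs."""
    _, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value


def purgeable(tombstone_ts, now, gc_grace_seconds=864000):
    """A tombstone may be dropped by compaction only once gc_grace_seconds has passed."""
    return now - tombstone_ts > gc_grace_seconds


# The delete at t=2 shadows the older write at t=1.
print(reconcile([(1, "alice"), (2, TOMBSTONE)]))   # None
# A write at t=3 is newer than the delete, so the row is visible again.
print(reconcile([(3, "bob"), (2, TOMBSTONE)]))     # bob
print(purgeable(tombstone_ts=0, now=3600))         # False
```

Reads must scan past every tombstone that has not yet been purged, which is why delete-heavy tables can slow down.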

19. Explain Consistency Levels for Read operations in Cassandra.

Read consistency levels include ONE (fastest, reads from one replica), QUORUM (reads from majority of replicas), LOCAL_QUORUM (majority in local datacenter), and ALL (strongest, all replicas).[6]

20. Can you change the Replication Factor on a live Cassandra cluster?

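Yes. Change the Replication Factor on a live cluster with ALTER KEYSPACE, then run a full nodetool repair on each node so the new replicas receive their data. After lowering the Replication Factor, run nodetool cleanup instead to remove data that nodes no longer own.[6]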

Advanced Cassandra Interview Questions (21-30)

21. What are the advantages and disadvantages of Cassandra?

Advantages include high scalability, fault tolerance, and decentralized architecture. Disadvantages involve complex data modeling, eventual consistency challenges, and higher maintenance overhead.[1]

22. Scenario: At Zoho, how would you design a Cassandra schema for high-write user session data?

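Partition by user ID and order sessions within the partition using a time-based clustering column. If a user can accumulate many sessions, add a time bucket (for example, the day) to the partition key, because a single unbounded wide row per user can grow into an oversized partition. Use a replication factor such as 3 per datacenter, and write at LOCAL_QUORUM so writes are acknowledged without waiting on remote datacenters.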

23. Explain compaction strategies in Cassandra.

Cassandra offers several compaction strategies: SizeTieredCompactionStrategy (STCS) merges SSTables of similar size and suits write-heavy workloads, LeveledCompactionStrategy (LCS) organizes SSTables into levels for read efficiency, and TimeWindowCompactionStrategy (TWCS) groups SSTables by time window for time-series data.[1]
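As a rough illustration of the size-tiered idea, the sketch below groups SSTables whose size falls within a band around a bucket's average and reports only buckets large enough to compact. The parameter names mirror the STCS options bucket_low, bucket_high, and min_threshold, but the grouping logic is a simplification, not Cassandra's actual implementation.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size; return compaction candidates."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar-sized bucket found; start a new one
    # Only buckets with at least min_threshold SSTables are worth compacting.
    return [b for b in buckets if len(b) >= min_threshold]


# Hypothetical SSTable sizes in MB: four small files, one medium, one large.
candidates = size_tiered_buckets([10, 11, 12, 9, 100, 500])
print(candidates)  # [[9, 10, 11, 12]]
```

The four small files are merged into one, while the 100 MB and 500 MB files wait until enough peers of similar size exist.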

24. What is the role of Gossip protocol in Cassandra?

Gossip protocol enables nodes to share cluster state information like node status, heartbeats, and schema changes in a decentralized peer-to-peer manner without a master node.[1]
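A toy simulation shows how pairwise exchanges spread state without a master. The assumptions here are loud ones: each node's view is just a map of node name to heartbeat version, peers are chosen in a fixed rotation so the run is deterministic (real gossip picks peers at random), and merging keeps the highest version seen.

```python
def gossip_round(views):
    """views[i] maps node name -> highest heartbeat version node i has seen."""
    n = len(views)
    for i in range(n):
        peer = (i + 1) % n  # fixed rotation; real gossip chooses peers randomly
        names = views[i].keys() | views[peer].keys()
        merged = {
            name: max(views[i].get(name, 0), views[peer].get(name, 0))
            for name in names
        }
        # Both sides leave the exchange with the same merged view.
        views[i] = dict(merged)
        views[peer] = dict(merged)


# Each of four hypothetical nodes starts out knowing only itself.
views = [{f"node{i}": 1} for i in range(4)]
rounds = 0
while any(len(view) < 4 for view in views):
    gossip_round(views)
    rounds += 1
print(rounds, views[1])
```

Every node ends up with the full membership list even though no node ever held authority over the others.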

25. Scenario: Paytm experiences read timeouts during peak loads. How do you troubleshoot in Cassandra?

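Check the read consistency level (consider lowering it from ALL to QUORUM or LOCAL_QUORUM), look for oversized or tombstone-heavy partitions with nodetool tablestats, check for pending compactions with nodetool compactionstats, and verify Bloom filter settings. Increase read_request_timeout_in_ms only as a stopgap, since a longer timeout hides the cause rather than fixing it.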

26. What are Secondary Indexes in Cassandra and their limitations?

Secondary Indexes allow querying non-primary key columns, but each index is an additional data structure that adds write overhead. Queries that do not also restrict the partition key must contact many nodes, and indexes perform poorly on high-cardinality columns.[3]

27. Explain Materialized Views in Cassandra.

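Materialized Views are server-maintained tables derived from a base table with a different primary key, so the same data can be queried by another partition key or sort order. The view's primary key must include every primary key column of the base table and may add at most one other column. Views add write overhead and are marked experimental in recent releases, so many teams maintain denormalized tables themselves instead.[1]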

28. Scenario: Salesforce needs multi-datacenter replication. Which strategy do you choose?

Use NetworkTopologyStrategy with different replication factors per datacenter, specifying topology via snitch configuration for rack-aware and datacenter-aware replication.[1]
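For example, a small helper can generate the keyspace definition with a replication factor per datacenter. The helper, the keyspace name `crm`, and the datacenter names are hypothetical; real datacenter names must match what the configured snitch reports.

```python
def create_keyspace_cql(keyspace, rf_by_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical layout: three replicas in one datacenter, two in the other.
statement = create_keyspace_cql("crm", {"dc_east": 3, "dc_west": 2})
print(statement)
```

The printed statement can be run in cqlsh; pairing it with LOCAL_QUORUM keeps request latency within the local datacenter.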

29. What is the Hint mechanism in Cassandra?

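Hints are writes that a coordinator node stores on behalf of a temporarily unavailable replica and replays once that replica recovers. Hints are recorded only while the replica has been down for less than a configurable window (max_hint_window_in_ms, 3 hours by default); after a longer outage the replica must be brought up to date with repair.[6]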
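Hinted handoff can be sketched as a coordinator that queues writes for a down replica and replays them on recovery. The `Coordinator` class and its arguments are illustrative; the 10800-second window mirrors the 3-hour default of max_hint_window_in_ms, and this model assumes hints stop being stored once a replica has been down longer than that window.

```python
class Coordinator:
    def __init__(self, hint_window_seconds=10800):
        self.hint_window = hint_window_seconds
        self.hints = {}  # replica name -> list of pending writes

    def write(self, mutation, replicas, down_since, now):
        """Deliver to live replicas; store a hint for recently downed ones."""
        for name, store in replicas.items():
            if name not in down_since:
                store.append(mutation)  # replica is up: deliver directly
            elif now - down_since[name] <= self.hint_window:
                self.hints.setdefault(name, []).append(mutation)
            # Otherwise the replica has been down too long; repair must fix it.

    def replay(self, name, store):
        """Hand stored hints to a replica that has come back."""
        store.extend(self.hints.pop(name, []))


replicas = {"a": [], "b": []}
coord = Coordinator()
coord.write("w1", replicas, down_since={"b": 0}, now=60)      # hint stored for b
coord.write("w2", replicas, down_since={"b": 0}, now=20000)   # window exceeded
coord.replay("b", replicas["b"])
print(replicas)  # {'a': ['w1', 'w2'], 'b': ['w1']}
```

Replica b still lacks w2 after replay, which is the gap that repair closes.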

30. How would you optimize Cassandra for time-series data at Swiggy?

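Partition by device_id plus a time bucket (for example, one partition per device per day) and use the event timestamp as the clustering column. Use TimeWindowCompactionStrategy (TWCS) with a compaction window that fits the bucket size, set a TTL so that whole SSTables expire together, and set gc_grace_seconds to suit how often repairs run.[1]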
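One common layout, sketched below, uses a composite partition key of device ID and time bucket so that no partition grows without bound. The `partition_key` helper, the table and column names, and the 30-day TTL are assumptions for illustration; the compaction options shown are the TWCS class with its compaction_window_unit and compaction_window_size settings.

```python
from datetime import datetime, timezone

# Hypothetical table definition held as a string; names and TTL are illustrative.
READINGS_DDL = """
CREATE TABLE IF NOT EXISTS readings (
    device_id text,
    day text,
    ts timestamp,
    value double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': 1}
  AND default_time_to_live = 2592000;
"""


def partition_key(device_id, ts, bucket="day"):
    """Return (device_id, bucket label) for a timezone-aware timestamp, in UTC."""
    fmt = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}[bucket]
    return (device_id, ts.astimezone(timezone.utc).strftime(fmt))


reading_time = datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
print(partition_key("d1", reading_time))           # ('d1', '2024-03-05')
print(partition_key("d1", reading_time, "hour"))   # ('d1', '2024-03-05T23')
```

Matching the bucket to the compaction window means each day's data lands in its own set of SSTables, which can then expire as a unit.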
