Prepare for Your Kafka Interview: Basic to Advanced Questions
Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant data processing. This comprehensive guide features 30 Kafka interview questions arranged by difficulty level, covering conceptual, practical, and scenario-based topics. It is ideal for freshers as well as candidates with 1-3 or 3-6 years of experience preparing for roles at companies like Zoho, Paytm, Salesforce, Atlassian, and Swiggy.
Basic Kafka Interview Questions (1-10)
1. What is Apache Kafka?
Apache Kafka is a distributed streaming platform that allows publishing and subscribing to streams of records, similar to a message queue or enterprise messaging system but with a design optimized for high throughput, fault tolerance, and scalability.
2. What are the core components of Kafka architecture?
The core components include Topics (categories for messages), Partitions (ordered, immutable sequences within topics), Brokers (servers in the Kafka cluster), Producers (publish messages to topics), and Consumers (subscribe to topics to read messages).
3. What are the main APIs in Kafka?
Kafka has five core APIs: the Producer API for publishing streams of records, the Consumer API for subscribing to topics and processing records, the Streams API for stream processing applications, the Connect API for integrating with external systems, and the Admin API for managing and inspecting topics, brokers, and other Kafka objects.
4. What is a Kafka Topic?
A Topic is a category or feed name to which records are published. Topics are partitioned into multiple partitions for scalability, and each partition holds an ordered, immutable sequence of records that are continually appended to.
5. What is a Kafka Partition?
A Partition is the unit of parallelism and scalability in Kafka. Each topic is divided into partitions, and records are distributed across partitions based on a key or round-robin, enabling parallel processing by consumers.
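The key-to-partition mapping can be sketched in a few lines. Kafka's default partitioner actually hashes the key bytes with murmur2; the sketch below uses CRC32 purely for illustration, since the principle that matters is that the same key always maps to the same partition, preserving per-key ordering.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed record.

    Illustrative only: Kafka's real default partitioner uses murmur2,
    not CRC32, but the invariant is the same -- identical keys always
    land in the same partition.
    """
    return zlib.crc32(key) % num_partitions

# Two sends with the same key go to the same partition,
# so records for one key stay ordered.
assert choose_partition(b"order-42", 6) == choose_partition(b"order-42", 6)
```

Records without a key are instead spread across partitions (round-robin in older clients, sticky batching in newer ones).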
6. What is a Kafka Broker?
A Broker is a Kafka server that stores data and handles read/write requests for partitions. Multiple brokers form a cluster, with one broker acting as the leader for a partition while others act as followers.
7. What is a Kafka Producer?
A Producer is an application that sends records to one or more Kafka topics. Producers decide which partition a message goes to based on the message key or a partitioning strategy.
8. What is a Kafka Consumer?
A Consumer subscribes to one or more topics and reads records from them. Consumers label themselves with a consumer group name, ensuring each record is delivered to one consumer instance per group.
9. What is a Consumer Group in Kafka?
A Consumer Group is a group of consumers that coordinate to consume a set of topics. Each partition is consumed by exactly one consumer in the group, enabling load balancing and fault tolerance.
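The "each partition goes to exactly one consumer" rule can be illustrated with a simplified version of Kafka's range assignor (the real assignor works per topic and handles rebalances; this sketch only shows the splitting logic):

```python
def range_assign(consumers, partitions):
    """Simplified range assignment: sort consumers, hand out contiguous
    chunks of partitions, giving earlier consumers one extra partition
    when the split is uneven."""
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

# 5 partitions across 2 consumers: no partition is shared.
assert range_assign(["c1", "c2"], [0, 1, 2, 3, 4]) == {"c1": [0, 1, 2], "c2": [3, 4]}
```

If a consumer leaves or joins, Kafka triggers a rebalance and recomputes an assignment like this across the remaining members.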
10. How does Kafka ensure fault tolerance?
Kafka ensures fault tolerance through replication of partitions across multiple brokers. Each partition has a leader and followers; if the leader fails, a follower is automatically elected as the new leader.
Intermediate Kafka Interview Questions (11-20)
11. What is the Replication Factor in Kafka?
The Replication Factor specifies the number of copies (replicas) of each partition across the cluster. A higher factor increases durability but requires more storage and network resources.
12. Explain Leader and Follower in Kafka.
The Leader is the broker responsible for all reads and writes for a partition. Followers replicate the leader’s log passively and serve reads if configured, taking over as leader during failures.
13. What are In-Sync Replicas (ISR) in Kafka?
ISR are replicas that are fully caught up with the leader. Producers can wait for acknowledgments from all ISR for durability, and leaders are elected only from ISR during failures.
14. What are the Producer Acknowledgment Modes (acks) in Kafka?
acks=0: the producer does not wait for any acknowledgment (fastest, least durable). acks=1: only the partition leader must acknowledge the write. acks=all (or -1): all in-sync replicas must acknowledge (most durable, highest latency).
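The three modes differ only in when the broker considers a produce request acknowledged. This toy model (not real client code) makes the trade-off explicit:

```python
def is_acknowledged(acks, leader_wrote: bool, isr_acked: int, isr_size: int) -> bool:
    """Toy model of when a produce request is acknowledged under each
    acks setting. isr_acked counts in-sync replicas that have the record."""
    if acks == 0:
        return True                     # fire-and-forget: acked immediately
    if acks == 1:
        return leader_wrote             # only the leader must persist it
    if acks in ("all", -1):
        return leader_wrote and isr_acked == isr_size  # every ISR member
    raise ValueError(f"unknown acks setting: {acks}")

assert is_acknowledged(0, False, 0, 3)          # acked before anything is written
assert is_acknowledged(1, True, 0, 3)           # leader alone is enough
assert not is_acknowledged("all", True, 2, 3)   # still waiting on one replica
```

With acks=all, durability additionally depends on min.insync.replicas: the write fails if too few replicas are in sync.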
15. How does Kafka achieve high throughput and low latency?
Kafka achieves this via sequential disk I/O, zero-copy transfers, batching of messages, compression, partitioning for parallelism, and efficient network protocols.
16. What is Kafka Retention Policy?
Retention Policy defines how long messages are kept in a topic, based on time (log.retention.hours) or size (log.retention.bytes). Expired messages are deleted to free storage.
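The broker applies both limits during log cleanup: segments past the time limit are dropped first, then the oldest remaining segments are dropped until the size limit fits. A simplified model of that pruning (real cleanup operates on segment files per partition):

```python
def apply_retention(segments, retention_ms, retention_bytes, now_ms):
    """Drop segments older than retention_ms, then drop the oldest
    remaining segments until total size fits retention_bytes.
    Segments are assumed ordered oldest-first, as in a Kafka log."""
    kept = [s for s in segments if now_ms - s["created_ms"] <= retention_ms]
    while sum(s["bytes"] for s in kept) > retention_bytes:
        kept.pop(0)  # delete from the oldest end of the log
    return kept

segments = [{"created_ms": 0, "bytes": 100},
            {"created_ms": 5000, "bytes": 100},
            {"created_ms": 9000, "bytes": 100}]
# Time limit removes the first segment; size limit removes the next oldest.
assert apply_retention(segments, 6000, 150, 10000) == [{"created_ms": 9000, "bytes": 100}]
```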
17. What is Log Compaction in Kafka?
Log Compaction is a retention policy that keeps the latest value for each message key, removing older duplicates. It’s useful for stateful storage where only the current state matters.
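The effect of compaction is easy to show: walk the log and keep only the highest-offset record for each key. This sketch ignores tombstones and segment boundaries, which the real log cleaner also handles:

```python
def compact(log):
    """Keep only the latest (offset, value) per key, then re-emit in
    offset order, mirroring what a compacted segment looks like."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    return [(off, key, val)
            for key, (off, val) in sorted(latest.items(), key=lambda kv: kv[1][0])]

log = [("user1", "addr-A"), ("user2", "addr-B"), ("user1", "addr-C")]
# user1's older value is removed; only the current state per key survives.
assert compact(log) == [(1, "user2", "addr-B"), (2, "user1", "addr-C")]
```

This is why compacted topics work well as changelogs: a consumer reading from the beginning still reconstructs the full current state.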
18. Explain Exactly-Once Semantics (EOS) in Kafka.
EOS ensures each record is processed exactly once using idempotent producers and transactional APIs, preventing duplicates even with retries or failures.
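The idempotent-producer half of EOS works because each producer gets an ID and numbers its batches, and the broker rejects sequence numbers it has already seen. A toy model of that broker-side deduplication (the real broker also validates ordering and epochs):

```python
class IdempotentLog:
    """Toy model of broker-side deduplication: each (producer_id,
    sequence) is accepted once, so retried sends never duplicate."""
    def __init__(self):
        self.next_seq = {}   # producer_id -> expected next sequence number
        self.records = []

    def append(self, producer_id, seq, value):
        expected = self.next_seq.get(producer_id, 0)
        if seq < expected:   # duplicate of an already-written batch
            return False
        self.next_seq[producer_id] = seq + 1
        self.records.append(value)
        return True

log = IdempotentLog()
log.append("p1", 0, "order-created")
log.append("p1", 0, "order-created")   # retry after a timeout: silently dropped
assert log.records == ["order-created"]
```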
19. What are Kafka Transactions?
Transactions allow atomic operations across multiple partitions/topics, ensuring all-or-nothing commits. They support EOS for producers writing to multiple topics reliably.
20. What is the Kafka Streams API?
Kafka Streams API is a client library for building stream processing applications directly on Kafka topics, supporting operations like filtering, aggregating, and joining streams.
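Kafka Streams itself is a Java library, but its classic word-count topology (flatMap each line into words, groupBy word, count) can be sketched in plain Python to show the kind of aggregation it performs over a stream of records:

```python
from collections import Counter

def word_counts(lines):
    """Python analogue of the Kafka Streams word-count example:
    split each record into words and maintain a running count per word."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)

assert word_counts(["hello kafka", "hello streams"])["hello"] == 2
```

In real Kafka Streams the counts would live in a state store backed by a compacted changelog topic, so the aggregation survives restarts.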
Advanced Kafka Interview Questions (21-30)
21. How does Kafka ensure data consistency?
Consistency is ensured via replication, ISR management, configurable acknowledgments, atomic writes per partition, and idempotent producers to avoid duplicates.
22. What is Multi-Tenancy in Kafka?
Multi-Tenancy allows multiple users/teams to share a Kafka cluster securely by configuring topic-level access controls and quotas for production/consumption rates.
23. How do you design a fault-tolerant Kafka architecture for real-time processing at Paytm?
Use high replication factor (3+), multiple data centers, acks=all, idempotent producers, proper partitioning for load balancing, and monitoring for ISR shrinkage.
24. What are Kafka Connect and its types?
Kafka Connect is a framework for scalable data integration. Source Connectors ingest data into Kafka, Sink Connectors export data from Kafka to other systems.
25. Explain Kafka Operations like adding partitions.
Operations include creating/deleting topics, increasing partitions (via kafka-topics.sh --alter --partitions; partition count can only grow, and existing keys may map to different partitions afterward), graceful shutdowns, cluster expansion, and data migration between clusters.
26. How would you handle backpressure in a Kafka cluster at Salesforce?
Handle backpressure by configuring producer batch sizes, linger.ms, enable.idempotence, monitoring consumer lag, and scaling consumers/partitions dynamically.
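The key signal in that list is consumer lag: for each partition, the gap between the log end offset and the group's committed offset. A minimal lag computation (offset values here are illustrative):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Lag per partition = latest offset in the log minus the consumer
    group's committed offset; steadily growing lag means consumers
    cannot keep up and backpressure is building."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

# Partition 0 is 300 records behind; partition 1 is fully caught up.
assert consumer_lag({0: 1500, 1: 900}, {0: 1200, 1: 900}) == {0: 300, 1: 0}
```

In practice these offsets come from tools like kafka-consumer-groups.sh or broker/consumer metrics rather than being computed by hand.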
27. What is the role of Java in Kafka?
Kafka's broker is implemented in Scala and Java, and the official client libraries are written in Java, so JVM applications get first-class producer/consumer support, the Streams API, and the strongest community tooling.
28. Scenario: A Zoho topic has uneven partition load. How to fix it?
Redesign key partitioning strategy for even distribution, increase partitions and rebalance consumers, or use custom partitioners to avoid hotspots.
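Before redesigning the keying strategy, it helps to quantify the skew. One simple heuristic is the ratio of the hottest partition's load to the mean load (the threshold of "too skewed" is a judgment call, not a Kafka rule):

```python
def partition_skew(message_counts):
    """Ratio of the busiest partition to the mean load across partitions.
    1.0 means perfectly even; values well above 1 suggest a hot key or a
    poor partitioning strategy."""
    mean = sum(message_counts) / len(message_counts)
    return max(message_counts) / mean

assert partition_skew([100, 100, 100]) == 1.0   # perfectly balanced
assert partition_skew([300, 50, 50]) > 2        # one partition is a hotspot
```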
29. How to monitor and optimize Kafka performance at Atlassian?
Monitor broker metrics (under-replicated partitions, ISR count), consumer lag, throughput. Optimize with compression, batching, and tuning log.segment.bytes.
30. Scenario: Swiggy needs to mirror data between Kafka clusters. How?
Use the MirrorMaker tool (MirrorMaker 2, built on Kafka Connect, in current Kafka versions) for replication between clusters, configuring replication factor, handling offset translation for consumers, and ensuring consistent topic configurations.
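A minimal MirrorMaker 2 configuration sketch, with placeholder cluster aliases and broker addresses (run via connect-mirror-maker.sh; exact properties should be checked against the Kafka version in use):

```properties
# cluster aliases (placeholders)
clusters = primary, backup

# connection info for each cluster (placeholder addresses)
primary.bootstrap.servers = primary-broker:9092
backup.bootstrap.servers = backup-broker:9092

# enable replication from primary to backup for all topics
primary->backup.enabled = true
primary->backup.topics = .*

# replication factor for mirrored topics on the target cluster
replication.factor = 3
```

Mirrored topics appear on the target cluster prefixed with the source alias (e.g. primary.orders), which avoids replication cycles.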
Master these Kafka interview questions to confidently demonstrate your expertise across all levels. Practice explaining concepts clearly for technical discussions.