Prepare for your Kafka interview with these 30 essential questions covering basic, intermediate, and advanced topics. They suit freshers and candidates with 1-3 or 3-6 years of experience preparing for roles at companies like Zoho, Paytm, Salesforce, Atlassian, and Swiggy.
Basic Kafka Interview Questions (1-10)
1. What is Apache Kafka?
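Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications. It works as a publish-subscribe messaging system: producers publish messages to topics, and consumers subscribe to those topics to process them. Unlike a traditional message queue, Kafka stores messages durably in an append-only log, so consumers can re-read them until they expire.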
2. What are the main components of Kafka architecture?
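The core components are topics (named streams of messages), partitions (ordered logs within a topic), brokers (servers that store and serve data), producers (clients that write messages), and consumers (clients that read them). Cluster metadata is managed by a controller, coordinated through ZooKeeper in older releases and through the built-in KRaft protocol in newer ones.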
3. What is a Kafka topic?
A topic is a category or feed name to which messages are published. Each topic is divided into partitions, allowing for parallel processing and scalability across multiple brokers.
4. What is a partition in Kafka?
A partition is an ordered, immutable sequence of messages that serves as the unit of parallelism. Topics are divided into multiple partitions to enable horizontal scaling and load distribution across brokers.
5. What are Kafka brokers?
Brokers are Kafka servers responsible for storing and managing data in partitions. A Kafka cluster consists of multiple brokers that work together to handle data replication and fault tolerance.
6. What are the main Kafka APIs?
Kafka provides five core APIs: the Admin API for managing topics, brokers, and other cluster objects; the Producer API for publishing messages; the Consumer API for subscribing to topics; the Streams API for stream processing; and the Connect API for integrating with external systems.
7. What is a consumer group in Kafka?
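A consumer group is a set of consumers that share the work of reading one or more topics. Each partition is assigned to exactly one consumer in the group at a time, so every message is processed by only one member of that group, while separate groups each receive the full stream independently. If a group has more consumers than partitions, the extra consumers sit idle.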
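The sketch below is a simplified simulation, not Kafka's client API. It illustrates the two delivery rules described above: within a group, each partition has a single owner, while every group receives all messages. The modulo-based assignment and the group and consumer names are hypothetical; real groups use a configurable assignor.

```python
from collections import defaultdict

def deliver(partitions, groups):
    """Simulate delivery of every message in `partitions` to each consumer group.

    partitions: dict of partition id -> list of messages
    groups: dict of group name -> list of consumer names
    Returns: dict of group name -> dict of consumer -> messages received
    """
    result = {}
    for group, consumers in groups.items():
        received = defaultdict(list)
        # Hypothetical assignment: partition i goes to consumer i % len(consumers),
        # so each partition has exactly one owner inside the group.
        for i, pid in enumerate(sorted(partitions)):
            owner = consumers[i % len(consumers)]
            received[owner].extend(partitions[pid])
        result[group] = dict(received)
    return result

partitions = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
groups = {"billing": ["c1", "c2"], "analytics": ["x1"]}
out = deliver(partitions, groups)
print(out["billing"])    # {'c1': ['a', 'b', 'd', 'e'], 'c2': ['c']}
print(out["analytics"])  # {'x1': ['a', 'b', 'c', 'd', 'e']}
```

The "billing" group splits the messages between its two members, while the single-member "analytics" group still sees all five.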
8. What is the offset in Kafka?
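An offset is a sequential integer that identifies a message's position within a partition; it is unique only within that partition. Consumers commit the offset of the next message they intend to read, so after a restart or rebalance they resume exactly where they left off.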
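As a rough illustration, the toy model below treats a partition as a Python list whose index is the offset. It shows that the committed offset is the next position to read. `PartitionLog` is a hypothetical class rather than a Kafka type, and real offsets can have gaps, for example after compaction or around transaction markers.

```python
class PartitionLog:
    """Toy partition: an append-only list whose index is the offset."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset, max_records):
        return self.records[offset:offset + max_records]

log = PartitionLog()
for v in ["o1", "o2", "o3", "o4", "o5"]:
    log.append(v)

committed = 0                      # committed offset = next offset to read
batch = log.read_from(committed, 2)
committed += len(batch)            # commit only after the batch is processed
print(batch, committed)            # ['o1', 'o2'] 2

# After a restart, the consumer resumes from the committed offset.
print(log.read_from(committed, 10))  # ['o3', 'o4', 'o5']
```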
9. What are producers in Kafka?
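Producers are client applications that publish messages to Kafka topics. They also choose each message's partition. Messages with a key are routed by hashing the key, so the same key always lands in the same partition. Messages without a key are spread across partitions: round-robin in older clients, and "sticky" batching to one partition at a time since Kafka 2.4. A custom partitioner can override either behavior.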
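Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count. The minimal sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption made to keep it standard-library only. The partition numbers it produces will therefore differ from real Kafka, but the property that matters is the same: equal keys always map to the same partition.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based hashing: hash the key bytes, then
    # map the hash onto the available partitions.
    return zlib.crc32(key) % num_partitions

orders = [(b"user-42", "created"), (b"user-7", "created"), (b"user-42", "paid")]
placement = [(k.decode(), v, choose_partition(k, 3)) for k, v in orders]
for row in placement:
    print(row)  # both user-42 events report the same partition number
```

Because the mapping depends on the partition count, adding partitions to an existing topic changes where keys land. This is why key-ordered topics are usually created with enough partitions up front.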
10. What are consumers in Kafka?
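Consumers subscribe to topics and pull messages from partitions. After processing, they commit offsets to record their progress, so a restarted or reassigned consumer continues without skipping messages. Committing after processing gives at-least-once delivery; committing before processing risks losing messages if the consumer crashes mid-batch.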
Intermediate Kafka Interview Questions (11-20)
11. How does Kafka achieve fault tolerance?
Kafka achieves fault tolerance through replication. Each partition has multiple replicas across brokers, with one leader handling reads/writes and followers replicating data. If the leader fails, a new leader is elected from in-sync replicas (ISR).
12. What is the role of leader and follower in Kafka?
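The leader replica of a partition handles all produce requests and, by default, all fetch requests, while follower replicas continuously fetch data from the leader. Followers that keep up belong to the ISR and are eligible to become leader if the current one fails. Since Kafka 2.4, consumers can optionally be configured to read from a nearby follower instead of the leader.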
13. What are In-Sync Replicas (ISR) in Kafka?
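The ISR is the set of replicas, including the leader, that are fully caught up with the leader's log. A follower drops out of the ISR if it stays behind for longer than replica.lag.time.max.ms. By default only ISR members are eligible for leader election, which guarantees that the new leader holds every committed message; setting unclean.leader.election.enable=true trades that guarantee for availability.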
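A simplified model of ISR membership follows. It assumes the broker tracks, for each follower, the last time that follower was fully caught up, and that it keeps the follower in the ISR only if that time falls within `replica.lag.time.max.ms` (30 seconds by default in recent releases). The function and its inputs are hypothetical; the real broker logic has more cases.

```python
def compute_isr(leader, last_caught_up_ms, now_ms, replica_lag_time_max_ms=30_000):
    """Return the ISR: the leader plus every follower that was fully caught
    up with the leader within the last replica_lag_time_max_ms milliseconds."""
    isr = [leader]
    for follower, ts in sorted(last_caught_up_ms.items()):
        if now_ms - ts <= replica_lag_time_max_ms:
            isr.append(follower)
    return isr

followers = {2: 99_000, 3: 60_000}  # broker id -> last time it was caught up
print(compute_isr(1, followers, now_ms=100_000))  # [1, 2]
```

Broker 3 has not caught up for 40 seconds, so it is excluded until it catches up again.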
14. What are the different acknowledgment modes for producers (acks)?
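- acks=0: The producer does not wait for any acknowledgment (fastest, but messages can be lost silently)
- acks=1: The leader acknowledges after writing to its own log; data is lost if the leader fails before followers replicate it
- acks=all (equivalent to -1): The leader acknowledges only after every current ISR replica has the message (most durable); pair it with min.insync.replicas to require a minimum ISR size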
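To make the interaction with `min.insync.replicas` concrete, the sketch below summarizes what the producer hears back for one write in each mode. It is a decision table written as a function, not broker code. The returned strings are illustrative labels; the real rejection surfaces as a NotEnoughReplicas error on the producer.

```python
def produce_outcome(acks, isr_size, min_insync_replicas):
    """Simplified view of the producer-side outcome of a single write."""
    if acks == "0":
        return "no response awaited"          # fire-and-forget
    if acks == "1":
        return "ack after leader write"       # lost if the leader dies before replication
    if acks == "all":
        # min.insync.replicas is only enforced for acks=all.
        if isr_size < min_insync_replicas:
            return "NotEnoughReplicas error"  # broker rejects the write
        return "ack after all ISR replicas have the record"
    raise ValueError(f"unknown acks value: {acks}")

for isr in (3, 2, 1):
    print(isr, produce_outcome("all", isr, min_insync_replicas=2))
```

With replication factor 3 and `min.insync.replicas=2`, writes continue while one replica is out of sync and are refused once two are. This is the usual durability-versus-availability trade-off.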
15. How does Kafka ensure high throughput?
Kafka achieves high throughput through batching, compression, zero-copy transfers, sequential disk I/O, and horizontal scaling via partitioning.
16. What is the retention policy in Kafka?
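The retention policy defines how long messages stay in a topic before they are deleted. Retention can be time-based (the broker default log.retention.hours, or retention.ms per topic) or size-based (log.retention.bytes, or retention.bytes per topic, a limit applied to each partition). Kafka deletes data one whole log segment at a time, so messages may remain somewhat longer than the configured limit.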
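The simplified model below shows why deletion is segment-based. It makes three assumptions: a segment expires when its newest record is older than `retention_ms`, the oldest segments are removed while the partition exceeds `retention_bytes` by at least a whole segment, and the active (last) segment is never deleted. `Segment` and `apply_retention` are hypothetical names, not Kafka internals.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    newest_timestamp_ms: int  # timestamp of the newest record in the segment

def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the surviving segments; whole segments are removed, oldest first."""
    kept = list(segments)
    # Time-based: drop closed segments whose newest record has expired.
    if retention_ms is not None:
        while len(kept) > 1 and now_ms - kept[0].newest_timestamp_ms > retention_ms:
            kept.pop(0)
    # Size-based: drop the oldest closed segment while the partition would
    # still be at or above the limit without it.
    if retention_bytes is not None:
        total = sum(s.size_bytes for s in kept)
        while len(kept) > 1 and total - kept[0].size_bytes >= retention_bytes:
            total -= kept.pop(0).size_bytes
    return kept

segs = [Segment(0, 400, 1_000), Segment(100, 400, 5_000), Segment(200, 300, 9_000)]
print([s.base_offset for s in apply_retention(segs, now_ms=10_000, retention_ms=6_000)])
print([s.base_offset for s in apply_retention(segs, now_ms=10_000, retention_bytes=300)])
```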
17. How do you create a topic in Kafka?
Use the kafka-topics.sh tool that ships with Kafka:

```shell
kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 2
```
18. What is replication factor in Kafka?
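Replication factor is the number of copies kept for each partition, including the leader. A factor of 3 means each partition has one leader and two followers on three different brokers, providing fault tolerance if brokers fail. The replication factor cannot exceed the number of brokers in the cluster.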
19. How does Kafka guarantee message ordering?
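Kafka guarantees ordering only within a single partition: messages are appended sequentially to the partition log, and consumers read them in offset order. There is no ordering guarantee across the partitions of a topic. To keep related messages in order, give them the same key so they are routed to the same partition, and enable idempotent producers (enable.idempotence=true) so that retries cannot reorder messages.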
20. At Paytm, how would you configure consumer rebalancing?
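A rebalance happens when consumers join or leave a group, or when the subscribed topics' partitions change. To avoid unnecessary rebalances, tune three settings: session.timeout.ms (how long the group coordinator waits without a heartbeat before evicting a consumer), heartbeat.interval.ms (typically no more than one third of the session timeout), and max.poll.interval.ms (the maximum allowed time between poll() calls). Choose the assignment algorithm with partition.assignment.strategy: RangeAssignor, RoundRobinAssignor, StickyAssignor, or CooperativeStickyAssignor, which rebalances incrementally instead of pausing the whole group. Static membership (group.instance.id) further avoids rebalances during rolling restarts.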
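The two classic assignors distribute partitions differently, and the difference shows up when a group consumes several topics. The sketch below reimplements their core idea in simplified form. Range assigns contiguous blocks per topic, and round-robin deals all topic-partitions in turn. Sorting consumers by name stands in for Kafka's ordering by member ID, and edge cases such as differing subscriptions are ignored.

```python
def range_assign(consumers, topics):
    """Per topic, give each consumer a contiguous range; the first
    (partitions % consumers) consumers receive one extra partition."""
    consumers = sorted(consumers)
    out = {c: [] for c in consumers}
    for topic, n in sorted(topics.items()):
        per, extra = divmod(n, len(consumers))
        start = 0
        for i, c in enumerate(consumers):
            count = per + (1 if i < extra else 0)
            out[c] += [(topic, p) for p in range(start, start + count)]
            start += count
    return out

def round_robin_assign(consumers, topics):
    """Deal all topic-partitions, in sorted order, across consumers in turn."""
    consumers = sorted(consumers)
    out = {c: [] for c in consumers}
    all_tps = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, tp in enumerate(all_tps):
        out[consumers[i % len(consumers)]].append(tp)
    return out

topics = {"orders": 3, "payments": 3}
print(range_assign(["c1", "c2"], topics))        # c1 gets 4 partitions, c2 gets 2
print(round_robin_assign(["c1", "c2"], topics))  # 3 partitions each
```

With two topics of three partitions each, range assignment gives the first consumer the extra partition of both topics, while round-robin balances the load evenly.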
Advanced Kafka Interview Questions (21-30)
21. What is exactly-once semantics (EOS) in Kafka?
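EOS ensures that each message affects the output exactly once, even when producers retry or applications restart. It combines idempotent producers (enable.idempotence=true), which let the broker discard duplicate retries, with the transactional API, which atomically writes to multiple partitions and can commit consumer offsets in the same transaction. Consumers see transactional messages only after the commit, provided they set isolation.level=read_committed.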
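The idempotence half of EOS rests on the broker remembering, per producer ID and partition, the last sequence number it appended. The toy model below captures that idea. It is deliberately simplified: real brokers track a small window of recent batches and assign producer IDs themselves, and the class name is hypothetical. Transactions are not modeled.

```python
class IdempotentPartition:
    """Toy broker-side dedup: remember the last sequence number appended per
    producer id and skip retried writes that were already appended."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last appended sequence number

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate ignored"
        if seq != last + 1:
            return "out-of-order sequence rejected"
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return "appended"

p = IdempotentPartition()
print(p.append(1000, 0, "pay #1"))  # appended
print(p.append(1000, 0, "pay #1"))  # duplicate ignored (retry after a lost ack)
print(p.append(1000, 1, "pay #2"))  # appended
print(p.log)                        # ['pay #1', 'pay #2']
```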
22. How do you handle late-arriving messages in Kafka Streams?
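Kafka Streams does not use watermarks (a concept from Apache Flink and Apache Beam). Instead, it tracks stream time, the highest record timestamp seen so far, and lets each windowed operation define a grace period, for example with TimeWindows.ofSizeAndGrace(). Out-of-order records are still added to their window until stream time passes the window end plus the grace period; after that, they are dropped as late.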
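The plain-Python sketch below approximates how a grace period decides whether a late record is counted, using tumbling windows keyed by window start. It does not use the Kafka Streams API. The close condition (window end no later than stream time minus grace) is an approximation of Kafka Streams' behavior, stated here as an assumption.

```python
def count_with_grace(records, window_ms, grace_ms):
    """Tumbling-window counts with a Kafka Streams-style grace period."""
    stream_time = float("-inf")
    counts, dropped = {}, []
    for ts, key in records:
        stream_time = max(stream_time, ts)  # stream time only moves forward
        start = ts - ts % window_ms
        end = start + window_ms
        if end > stream_time - grace_ms:    # window still accepts records
            counts[(key, start)] = counts.get((key, start), 0) + 1
        else:
            dropped.append((ts, key))       # too late: the window has closed
    return counts, dropped

events = [(1_000, "a"), (12_000, "a"), (4_000, "a"), (25_000, "a"), (6_000, "a")]
counts, dropped = count_with_grace(events, window_ms=10_000, grace_ms=5_000)
print(counts)   # {('a', 0): 2, ('a', 10000): 1, ('a', 20000): 1}
print(dropped)  # [(6000, 'a')]
```

The record at 4,000 ms arrives out of order but within the grace period, so it is counted. The record at 6,000 ms arrives after stream time reached 25,000 ms, past the [0, 10,000) window's end plus 5,000 ms of grace, so it is dropped.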
23. What are Kafka Connect source and sink connectors?
Source connectors import data from external systems into Kafka topics. Sink connectors export data from Kafka topics to external storage like databases.
24. How would you monitor Kafka cluster health at Salesforce?
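Monitor under-replicated partitions, offline partitions, ISR shrink and expand rates, the active controller count, consumer lag, broker disk usage, and network throughput. Kafka exposes these as JMX metrics, which can be collected with tools such as CMAK (formerly Kafka Manager) or a Prometheus JMX exporter.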
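Consumer lag per partition is the log end offset minus the group's committed offset; the kafka-consumer-groups.sh --describe output reports the same quantity in its LAG column. The small sketch below computes it from two hypothetical offset maps. Treating a partition with no committed offset as lagging from offset 0 is an assumption; in practice the starting point depends on auto.offset.reset and the log start offset.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log end offset - committed offset.
    A missing commit is treated as offset 0 (simplifying assumption)."""
    return {p: leo - committed_offsets.get(p, 0) for p, leo in log_end_offsets.items()}

leo = {0: 1_500, 1: 980, 2: 2_000}
committed = {0: 1_500, 1: 900, 2: 1_200}
lag = consumer_lag(leo, committed)
print(lag, sum(lag.values()))  # {0: 0, 1: 80, 2: 800} 880
```

A steadily growing total lag, or a single partition whose lag keeps rising, is a common alerting signal that consumers cannot keep up.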
25. Explain Kafka log compaction.
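Log compaction retains at least the latest value for each message key in a topic and discards older values for the same key, so the topic acts as a durable key-value snapshot. A message with a null value (a tombstone) marks its key for deletion. Enable it per topic with cleanup.policy=compact (the broker-wide default is set by log.cleanup.policy); it suits changelogs and stateful applications.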
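The core rule of compaction can be shown in a few lines: keep the newest record per key and drop keys whose newest record is a tombstone. This sketch simplifies in two ways, both stated as assumptions. It removes tombstones immediately, whereas Kafka keeps them for delete.retention.ms. It also compacts the whole log, whereas Kafka leaves the active segment alone. Surviving records keep their original offsets, as in Kafka.

```python
def compact(records):
    """Keep only the latest value per key; a None value (tombstone) deletes
    the key. Returns surviving (offset, key, value) tuples in offset order."""
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)
    survivors = sorted((off, key, val) for key, (off, val) in latest.items())
    return [r for r in survivors if r[2] is not None]

records = [("u1", "addr=A"), ("u2", "addr=B"), ("u1", "addr=C"), ("u2", None)]
print(compact(records))  # [(2, 'u1', 'addr=C')]
```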
26. What happens during Kafka leader election?
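When a partition's leader fails, the controller chooses as the new leader the first replica in the partition's assigned replica list that is both alive and in the ISR. Because every ISR member already holds all committed messages, any of them can take over without losing acknowledged writes. The controller bumps the leader epoch and notifies the brokers. The new leader then serves reads and writes, and the remaining followers truncate any uncommitted entries that diverge from it before resuming replication.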
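The selection rule above, plus the optional unclean fallback, can be sketched as a small function. This is a simplified model of the controller's choice, not its actual code. The function name and its arguments (assigned replica order, ISR set, live brokers) are illustrative.

```python
def elect_leader(assigned_replicas, isr, live_brokers, unclean_allowed=False):
    """Pick the first replica, in assignment order, that is alive and in the ISR.
    Falls back to any live replica only if unclean election is allowed."""
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    if unclean_allowed:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker  # may lose acknowledged writes
    return None  # no eligible replica: the partition stays offline

# Broker 1 (the old leader) is down; broker 3 is the first live ISR member.
print(elect_leader([1, 2, 3], isr={1, 3}, live_brokers={2, 3}))  # 3
```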
27. How do you tune Kafka for low latency?
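On producers, keep linger.ms at or near 0 and use a smaller batch.size so messages are sent without waiting to fill batches. On consumers, keep fetch.min.bytes at 1 and lower fetch.max.wait.ms. On brokers, increase num.network.threads and num.io.threads if request queues build up, use SSD storage, and size the partition count for the parallelism you need. These settings trade some throughput for lower latency.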
28. At Atlassian, design a multi-tenant Kafka deployment.
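Give each tenant its own authenticated principal and a dedicated topic-name prefix (Kafka has no built-in namespaces), then grant prefixed ACLs so each tenant can access only its own topics and consumer groups. Apply user or client-id quotas on produce and fetch byte rates and on request rate, so that no single tenant can starve the others (the noisy-neighbor problem).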
29. How does Kafka handle backpressure?
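Kafka consumers pull data, so a slow consumer simply falls behind (its lag grows) instead of being overwhelmed. Consumers can further bound their work with max.poll.records and fetch.max.bytes, or temporarily stop fetching specific partitions with pause() and resume(). On the producer side, send() blocks for up to max.block.ms when the buffer.memory buffer is full. Brokers enforce client quotas by throttling (delaying responses to) clients that exceed their byte-rate or request-rate limits, and replication quotas cap the bandwidth used when moving replicas between brokers.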
30. For Swiggy’s real-time order processing, design a fault-tolerant Kafka architecture.
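Use replication factor 3, acks=all, and min.insync.replicas=2, so every acknowledged write survives a single broker failure. Spread replicas across availability zones with rack awareness (broker.rack), enable idempotent producers to avoid duplicates on retry, and keep unclean leader election disabled. Monitor ISR health and consumer lag so that automatic leader election can fail over without data loss. For datacenter-level failures, replicate to a second cluster, for example with MirrorMaker 2.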