Prepare for your Elasticsearch interview with these 30 essential questions covering basic, intermediate, and advanced topics. This guide is designed for freshers, candidates with 1-3 years of experience, and professionals with 3-6 years, helping you master conceptual, practical, and scenario-based questions.
Basic Elasticsearch Interview Questions (1-10)
1. What is Elasticsearch?
Elasticsearch is a distributed, RESTful search and analytics engine capable of handling large volumes of data in near real-time. It is schema-free, horizontally scalable, and built on Apache Lucene.[1][3]
2. What are the primary use cases of Elasticsearch?
Elasticsearch is used for application search, enterprise search, website search, log analytics, business analytics, security analytics, geospatial data analysis, application performance monitoring, and infrastructure metrics monitoring.[1]
3. What is an Elasticsearch index?
An Elasticsearch index is a logical collection of related documents, loosely comparable to a database in relational systems. It contains the records for one kind of data, such as product data or customer data in an e-commerce application.[6]
4. What is the difference between an index and a type in Elasticsearch?
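An index is a collection of documents, while mapping types were logical categories or partitions within an index whose semantics were determined by the application. Types are a legacy concept: indices created in 6.x allow only a single type, and types were deprecated in 7.0 and removed in 8.0, so current versions have one mapping per index.[1]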
5. What is an Elasticsearch cluster?
An Elasticsearch cluster is a collection of one or more nodes (servers) that work together to store data and provide failover and high availability. Nodes can be added or removed without affecting data availability, provided replica shards exist on the remaining nodes.[5]
6. What are shards and replicas in Elasticsearch?
Shards are the basic unit of storage and work in Elasticsearch: each index is split into primary shards (each one a self-contained Lucene index) that are distributed across nodes, which allows horizontal scaling. Replicas are copies of primary shards that provide high availability and spread search load.[3]
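To make the sharding idea concrete, here is a minimal Python sketch of how a routing value (by default the document ID) can be mapped to a primary shard. In simplified form, Elasticsearch computes hash(_routing) % number_of_primary_shards using a Murmur3 hash. The sketch substitutes zlib.crc32 as a stand-in hash, so its shard numbers will not match a real cluster; pick_shard and the sample IDs are hypothetical.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a primary shard."""
    # Assumption: crc32 is only a stand-in; Elasticsearch itself uses Murmur3.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["order-1", "order-2", "order-3"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard number depends on the primary shard count, that count is fixed when the index is created; changing it would send existing document IDs to different shards.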
7. How do you check the health of an Elasticsearch cluster?
Use GET _cluster/health to check cluster status, number of nodes, active shards, and overall health (green, yellow, or red). The ?v flag belongs to the _cat APIs, so the tabular equivalent is GET _cat/health?v.[1]
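As a rough illustration of how the three colors are usually read, the following Python sketch interprets an already-parsed cluster health response. The summarize_health helper and the sample values are hypothetical; the field names (status, number_of_nodes, active_shards, unassigned_shards) are ones the cluster health API returns.

```python
def summarize_health(health: dict) -> str:
    """Turn a parsed cluster health response into a one-line summary."""
    meanings = {
        "green": "all primary and replica shards are allocated",
        "yellow": "all primaries are allocated, but some replicas are not",
        "red": "at least one primary shard is unallocated",
    }
    status = health["status"]
    return (
        f"{status}: {meanings[status]} "
        f"({health['unassigned_shards']} unassigned shards, "
        f"{health['number_of_nodes']} nodes)"
    )


# Hypothetical sample: a single-node cluster whose replicas cannot be assigned.
sample = {
    "status": "yellow",
    "number_of_nodes": 1,
    "active_shards": 5,
    "unassigned_shards": 5,
}

if __name__ == "__main__":
    print(summarize_health(sample))
```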
8. What is the purpose of the refresh interval in Elasticsearch?
The refresh interval determines how often newly indexed documents become visible for search operations. The default is 1 second, providing near real-time search capabilities.[3]
9. How do you check Elasticsearch indices information?
Use GET _cat/indices?v to display information about indices including space usage, shard count, and document count.[1]
10. What database does Elasticsearch use?
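Elasticsearch does not sit on top of a separate database. It stores JSON documents in Apache Lucene indices on each node's local disk, and it is commonly classified as a NoSQL document store that focuses on search capabilities rather than traditional database operations.[4]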
Intermediate Elasticsearch Interview Questions (11-20)
11. What is the difference between match and term queries?
A match query performs full-text search: it analyzes the input text before matching. A term query looks for the exact term without analysis, which suits structured data such as numbers or keyword fields.[3]
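The difference is easiest to see on a toy inverted index. The Python sketch below is a simplified model, not Lucene: toy_analyze is a crude stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and all function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def toy_analyze(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for analysis)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Build token -> set of document IDs, as a text field would be indexed."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in toy_analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    # No analysis: the value must equal an indexed token exactly.
    return set(index.get(value, set()))


def match_query(index, text):
    # Analyze the input first, then OR the resulting terms together.
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: "Wireless Headphones", 2: "Wired headphones", 3: "Wireless mouse"}
index = build_index(docs)

if __name__ == "__main__":
    print(term_query(index, "Wireless"))              # set(): token was lowercased
    print(term_query(index, "wireless"))              # {1, 3}
    print(match_query(index, "Wireless Headphones"))  # {1, 2, 3}
```

This is why a term query against an analyzed text field often returns nothing for capitalized input, while the same term query against a keyword field matches the stored value exactly.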
12. How do you perform a full-text search in Elasticsearch?
Use the match query in the _search endpoint. Example:
GET /products/_search
{
  "query": {
    "match": {
      "description": "wireless headphones"
    }
  }
}
[3]
13. What are the different types of queries in Elasticsearch?
Elasticsearch has full-text queries (match, multi_match, query_string) for text searches and term-level queries (term, range, exists, prefix, wildcard, fuzzy) for structured data. Compound queries such as bool combine them.[1]
14. What is an analyzer in Elasticsearch?
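An analyzer determines how text is converted into tokens at index time and at search time. It consists of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. Elasticsearch provides built-in analyzers and lets you define custom ones.[6]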
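The three stages can be sketched as a small pipeline in Python. This is an illustrative model under simplifying assumptions, not how Lucene implements analysis: strip_html, whitespace_tokenizer, lowercase, remove_stopwords, and run_analyzer are hypothetical names, and the stopword list is made up for the example.

```python
import re


def strip_html(text):
    """Character filter: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the character stream into tokens on whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens, stopwords=frozenset({"the", "a", "an", "and"})):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in stopwords]


def run_analyzer(text, char_filters, tokenizer, token_filters):
    """Apply character filters, then one tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = run_analyzer(
    "<b>The</b> Wireless Headphones",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, remove_stopwords],
)

if __name__ == "__main__":
    print(tokens)  # ['wireless', 'headphones']
```

Order matters in the token filter list: lowercasing runs before stopword removal here so that "The" is recognized as a stopword.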
15. How do you create an index with specific mappings?
Use the create index API (PUT /<index>) with a mappings definition. Example for a product index at Flipkart:
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" }
    }
  }
}
16. What is the Query DSL in Elasticsearch?
Query DSL (Domain Specific Language) is a JSON-based query language for constructing complex queries, filters, and aggregations in Elasticsearch.[5]
17. How do you check node allocation in a cluster?
Use GET _cat/allocation?v to see disk usage and shard allocation across nodes.[1]
18. What are the different node roles in Elasticsearch?
Nodes can have roles such as master-eligible, data, ingest, and coordinating. By default a node takes all roles unless node.roles is set in elasticsearch.yml; a node with an empty node.roles list acts as a coordinating-only node.[5]
19. How do you perform a fuzzy search in Elasticsearch?
Use the fuzzy query for approximate matching based on edit distance. Example:
GET /products/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "headphone",
        "fuzziness": "AUTO"
      }
    }
  }
}
[3]
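To show what "fuzziness": "AUTO" means in practice, the Python sketch below applies the default AUTO thresholds (terms of 0-2 characters must match exactly, 3-5 characters allow one edit, longer terms allow two) on top of a plain Levenshtein distance. It is a simplified model: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not. The function names are hypothetical.

```python
def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by fuzziness AUTO with the default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is within the AUTO edit budget of the query term."""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


if __name__ == "__main__":
    print(fuzzy_matches("hedphone", "headphone"))  # True: one edit, budget of two
    print(fuzzy_matches("tv", "tvs"))              # False: short terms must match exactly
```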
20. What is fielddata in Elasticsearch?
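Fielddata is an in-memory data structure that Elasticsearch builds from the inverted index so that text fields can be used for sorting, aggregations, and scripting. It is disabled by default on text fields because it can consume a lot of heap; keyword and numeric fields use on-disk doc values instead. Monitor it with GET _cat/fielddata?v, which shows the heap used by fielddata per field on each node.[1]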
Advanced Elasticsearch Interview Questions (21-30)
21. How do you ensure data consistency in a distributed Elasticsearch cluster?
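Keep at least one replica per primary, use the wait_for_active_shards setting to require a minimum number of active shard copies before a write proceeds, and guard concurrent updates with optimistic concurrency control (the if_seq_no and if_primary_term parameters). The older quorum-style write consistency parameter was removed in Elasticsearch 5.0. Monitor cluster health, and call the _refresh API only when a write must be searchable immediately.[2]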
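One concrete consistency mechanism is optimistic concurrency control: a write carries the sequence number the client last saw, and the write is rejected if the document has changed since. The Python sketch below is a toy in-memory model of that idea, not the Elasticsearch client API. ToyIndex and VersionConflict are hypothetical names, and the model ignores the primary term that a real request would also send via if_primary_term.

```python
class VersionConflict(Exception):
    """Raised when a conditional write sees a newer sequence number."""


class ToyIndex:
    """In-memory model of sequence-number-based conditional writes."""

    def __init__(self):
        self._docs = {}
        self._next_seq_no = 0

    def index(self, doc_id, source, if_seq_no=None):
        current = self._docs.get(doc_id)
        # A conditional write succeeds only if the stored sequence number
        # is the one the caller read earlier.
        if if_seq_no is not None:
            if current is None or current["_seq_no"] != if_seq_no:
                raise VersionConflict(f"{doc_id}: expected seq_no {if_seq_no}")
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = {"_seq_no": seq_no, "_source": source}
        return seq_no

    def get(self, doc_id):
        return self._docs[doc_id]


if __name__ == "__main__":
    idx = ToyIndex()
    seen = idx.index("item-1", {"stock": 5})
    idx.index("item-1", {"stock": 4}, if_seq_no=seen)      # succeeds
    try:
        idx.index("item-1", {"stock": 3}, if_seq_no=seen)  # stale read
    except VersionConflict as exc:
        print("conflict:", exc)
```

In a real cluster the rejected write comes back as a version conflict (HTTP 409), and the usual reaction is to re-read the document and retry.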
22. What is reindexing and when is it necessary?
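Reindexing copies documents from an existing index into a new index that was created with updated mappings or settings, typically through the _reindex API. It's necessary for changes that cannot be applied in place, such as changing an existing field's type or analyzer.[2]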
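A minimal Python sketch of the request body for the _reindex API follows. It only builds the JSON and sends nothing; reindex_body and the index names products_v1 and products_v2 are hypothetical, while the source, dest, index, and query keys are part of the reindex request format.

```python
import json


def reindex_body(source_index, dest_index, query=None):
    """Build a body for POST _reindex; an optional query limits what is copied."""
    source = {"index": source_index}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    body = reindex_body("products_v1", "products_v2")
    print(json.dumps(body, indent=2))
```

The destination index should be created with the new mappings before the copy starts, because _reindex does not carry over settings or mappings from the source index.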
23. How do you design an efficient Elasticsearch schema for high performance?
Define explicit mappings, choose correct data types, minimize field mappings, use keyword fields for aggregations, and avoid dynamic mapping where possible.[2]
24. How do you scale an Elasticsearch cluster horizontally?
Add more data nodes, distribute shards evenly, configure appropriate shard and replica counts, and use dedicated coordinating nodes for query load balancing.[2]
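The arithmetic behind shard and replica counts can be sketched in a few lines of Python. This is a back-of-the-envelope helper under the assumption of a perfectly even distribution; shard_layout and its output keys are hypothetical names.

```python
import math


def shard_layout(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Estimate how shard copies spread over data nodes for one index."""
    total_copies = primaries * (1 + replicas)
    return {
        "total_shard_copies": total_copies,
        # Upper bound per node if the copies are balanced evenly.
        "copies_per_node_if_balanced": math.ceil(total_copies / data_nodes),
        # A replica is never placed on the same node as its primary, so all
        # copies can only be assigned with at least (1 + replicas) data nodes.
        "all_copies_assignable": data_nodes >= 1 + replicas,
    }


if __name__ == "__main__":
    print(shard_layout(primaries=5, replicas=1, data_nodes=3))
    print(shard_layout(primaries=5, replicas=1, data_nodes=1))
```

The second call describes a single-node cluster: the replicas cannot be assigned, which is the typical reason such a cluster reports yellow health.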
25. Write a query to aggregate document count per category in a Zoho CRM index.
GET /crm_leads/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}
[3]
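To show the shape of the buckets such a terms aggregation returns, the Python sketch below computes the same counts over an in-memory list. It is a local model only: terms_aggregation and the sample leads are hypothetical, and on a real multi-shard index the counts of a terms aggregation can be approximate.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per field value and return the top buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest doc_count first; ties ordered by key so the output is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Hypothetical sample documents; the last one has no category and is skipped.
leads = [
    {"category": "enterprise"},
    {"category": "smb"},
    {"category": "enterprise"},
    {"name": "lead without a category"},
]

if __name__ == "__main__":
    print(terms_aggregation(leads, "category"))
```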
26. How do you handle multi-tenant data in Elasticsearch for a SaaS platform like Salesforce?
Use an index-per-tenant pattern, a shared index with tenant ID filtering (optionally with custom routing), or separate clusters for strict isolation. Implement field-level security and document-level access control.[2]
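For the shared-index variant, one common approach is a filtered alias per tenant. The Python sketch below builds the body for the aliases API (POST _aliases) without sending it; the helper names, the tenant_id field, and the tenant-<id> alias naming scheme are assumptions made for illustration.

```python
def tenant_alias_action(shared_index, tenant_id, tenant_field="tenant_id"):
    """Build one 'add' action: an alias that only sees one tenant's documents."""
    return {
        "add": {
            "index": shared_index,
            "alias": f"tenant-{tenant_id}",
            # Searches through the alias are restricted to this tenant.
            "filter": {"term": {tenant_field: tenant_id}},
            # Routing by tenant keeps a tenant's documents on the same shard.
            "routing": tenant_id,
        }
    }


def aliases_body(shared_index, tenant_ids):
    """Build the full request body with one alias action per tenant."""
    return {"actions": [tenant_alias_action(shared_index, t) for t in tenant_ids]}


if __name__ == "__main__":
    print(aliases_body("crm_shared", ["42", "43"]))
```

An alias filter by itself does not stop a client from querying the underlying index directly, so access control still has to be enforced separately.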
27. What are the best practices for bulk indexing in Elasticsearch?
Use the _bulk API with batches of roughly 5-15MB, raise refresh_interval or set it to -1 for the duration of a large load (restoring it afterwards), use consistent routing, and monitor the indexing rate.[2]
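Size-based batching can be sketched in Python as a generator that emits newline-delimited JSON bodies for the _bulk API. This sketch only builds the payloads and does no network I/O; bulk_batches and the 5MB default are illustrative choices, and a single document larger than the limit is still emitted as its own batch.

```python
import json


def bulk_batches(index_name, docs, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies for _bulk, each at most roughly max_bytes."""
    batch, size = [], 0
    for doc in docs:
        # Each document needs an action line followed by its source line.
        action = json.dumps({"index": {"_index": index_name}})
        entry = f"{action}\n{json.dumps(doc)}\n"
        entry_size = len(entry.encode("utf-8"))
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)


if __name__ == "__main__":
    docs = ({"sku": i, "name": f"item {i}"} for i in range(1000))
    for body in bulk_batches("products", docs, max_bytes=16 * 1024):
        print(len(body.encode("utf-8")), "bytes")
```

Every body ends with a newline, which the _bulk endpoint requires after the last line.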
28. How do you optimize search performance for high query loads at Paytm?
Put non-scoring conditions in filter context (the filter clause of a bool query) rather than query context, leverage caching, keep shard sizes in a reasonable range (roughly 20-50GB), use search templates, and monitor slow logs.[2]
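The filter-versus-query distinction looks like this in a request body. The Python sketch below builds a bool query in which only the free-text part is scored; product_search is a hypothetical helper, and the field names follow the products examples earlier in this guide.

```python
def product_search(text, category, max_price):
    """Build a search body: scored full-text clause plus non-scoring filters."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": [{"match": {"description": text}}],
                # Filter context: yes/no conditions, not scored and cacheable.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(product_search("wireless headphones", "audio", 5000))
```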
29. Explain the difference between refresh and flush operations.
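A refresh writes the in-memory indexing buffer to a new Lucene segment (initially held in the filesystem cache) and opens it for search, so recently indexed documents become visible. A flush performs a Lucene commit: it fsyncs segments to disk and starts a new translog generation, so the old translog is no longer needed for recovery. Between flushes, durability comes from the translog.[5]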
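The interplay of buffer, segments, and translog can be modeled with a small Python class. This is a deliberately simplified toy, not the real engine: ToyShard and its attributes are hypothetical, and as a simplification its flush refreshes first so that every document sits in a segment before the commit.

```python
class ToyShard:
    """Toy model of one shard's write path."""

    def __init__(self):
        self.buffer = []     # indexed, but not yet searchable
        self.segments = []   # searchable segments
        self.committed = 0   # number of segments covered by the last commit
        self.translog = []   # operations recorded since the last flush

    def index(self, doc):
        # A write lands in the in-memory buffer and in the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Simplification: refresh first, then commit all segments and
        # discard the translog entries that the commit now covers.
        self.refresh()
        self.committed = len(self.segments)
        self.translog.clear()

    def search(self):
        return [doc for segment in self.segments for doc in segment]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index({"id": 1})
    print(shard.search())   # []: not refreshed yet
    shard.refresh()
    print(shard.search())   # [{'id': 1}]
    shard.flush()
    print(shard.translog)   # []: covered by the commit
```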
30. How would you implement a recommendation system using Elasticsearch at Swiggy?
Use a function_score query to boost results by user behavior signals, a more_like_this query for content similarity, and aggregations to find popular items. Handle cold-start users with default recommendations.[2]
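One way these pieces could fit together is sketched below in Python: a more_like_this query finds items similar to one the user liked, a function_score wrapper adds a popularity signal, and a separate query covers the cold-start case. The helper names, the restaurants-style fields (name, description), and the order_count popularity field are assumptions; the query keys themselves (more_like_this, function_score, field_value_factor, boost_mode) are standard Query DSL.

```python
def recommendation_query(index_name, liked_doc_id, popularity_field="order_count"):
    """Similar-to-liked items, with popularity added to the similarity score."""
    similar = {
        "more_like_this": {
            "fields": ["name", "description"],
            "like": [{"_index": index_name, "_id": liked_doc_id}],
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
    return {
        "query": {
            "function_score": {
                "query": similar,
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                    "missing": 0,
                },
                # sum keeps the similarity score even for unpopular items.
                "boost_mode": "sum",
            }
        }
    }


def cold_start_query(popularity_field="order_count", size=10):
    """Default recommendations for users with no history: most popular items."""
    return {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{popularity_field: "desc"}],
    }


if __name__ == "__main__":
    print(recommendation_query("restaurants", "r-101"))
    print(cold_start_query())
```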