I. Basic Concepts#
- NRT: Near Realtime. This has two meanings: the delay from writing a piece of data to when it becomes searchable is very small (about 1 second), and search and analysis operations based on Elasticsearch typically complete within seconds.
- Cluster: A cluster provides indexing and search services externally and consists of one or more nodes. The cluster name determines which cluster each node belongs to (the default name is elasticsearch).
- Node: A single Elasticsearch server instance is called a node. A node is part of a cluster, and each node has an independent name, derived from a UUID by default at startup but also configurable manually. A node can only join one Elasticsearch cluster.
- Primary Shard: The primary shard is the data storage unit. Horizontal scaling is achieved by adding shards; the primary shard count is specified at index creation and cannot be modified afterward.
- Replica Shard: A replica shard is a copy of a primary shard's data. It provides high availability and protects against data loss, and it also serves search requests, improving the overall throughput and performance of the cluster.
- Index: An index is a collection of documents with the same structure, analogous to a database in a relational system (after mapping types were removed following Es 6.0, an index is essentially equivalent to a single table).
- Type: Deprecated since Es 6.0. Reference link: Removal of mapping types
- Document: A document is the smallest unit of storage in Elasticsearch, in JSON format, similar to a record (a row of data) in a relational database, with flexible structure definitions. Documents under the same index should have structures as similar as possible.
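To make these concepts concrete, here is a minimal sketch (assuming a local node on localhost:9200 and a hypothetical index named my_index; 7.x-style endpoints) that creates an index with explicit primary/replica shard counts and stores one document:

```bash
# Create an index with 3 primary shards, each with 1 replica
# (the primary shard count cannot be changed after creation).
curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
    }
}'

# Store a JSON document; per NRT, it becomes searchable after roughly 1 second.
curl -X PUT "localhost:9200/my_index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
    "user": "kimchy",
    "message": "hello elasticsearch"
}'
```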
Lucene Retrieval#
Lucene Concepts#
Basic retrieval process: Query Analysis => Tokenization => Keyword Retrieval => Search Ranking
Inverted Index#
Traditionally, retrieval meant traversing articles one by one to find the positions of a keyword. An inverted index instead uses a tokenization strategy to build a mapping table from terms to the articles that contain them, so the documents matching a keyword can be looked up directly, in near-constant time. In short, it finds documents by content, whereas a traditional forward index (like a MySQL primary key) finds a document by its ID.
The underlying implementation is based on the FST (Finite State Transducer) data structure, which Lucene has used extensively since version 4. FST has two advantages:

- Small space occupancy. It compresses storage by reusing the prefixes and suffixes of words in the dictionary.
- Fast query speed. The query time complexity is O(len(str)).
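As a hedged illustration of the term-to-document mapping (reusing the hypothetical my_index document above), the term vectors API returns the tokens that the inverted index recorded for a stored field:

```bash
# Inspect the terms recorded for one document's field: the response lists
# each token along with its positions and offsets in the original text.
curl -X GET "localhost:9200/my_index/_termvectors/1?fields=message&pretty"
```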
Reference links🔗:
elasticsearch inverted index principle
In-depth analysis of Lucene's dictionary FST
II. Using Elasticsearch#
CRUD Operations#
- Get API
- Delete API
- Update API
- Bulk API
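A minimal sketch of the four APIs above, assuming the hypothetical my_index from earlier and 7.x-style endpoints:

```bash
# Get API: fetch a document by id
curl -X GET "localhost:9200/my_index/_doc/1?pretty"

# Update API: partial update, merging the given fields into the document
curl -X POST "localhost:9200/my_index/_update/1?pretty" -H 'Content-Type: application/json' -d'
{ "doc": { "message": "updated message" } }'

# Delete API: remove a document by id
curl -X DELETE "localhost:9200/my_index/_doc/1?pretty"

# Bulk API: several operations in one NDJSON request (note the trailing newline)
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/x-ndjson' -d'
{ "index": { "_index": "my_index", "_id": "2" } }
{ "user": "alice", "message": "bulk indexed" }
{ "delete": { "_index": "my_index", "_id": "2" } }
'
```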
Search#
- Search API
- Aggregations
- Query DSL
- Elasticsearch SQL
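A small combined sketch of the Search API with a Query DSL query and an aggregation (user.keyword assumes the default dynamic mapping created a keyword sub-field; my_index is the hypothetical index from earlier):

```bash
# Full-text match on "message" plus a terms aggregation bucketing by user
curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": { "match": { "message": "hello" } },
    "aggs": { "docs_per_user": { "terms": { "field": "user.keyword" } } },
    "size": 10
}'
```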
Analyzer Tokenization#
- Standard Analyzer: The default analyzer; splits text on word boundaries and lowercases tokens.
- Simple Analyzer: Splits on non-letter characters (symbols are filtered out) and lowercases tokens.
- Stop Analyzer: Like the Simple Analyzer, but additionally removes stop words (the, is, ...).
- Whitespace Analyzer: Splits on whitespace and does not lowercase.
- IK: A Chinese analyzer; requires plugin installation.
- ICU: An internationalization analyzer; requires plugin installation.
- jieba: A popular Chinese tokenizer.
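The built-in analyzers can be compared directly with the _analyze API; a minimal sketch:

```bash
# See how an analyzer tokenizes a string; swap "standard" for "simple",
# "stop", or "whitespace" to compare the behaviors described above.
curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
    "analyzer": "standard",
    "text": "The QUICK Brown-Foxes jumped!"
}'
```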
Index Management#
Alias#
- Grouping multiple indexes.
  When creating indexes by month, consider creating them by day first. An index template can automatically create daily indexes for log-type data, and a monthly alias can then group the daily indexes together.
- Switching indexes flexibly without modifying code, allowing zero-downtime migration of index data.
  For example, when an index needs to be changed (modifying shards or mappings, renaming...), simply bind the newly created index to the corresponding alias; once data migration is complete, remove the alias binding from the old index.
  Zero-downtime migration, see the official documentation: Changing Mapping with Zero Downtime
- Creating different "views" of the same index.
  Suitable for multi-tenant scenarios: for example, if different users need to see different data in the same index, a filtered alias can be created to do the filtering.
- Writing to multiple physical indexes through one alias.
  When an alias points to multiple physical indexes, a write index must be designated: all index and update requests sent to the alias resolve to that single write index. Each alias can have only one write index; if none is specified and the alias references multiple indexes, writes are rejected. (An is_write_index example follows the operations below.)
Example operation:

```bash
# Atomically switch alias1 from test1 to test2
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}'

# Create a filtered alias: alias2 exposes only test1 documents matching the filter
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        {
            "add" : {
                "index" : "test1",
                "alias" : "alias2",
                "filter" : { "term" : { "user" : "kimchy" } }
            }
        }
    ]
}'
```
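For the write-index scenario described above, a hedged sketch (is_write_index requires Es 6.4+; alias3 and the index names are hypothetical):

```bash
# alias3 spans both indexes, but only test1 (is_write_index: true) accepts writes
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        { "add" : { "index" : "test1", "alias" : "alias3", "is_write_index" : true } },
        { "add" : { "index" : "test2", "alias" : "alias3" } }
    ]
}'
```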
Reference link: Index Aliases
Rollover API#
Automatically creates a new index once the current one meets configured conditions (age, document count, or size); this is the usual way to create indexes based on dates.
Reference link: Rollover Index
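A minimal rollover sketch under assumed names (logs-000001 and the logs_write alias are hypothetical):

```bash
# Bootstrap the first generation and point a write alias at it
curl -X PUT "localhost:9200/logs-000001?pretty" -H 'Content-Type: application/json' -d'
{ "aliases": { "logs_write": { "is_write_index": true } } }'

# Create logs-000002 and switch the alias once any condition is met
curl -X POST "localhost:9200/logs_write/_rollover?pretty" -H 'Content-Type: application/json' -d'
{
    "conditions": {
        "max_age": "1d",
        "max_docs": 1000000
    }
}'
```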
Index Mapping#
- Mapping settings
- Index Template: Predefines settings and mappings that are automatically applied to newly created indexes whose names match a pattern.
- Dynamic Template: Dynamically sets field types based on the data types recognized by Elasticsearch combined with field names.
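A hedged dynamic-template sketch (7.x-style typeless mapping; my_logs and the field patterns are hypothetical):

```bash
# Strings whose field names end in "_ip" are mapped as type ip;
# all other strings become keyword instead of the default text.
curl -X PUT "localhost:9200/my_logs?pretty" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "dynamic_templates": [
            { "ips": {
                "match_mapping_type": "string",
                "match": "*_ip",
                "mapping": { "type": "ip" }
            } },
            { "strings_as_keyword": {
                "match_mapping_type": "string",
                "mapping": { "type": "keyword" }
            } }
        ]
    }
}'
```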
Routing#
When indexing a document, it is stored on a single primary shard. How does Elasticsearch know which shard a document should be stored on? When we create a document, how does it decide whether that document belongs on shard 1 or shard 2?

In fact, this process is determined by the following formula:

shard = hash(routing) % number_of_primary_shards

routing is a variable value, defaulting to the document's _id, but it can also be set to a custom value. routing is passed through a hash function to produce a number, which is then divided by number_of_primary_shards (the number of primary shards); the remainder, always between 0 and number_of_primary_shards - 1, is the shard on which the document lives.
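A minimal custom-routing sketch (reusing the hypothetical my_index): all documents sharing a routing value hash to the same shard, and the same value must be supplied again on reads:

```bash
# Index with a custom routing value instead of the default _id
curl -X PUT "localhost:9200/my_index/_doc/3?routing=user1&pretty" -H 'Content-Type: application/json' -d'
{ "user": "user1", "message": "custom-routed document" }'

# The routing value is needed again to locate the document's shard
curl -X GET "localhost:9200/my_index/_doc/3?routing=user1&pretty"
```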
This is why the number of primary shards must be fixed when creating an index and never changed afterward: if the number changed, all previously computed routing values would become invalid and documents could no longer be found. (Newer versions of Es can split or shrink an index's primary shards under certain conditions, but only by an integer factor: the shard count can be multiplied by n or divided by n, so 8 can become 16 or 4, but never 9, and 9 can never become 8.)
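A hedged sketch of the split path (the _split API requires the source index to be made read-only first; my_index with 3 primaries and the target name are hypothetical):

```bash
# The source index must block writes before splitting
curl -X PUT "localhost:9200/my_index/_settings?pretty" -H 'Content-Type: application/json' -d'
{ "index.blocks.write": true }'

# Split 3 primary shards into 6: the target count must be a multiple of the source
curl -X POST "localhost:9200/my_index/_split/my_index_6?pretty" -H 'Content-Type: application/json' -d'
{ "settings": { "index.number_of_shards": 6 } }'
```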
III. Cluster Architecture#
Cluster Roles#
- Master Node: Globally unique, responsible for cluster elections and managing cluster-level operations and changes. The master node can also act as a data node, but this is not recommended.
- Data Node: Responsible for storing data and executing data-related operations. Generally, data read and write processes only interact with data nodes.
- Ingest Node: A concept introduced in Es 5.0. Enables preprocessing: documents pass through defined processors and pipelines to be transformed before indexing, i.e., before data is written (see the pipeline sketch after this list). Reference link: Ingest Node
- Coordinating Node: Forwards client requests to the data nodes that store the data. Each data node executes the request locally and returns its results to the coordinating node, which collects these results and merges them into a single global result.
- Tribe Node: Deprecated and replaced by Cross-cluster search.
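For the ingest node above, a hedged pipeline sketch (my_pipeline and its fields are hypothetical; rename and set are standard ingest processors):

```bash
# Define a pipeline of two processors, then route a document through it
curl -X PUT "localhost:9200/_ingest/pipeline/my_pipeline?pretty" -H 'Content-Type: application/json' -d'
{
    "description": "rename a field and stamp ingest time",
    "processors": [
        { "rename": { "field": "msg", "target_field": "message" } },
        { "set": { "field": "ingested_at", "value": "{{_ingest.timestamp}}" } }
    ]
}'

# Documents indexed with ?pipeline= are transformed before being written
curl -X POST "localhost:9200/my_index/_doc?pipeline=my_pipeline&pretty" -H 'Content-Type: application/json' -d'
{ "msg": "hello via pipeline" }'
```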
Cluster Health Status#
- Green: All primary and replica shards are operating normally.
- Yellow: Primary shards are normal, but not all replica shards are operating normally, indicating a potential single point of failure risk.
- Red: Some primary shards are not operating normally.
Each index also has the above three statuses; if any replica shard is not normal, it is in Yellow status.
You can check the cluster status using curl -X GET "localhost:9200/_cluster/health?pretty"; for more details refer to: Cluster health API.
Cluster Expansion#
When the cluster is expanded with new nodes, shards are redistributed evenly across the nodes in the cluster, balancing the load for indexing and searching. This is all done automatically by the system.
Reference: Expansion Design.
Major Internal Modules#
Cluster#
The Cluster module encapsulates the implementation of cluster management executed by the master node, managing the cluster state and maintaining configuration information at the cluster level. Its main functions include:
- Managing the cluster state and publishing the newly generated cluster state to all nodes in the cluster.
- Calling the allocation module to execute shard allocation, deciding which shards should be allocated to which nodes.
- Directly migrating shards among the nodes in the cluster to maintain data balance.
Allocation#
Encapsulates functions and strategies related to shard allocation, including the allocation of primary and replica shards. This module is called by the master node. The shard allocation process is required when creating new indexes or fully restarting the cluster.
Discovery Module#
Responsible for discovering nodes in the cluster and electing the master node. When nodes join or leave the cluster, the master node takes appropriate actions. From a certain perspective, the discovery module acts similarly to ZooKeeper, electing the master and managing the cluster topology.
Gateway#
Responsible for the persistent storage of the cluster state data received from the master, and for restoring it when the cluster is fully restarted.
Indices Module#
Manages global-level index settings, as opposed to per-index settings (index configuration is divided into these two levels). It also encapsulates the index data recovery function: the recovery of primary and replica shards needed during the cluster startup phase is implemented in this module.
HTTP#
The HTTP module allows access to ES's API via JSON over HTTP. The HTTP module is essentially completely asynchronous, meaning there is no blocking thread waiting for a response. The benefit of using asynchronous communication for HTTP is to solve the C10k problem (10k-level concurrent connections). In some scenarios, consider using HTTP keepalive to improve performance. Note: Do not use HTTP chunking on the client side.
Transport Module#
Used for internal communication between nodes within the cluster. Every request from one node to another uses the transport module. Like the HTTP module, the transport module is also completely asynchronous. The transport module uses TCP communication, and each node maintains several TCP long connections with other nodes. All communication between internal nodes is carried by this module.
Engine#
The Engine module encapsulates operations on Lucene and calls to translog, serving as the final provider for read and write operations on a shard.
ES uses the Guice framework for modular management. Guice is a lightweight dependency injection framework developed by Google. In software design, it is often said that one should depend on abstractions rather than concrete implementations; IoC realizes this idea by taking over object creation and management internally.