HBase Series - HBase Read and Write Process#
Content organized from:
Data Writing Process#
Client-Side Writing Process#
- After the user submits a put request, the HBase client determines whether to submit it directly to the server based on the setting `autoflush=true/false` (default is `true`). If it is `false`, the request is added to a local buffer and only submitted once the buffer exceeds a certain threshold (default is 2M, configurable via the configuration files). This improves write performance, but carries the risk of losing requests if the client crashes (see the buffered-write sketch after this list).
- Before submission, HBase locates the region server that owns each rowkey by looking it up in the meta table `.meta.`. This locating step goes through the `HConnection`'s `locateRegion` method. For a batch request, the rowkeys are also grouped by `HRegionLocation`, with each group corresponding to one RPC request.
- HBase constructs a remote RPC request `MultiServerCallable<Row>` for each `HRegionLocation` and then executes the call through `rpcCallerFactory.<MultiResponse> newCaller()`; retries on failure and error handling are not covered here. The client's submission operation ends here.
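A minimal client-side sketch of the buffered-write behaviour described above. It assumes the classic `HTable` API (0.98/1.x era, where `setAutoFlush` and `flushCommits` live on the table handle; newer clients express the same idea with `BufferedMutator`) and a hypothetical table `test_table`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table"); // hypothetical table name

        // autoflush=false: puts accumulate in the client-side local buffer
        table.setAutoFlush(false);
        // local buffer threshold in bytes; 2 MB is the default (hbase.client.write.buffer)
        table.setWriteBufferSize(2 * 1024 * 1024);

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);       // buffered locally, not yet sent to the region server

        table.flushCommits(); // explicitly flush the buffer to the server
        table.close();        // close() also flushes anything still buffered
    }
}
```

With `autoflush=true` (the default) every `put` call triggers its own RPC immediately, which is simpler but slower for bulk loads.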
Server-Side Writing Process#
After the server-side Region Server receives the client's write request, it first deserializes it into a Put object, then runs checks such as whether the region is read-only and whether the memstore size exceeds blockingMemstoreSize, and then executes the following core operations:
- Acquire row lock and region update shared lock: HBase uses row locks to ensure that updates to the same row are mutually exclusive, guaranteeing the atomicity of an update: it either succeeds or fails as a whole.
- Start write transaction: obtain a write number to implement MVCC, which allows data to be read without locking, improving read performance while keeping reads and writes consistent.
- Write to memstore: each column family in HBase corresponds to a store that holds that column family's data, and each store has a write cache called the memstore. HBase does not write data directly to disk; it writes to the cache first and only flushes to disk once the cache reaches a certain size.
- Append HLog: HBase uses the WAL mechanism to ensure data reliability, i.e. the log is written before the cache, so that even after a crash the original data can be recovered from the HLog. In this step the data is packed into a WALEdit object and appended sequentially to the HLog, without executing a sync. Version 0.98 adopted a new write thread model for HLog logging, greatly improving overall data update performance.
- Release row lock and shared lock.
- Sync HLog: the HLog is actually synced to HDFS. The sync is executed after the row lock is released to minimize lock-holding time and improve write performance; if the sync fails, a rollback removes the data already written to the memstore.
- End write transaction: at this point the thread's update becomes visible to other read requests and takes effect.
- Flush memstore: when the write cache reaches 64M, a flush thread is started to write the data to disk. When memstore data is flushed to disk it forms a storefile, and once the number of storefiles reaches a certain level a compaction is performed on them (see the configuration sketch after this list). The purpose of compaction is to merge files, clear expired and redundant versions of data, and improve read and write efficiency.
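The thresholds this list refers to (blockingMemstoreSize, the memstore flush size, and the storefile count that triggers compaction) are all configuration-driven. Below is a minimal sketch of the relevant keys, set programmatically for illustration; the keys are standard `hbase-site.xml` settings, but the exact defaults vary by release (newer versions default the flush size to 128M rather than 64M):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WritePathThresholds {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Memstore size at which a flush to a new storefile is triggered.
        conf.setLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);

        // blockingMemstoreSize = flush.size * block.multiplier: beyond this,
        // the region blocks further writes until a flush completes.
        conf.setInt("hbase.hregion.memstore.block.multiplier", 4);

        // Number of storefiles in a store at which a compaction is considered.
        conf.setInt("hbase.hstore.compactionThreshold", 3);

        // Number of storefiles at which writes to the store are blocked entirely.
        conf.setInt("hbase.hstore.blockingStoreFiles", 10);

        System.out.println("flush size = "
                + conf.getLong("hbase.hregion.memstore.flush.size", -1));
    }
}
```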
WAL Persistence Levels#
- SKIP_WAL: only write to the cache, not to the HLog. Performance is good because only memory is involved, but data loss is likely; not recommended.
- ASYNC_WAL: asynchronously write the data to the HLog.
- SYNC_WAL: synchronously write the data to the log file. Note that the data is only written to the file system and is not necessarily persisted to disk.
- FSYNC_WAL: synchronously write the data to the log file and force it to disk. This is the strictest log writing level and guarantees no data loss, but performance is relatively poor.
- USE_DEFAULT: if the user does not specify a persistence level, HBase uses SYNC_WAL by default.
Users can set the WAL persistence level through the client, e.g.: `put.setDurability(Durability.SYNC_WAL);`
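Putting the levels together: a sketch assuming the 1.x/2.x `Connection` API and a hypothetical table `test_table`. The per-mutation setting is what `put.setDurability` controls; `USE_DEFAULT` falls back to whatever durability the table itself is configured with.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilityExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test_table"))) {

            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));

            // Per-mutation WAL persistence level; trade durability for latency by
            // choosing ASYNC_WAL or SKIP_WAL instead.
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}
```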
Reading Process#
The process by which a client reads or writes data on HBase for the first time:

- The client obtains from ZooKeeper the Region Server hosting the `META` table;
- The client accesses that Region Server and queries the `META` table for the Region Server serving the row key being accessed, then caches this information together with the location of the `META` table;
- The client retrieves the data from the Region Server where the row key is located.

On subsequent reads the client obtains the row key's Region Server from its cache, so it does not need to query the `META` table again unless the region has moved and the cache has become invalid, in which case it re-queries and updates the cache.

Note: the `META` table is a special table in HBase that stores the location information of all regions; the location of the `META` table itself is stored in ZooKeeper.
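The meta lookup and caching described above happen transparently inside the client, but they can be observed directly through the `RegionLocator` API (available in the 1.x/2.x client; the table name below is a hypothetical example):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookupExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("test_table"))) {

            // First lookup goes through ZooKeeper -> META table; the result is cached.
            HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row-1"));
            System.out.println("row-1 is served by " + loc.getServerName());

            // reload = true bypasses the client cache and re-reads META, which is
            // what happens implicitly after a region move invalidates the cache.
            HRegionLocation fresh = locator.getRegionLocation(Bytes.toBytes("row-1"), true);
            System.out.println("refreshed location: " + fresh.getServerName());
        }
    }
}
```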
For a more detailed reading data process, refer to:
HBase Query Methods#
- Full table query: `scan 'tableName'`
- Single row query based on rowkey: `get 'tableName', '1'`
- Range scan based on rowkey: `scan 'tableName', {STARTROW => '1', STOPROW => '2'}`
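The same three queries expressed with the Java client; a sketch assuming the 2.x API, where `Scan` uses `withStartRow`/`withStopRow` (older clients use `setStartRow`/`setStopRow`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class QueryExamples {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("tableName"))) {

            // Full table scan: scan 'tableName'
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }

            // Single row get: get 'tableName', '1'
            Result row = table.get(new Get(Bytes.toBytes("1")));
            System.out.println("row '1' empty? " + row.isEmpty());

            // Range scan: scan 'tableName', {STARTROW => '1', STOPROW => '2'}
            Scan range = new Scan().withStartRow(Bytes.toBytes("1"))
                                   .withStopRow(Bytes.toBytes("2"));
            try (ResultScanner scanner = table.getScanner(range)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```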
setCaching() and setBatch() methods
Caching sets the number of rows the server returns per RPC, while Batch sets the number of columns returned at a time.
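For example, on a `Scan` (a minimal sketch; the column family name `cf` is an assumption):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTuning {
    public static void main(String[] args) {
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("cf")); // hypothetical column family

        // Caching: number of rows fetched from the region server per RPC.
        scan.setCaching(100);

        // Batch: number of columns of a (possibly very wide) row returned per Result.
        scan.setBatch(10);
    }
}
```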