Govind Kamat, HBase contributor and Cloudera Performance Engineer

(Edited on 5/14/2015 -- changed ops/sec/RS to ops/sec.)

Running multiple workloads on HBase has always been challenging, especially when trying to execute real-time workloads while concurrently running analytical jobs. One possible way to address this issue is to throttle analytical MR jobs so that real-time workloads are less affected.

A new QoS (quality of service) feature that Apache HBase 1.1 introduces is request throttling, which controls the rate at which requests are handled by an HBase cluster.  HBase typically treats all requests identically; the new throttling feature, however, can be used to specify a maximum rate or bandwidth that overrides this behavior.  The limit may be applied to requests originating from a particular user, or alternatively, to requests directed to a given table or a specified namespace.

The objective of this post is to evaluate the effectiveness of this feature and the overhead it might impose on a running HBase workload.  The performance runs carried out showed that throttling works very well: resources are redirected from the throttled user's workload to the workloads of other users, without significant overhead being incurred in the process.

Enabling Request Throttling

It is straightforward to enable the request-throttling feature -- all that is necessary is to set the HBase configuration parameter hbase.quota.enabled to true.  The related parameter hbase.quota.refresh.period specifies the interval, in milliseconds, at which each regionserver re-checks for any new or changed quota settings.
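For example, these settings might appear in hbase-site.xml on the regionservers as follows (a minimal sketch; the 30-second refresh period is only an illustrative value, not a recommendation):

<property>
  <name>hbase.quota.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- how often each regionserver polls for new or changed quotas, in milliseconds -->
  <name>hbase.quota.refresh.period</name>
  <value>30000</value>
</property>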

The throttle can then be set from the HBase shell, like so:

hbase> set_quota TYPE => THROTTLE, USER => 'uname', LIMIT => '100req/sec'

hbase> set_quota TYPE => THROTTLE, TABLE => 'tbl', LIMIT => '10M/sec'

hbase> set_quota TYPE => THROTTLE, NAMESPACE => 'ns', LIMIT => 'NONE'
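Existing quotas can also be inspected from the shell, and setting LIMIT => 'NONE' (as in the namespace example above) removes a throttle.  A quick sketch, with 'uname' as a placeholder user:

hbase> list_quotas

hbase> set_quota TYPE => THROTTLE, USER => 'uname', LIMIT => 'NONE'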

Test Setup

To evaluate how effectively HBase throttling worked, a YCSB workload was imposed on a 10-node cluster.  There were 6 regionservers and 2 master nodes.  YCSB clients were run on the 4 nodes that were not running regionserver processes.  The client processes were initiated by two separate users, and the workload issued by one of them was throttled.

More details on the test setup follow.

HBase version: HBase 0.98.6-cdh5.3.0-SNAPSHOT (HBASE-11598 was backported to this version)

Configuration:

  • OS: CentOS release 6.4 (Final)

  • CPU sockets: 2

  • Physical cores per socket: 6

  • Total number of logical cores: 24

  • Number of disks: 12

  • Memory: 64 GB

  • Number of regionservers (RS): 6

  • Master nodes: 2 (for the NameNode, ZooKeeper, and HBase master)

  • Number of client nodes: 4

  • Number of rows: 1080M

  • Number of regions: 180

  • Row size: 1 KB

  • Threads per client: 40

  • Workload: read-only and scan

  • Key distribution: Zipfian

  • Run duration: 1 hour
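For reference, the YCSB side of this configuration might be captured in a parameter file along the following lines.  This is only a sketch: the property names are standard YCSB core properties, but the file name and the 10-field x 100-byte layout used to reach roughly 1 KB rows are assumptions.

# workload-read.properties (hypothetical name) -- read-only, Zipfian key distribution
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=1080000000
fieldcount=10
fieldlength=100
readproportion=1.0
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian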

Procedure

An initial data set was first generated by running YCSB in its data-generation mode.  An HBase table was created with the specifications above and pre-split.  After all the data was inserted, the table was flushed, compacted, and saved as a snapshot.  This data set was used to prime the table for each run.  Read-only and scan workloads were used to evaluate performance; this eliminates effects such as memstore flushes and compactions.  One long run was carried out first to ensure that the caches were warmed and that the runs yielded repeatable results.
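A minimal sketch of this preparation, assuming a table named 'usertable' with a single column family 'cf', a snapshot named 'usertable-snap', and the hbase098 YCSB binding (all of these names are assumptions), could look like this.  HexStringSplit is just an example split algorithm; the actual split points would be chosen to match YCSB's key format.

hbase> create 'usertable', 'cf', {NUMREGIONS => 180, SPLITALGO => 'HexStringSplit'}

$ bin/ycsb load hbase098 -P workload-read.properties -p table=usertable -p columnfamily=cf -threads 40

hbase> flush 'usertable'

hbase> major_compact 'usertable'

hbase> snapshot 'usertable', 'usertable-snap'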

For the purpose of these tests, the throttle was applied to the workload emanating from one user in a two-user scenario.  Four client machines were used to impose identical read-only workloads.  The client processes on two machines were run by the user “jenkins”, while those on the other two were run as a different user.  The throttle was applied to the workload issued by this second user.  There were two sets of runs, one with both users running read workloads and a second in which the throttled user ran a scan workload.  Typically, scans are long-running and it can be desirable on occasion to de-prioritize them in favor of more real-time read or update workloads.  In this case, the scan was for sets of 100 rows per YCSB operation.
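The client invocations might look roughly like the following, run as the respective OS users on their client machines (the file names and the one-hour maxexecutiontime value are illustrative, not the exact commands used):

$ bin/ycsb run hbase098 -P workload-read.properties -p table=usertable -p columnfamily=cf -threads 40 -p maxexecutiontime=3600

$ bin/ycsb run hbase098 -P workload-scan.properties -p table=usertable -p columnfamily=cf -threads 40 -p maxexecutiontime=3600

Here workload-scan.properties would set scanproportion=1.0 and maxscanlength=100 in place of readproportion=1.0, matching the 100-row scans described above.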

For each run, the following steps were carried out:

  • Any existing YCSB-related table was dropped.

  • The initial data set was cloned from the snapshot.

  • The desired throttle setting was applied.

  • The desired workloads were imposed from the client machines.

  • Throughput and latency data were collected and are presented in the tables below.

The throttle was applied at the start of the job (the command used was the first in the list shown in the “Enabling Request Throttling” section above).  The hbase.quota.refresh.period property was set to under a minute so that the throttle took effect by the time the test setup was finished.
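Put together, the per-run setup described above amounts to something like the following in the HBase shell (a sketch using the hypothetical names from earlier, with 'loaduser' standing in for the second, throttled user and 1000 req/sec as an example limit):

hbase> disable 'usertable'

hbase> drop 'usertable'

hbase> clone_snapshot 'usertable-snap', 'usertable'

hbase> set_quota TYPE => THROTTLE, USER => 'loaduser', LIMIT => '1000req/sec'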

The throttle option specifically tested here was the one to limit the number of requests (rather than the one to limit bandwidth).

Observations and Results

The throttling feature appears to work quite well.  When applied interactively in the middle of a running workload, it goes into effect immediately after the quota refresh period and can be observed clearly in the throughput numbers reported by YCSB while the test is progressing.  The table below has performance data from test runs indicating the impact of the throttle.  For each row, the throughput and latency numbers are shown in separate columns, one set for the throttled user (indicated by “T”) and the other for the non-throttled user (indicated by “U”).

Read + Read Workload

Throttle (req/sec) | Avg Total Thruput (ops/sec) | Thruput_U (ops/sec) | Thruput_T (ops/sec) | Latency_U (ms) | Latency_T (ms)
none               | 7291                        | 3644                | 3644                | 21.9           | 21.9
2500 rps           | 7098                        | 4700                | 2400                | 17             | 33.3
2000 rps           | 7125                        | 4818                | 2204                | 16.6           | 34.7
1500 rps           | 6990                        | 5126                | 1862                | 15.7           | 38.6
1000 rps           | 7213                        | 5340                | 1772                | 14.9           | 42.7
500 rps            | 6640                        | 5508                | 1136                | 14             | 70.2

[Charts: throughput and latency versus throttle setting for the read + read workload]

As can be seen, as the throttle pressure is increased (by reducing the permitted throughput for user “T” from 2500 req/sec to 500 req/sec, as shown in column 1), the total throughput (column 2) stays roughly the same.  In other words, the cluster resources get redirected to benefit the non-throttled user, with the feature imposing no significant overhead.  One possible outlier is the most restrictive throttle setting (500 req/sec), where the total throughput is about 10% less than the maximum cluster throughput.

Correspondingly, the latency for the non-throttled user improves while that for the throttled user degrades.  This is shown in the last two columns in the table.

The charts above show that the change in throughput is linear with the amount of throttling, for both the throttled and non-throttled user.  With regard to latency, the change is generally linear, until the throttle becomes very restrictive; in this case, latency for the throttled user degrades substantially.

One point that should be noted is that, while the throttle parameter in req/sec is indeed correlated with the actual restriction in throughput reported by YCSB (ops/sec), as seen from the trend in column 4, the actual figures differ.  As user “T”’s throttle is reduced from 2500 to 500 req/sec, the observed throughput goes down from 2400 ops/sec to 1136 ops/sec.  Therefore, users should calibrate the throttle against their workload to determine the appropriate figure (in either req/sec or MB/sec) for their case.

Read + Scan Workload

Throttle  | Thruput_U (ops/sec) | Thruput_T (ops/sec) | Latency_U (ms) | Latency_T (ms)
3000 Krps | 3810                | 690                 | 20.9           | 115
1000 Krps | 4158                | 630                 | 19.2           | 126
500 Krps  | 4372                | 572                 | 18.3           | 139
250 Krps  | 4556                | 510                 | 17.5           | 156
50 Krps   | 5446                | 330                 | 14.7           | 242

[Chart (log-linear): throughput versus throttle setting for the read + scan workload]

With the read/scan workload, results similar to the read/read workload are observed.  As the extent of throttling on the long-running scan workload is increased, its observed throughput decreases and its latency increases.  Conversely, the read workload benefits, displaying better throughput and improved latency.  Again, the specific numeric value used to specify the throttle needs to be calibrated to the workload at hand; since scans break down into a large number of read requests, the throttle parameter needs to be much higher than in the case of the read workload.  Shown above is a log-linear chart of the impact on throughput for the two workloads as the extent of throttling is adjusted.

Conclusion

HBase request throttling is an effective and useful technique for handling multiple workloads, or even multi-tenant workloads, on an HBase cluster.  A cluster administrator can choose to throttle long-running or lower-priority workloads, knowing that regionserver resources will get redirected to the other workloads, without this feature imposing a significant overhead.  By calibrating the throttle to the cluster and the workload, the desired performance can be achieved on clusters running multiple concurrent workloads.