What's New in Apache Kafka 2.5.0
On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.5.0. The community has created another exciting release.
We are making progress on KIP-500 and have added new metrics and security features, among other improvements. This blog post goes into detail on some of the added functionality, but to get a full list of what’s new in this release, please see the release notes.
Kafka broker, producer, and consumer
KIP-500 update
In Apache Kafka 2.5, some preparatory work has been done towards the removal of Apache ZooKeeper™ (ZK).
- KIP-555: details about the ZooKeeper deprecation process in admin tools
- KIP-543: dynamic configs will not require ZooKeeper access
Exactly once semantics (EOS) – Foundational improvements
KIP-447: Producer scalability for exactly once semantics
This KIP simplifies the API for applications that read from and write to Kafka transactionally. Previously, this use case typically required separate producer instances for each input partition, but now there is no special requirement. This makes it much easier to build EOS applications that consume large numbers of partitions. This is foundational for a similar improvement in Kafka Streams in the next release.
See KIP-447 for more details.
KIP-360: Improve reliability of idempotent/transactional producer
This KIP addresses a problem with producer state retention on the broker, which is what makes the idempotence guarantee possible. Previously, when the log was truncated to enforce retention or truncated from a call to delete records, the broker dropped producer state, which led to UnknownProducerId
errors. With this improvement, the broker instead retains producer state until expiration. This KIP also gives the producer a powerful way to recover from unexpected errors.
See KIP-360 for more details.
Metrics and operational improvements
KIP-515: Enable ZK client to use the new TLS supported authentication (ZK 3.5.7)
Apache Kafka 2.5 now ships ZooKeeper 3.5.7. One feature of note is the newly added ZooKeeper TLS support in ZooKeeper 3.5. When deploying a secure Kafka cluster, it’s critical to use TLS to encrypt communication in transit. Apache Kafka 2.4 already ships with ZooKeeper 3.5, which adds TLS support between the broker and ZooKeeper. However, configuration information has to be passed via system properties as -D
command line options on the Java invocation of the broker or CLI tool (e.g., zookeeper-security-migration
), which is not secure. KIP-515 introduces the necessary changes to enable the use of secure configuration values for using TLS with ZooKeeper.
ZooKeeper 3.5.7 supports both mutual TLS authentication via its ssl.clientAuth=required
configuration value and TLS encryption without client certificate authentication via ssl.clientAuth=none
.
See KIP-515 for more details.
KIP-511: Collect and Expose Client’s Name and Version in the Brokers
Previously, operators of Apache Kafka could only identify incoming clients using the clientId
field set on the consumer and producer. As this field is typically used to identify different applications, it leaves a gap in operational insight regarding client software libraries and versions. KIP-511 introduces two new fields to the ApiVersionsRequest
RPC: ClientSoftwareName
and ClientSoftwareVersion
.
These fields are captured by the broker and reported through a new set of metrics. The metric MBean pattern is:
kafka.server:clientSoftwareName=(client-software-name),clientSoftwareVersion=(client-software-version),listener=(listener),networkProcessor=(processor-index),type=(type)
For example, the Apache Kafka 2.4 Java client produces the following MBean on the broker:
kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics
See KIP-511 for more details.
KIP-559: Make the Kafka Protocol Friendlier with L7 Proxies
This KIP identifies and improves several parts of our protocol, which were not fully self-describing. Some of our APIs have generic bytes
fields, which have implicit encoding. Additional context is needed to properly decode these fields. This KIP addresses this problem by adding the necessary context to the API so L7 proxies can fully decode our protocols.
See KIP-559 for more details.
KIP-541: Create a fetch.max.bytes configuration for the broker
Kafka consumers can choose the maximum number of bytes to fetch by setting the client-side configuration fetch.max.bytes
. Too high of a value may degrade performance on the broker for other consumers. If the value is extremely high, the client request may time out. KIP-541 centralizes this configuration with a broker setting that puts an upper limit on the maximum number of bytes that the client can choose to fetch.
See KIP-541 for more details.
Kafka Connect
KIP-558: Track a connector’s active topics
During runtime, it’s not easy to know the topics a sink connector reads records from when a regex is used for topic selection. It’s also not possible to know which topics a source connector writes to. KIP-558 enables developers, operators, and applications to easily identify topics used by source and sink connectors.
$ curl -s 'http://localhost:8083/connector/a-source-connector/topics' {"a-source-connector":{"topics":["foo","bar","baz"]}}
The topic tracking is enabled by default but can also be disabled with topic.tracking.enable=false
.
See KIP-558 for more details.
Kafka Streams
KIP-150: Add Cogroup to the DSL
In the past, aggregating multiple streams into one could be complicated and error prone. It generally requires you to group and aggregate all of the streams into tables, then make multiple outer join calls. The new co-group operator cleans up the syntax of your programs, reduces the number of state store invocations, and overall increases performance.
KTablecogrouped = grouped1 .cogroup(aggregator1) .cogroup(grouped2, aggregator2) .cogroup(grouped3, aggregator3) .aggregate(initializer1, materialized1);
See KIP-150 for more details.
KIP-523: Add toTable() to the DSL
A powerful way to interpret a stream of events is as a changelog and to materialize a table over it. KIP-523 as a toTable()
function can be applied to a stream and materializes the latest value per key. It’s important to note that any null values will be interpreted as deletes for a given key (tombstones).
See KIP-523 for more details.
KIP-535: Allow state stores to serve stale reads during rebalance
Previously, interactive queries (IQs) against state stores would fail during the time period when there is a rebalance in progress. This degraded the uptime of applications that depend on the ability to query Kafka Streams’ tables of state. KIP-535 gives applications the ability to query any replica of a state store and observe how far each replica is lagging behind the primary.
See KIP-535 and this blog post for more details.
Deprecations
We have dropped support for Scala 2.11 in Apache Kafka 2.5. Scala 2.12 and 2.13 are now the only supported versions.
TLS 1.2 is now the default SSL protocol. TLS 1.0 and 1.1 are still supported.
Conclusion
To learn more about what’s new in Apache Kafka 2.5 and to see all the KIPs included in this release, be sure to check out the release notes and highlights video.