Thursday, May 2, 2019

Optimizing Kafka Streams Applications

With the release of Apache Kafka® 2.1.0, Kafka Streams introduced a processor topology optimization framework at the Kafka Streams DSL layer. This framework opens the door to various optimization techniques from the existing data stream management system (DSMS) and data stream processing literature. In what follows, we provide some context on how a processor topology was generated inside Kafka Streams before 2.1, with a focus on stateful operations like aggregations and joins, and demonstrate a few known issues that hurt the efficiency of the generated topology. We then describe how the topology generation phase was refactored in Kafka 2.1 to allow optimizations, along with a few optimization techniques already offered in that release. We wrap up with some operational guidelines on how to turn on topology optimizations when upgrading your Streams application to Kafka 2.1 or newer.
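As a rough preview of the operational side, the sketch below shows one way the optimization switch might be set in an application's configuration. It uses only java.util.Properties so that it runs without the Kafka libraries on the classpath. The class name, method name, application id, and broker address are hypothetical placeholders; the key "topology.optimization" and the value "all" are the string forms of StreamsConfig.TOPOLOGY_OPTIMIZATION and StreamsConfig.OPTIMIZE.

```java
import java.util.Properties;

public class OptimizationConfigSketch {

    // Builds a Streams configuration with topology optimization switched on.
    // applicationId and bootstrapServers are placeholders supplied by the caller.
    static Properties optimizedConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // "topology.optimization" is the string form of StreamsConfig.TOPOLOGY_OPTIMIZATION.
        // "all" corresponds to StreamsConfig.OPTIMIZE; the default is "none".
        props.put("topology.optimization", "all");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical application id and broker address, for illustration only.
        Properties props = optimizedConfig("my-streams-app", "localhost:9092");
        System.out.println(props.getProperty("topology.optimization"));
    }
}
```

In a real application, the same Properties object would be passed to StreamsBuilder#build(Properties) as well as to the KafkaStreams constructor, because the optimizations are applied when the topology is built.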

Kafka Streams Topology Generation 101

Before we present the topology optimization techniques introduced in Kafka 2.1, let's first examine how a user's specified processing logic is represented in the constructed Streams processor topology and why such a topology leaves room for optimization in the first place.



from DZone.com Feed http://bit.ly/2VcoQCn
