In our previous article, we talked about real-time streaming data.
Now, let's consider the idea of windows. In Spark Streaming, data arrives in small batches, so we get one RDD, then another RDD, and so on.
Spark batches the incoming data according to your batch interval, but sometimes you want to remember things from the past. Maybe you want to maintain a rolling thirty-second average for some of your streaming data, but you want results every five seconds. In this case, you’d want a batch interval of five seconds, but a window length of thirty seconds, so that each result covers the six most recent batches. Spark provides several methods for making these kinds of calculations.
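To make the relationship between batch interval and window length concrete, here is a minimal sketch in plain Python that simulates the idea without Spark. Each inner list stands in for one five-second batch (one RDD), and the names `rolling_averages` and `batches` and the sample values are made up for illustration. In Spark Streaming itself, the windowed view would come from a method such as `DStream.window(windowDuration, slideDuration)`.

```python
from collections import deque

BATCH_INTERVAL_S = 5    # a new batch arrives every five seconds
WINDOW_LENGTH_S = 30    # each result should cover the last thirty seconds
BATCHES_PER_WINDOW = WINDOW_LENGTH_S // BATCH_INTERVAL_S  # 6 batches per window


def rolling_averages(batches):
    """Yield the average over the current window after each batch arrives."""
    # A bounded deque drops the oldest batch once the window is full.
    window = deque(maxlen=BATCHES_PER_WINDOW)
    for batch in batches:
        window.append(batch)
        values = [v for b in window for v in b]
        yield sum(values) / len(values) if values else None


# Hypothetical input: seven consecutive batches, one of them empty.
batches = [[10, 20], [30], [], [40, 50], [60], [70], [80, 90]]
averages = list(rolling_averages(batches))
print(averages)
```

The last result no longer includes the first batch, because by then it has slid out of the six-batch window.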
What if we want to see the highest value over the last thirty minutes, and also get an updated highest value every five seconds?
That is a real problem: every five seconds we get a brand-new RDD, but we still need to remember the data from the previous RDDs.
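One way to picture "remembering" previous RDDs for this case is sketched below, again in plain Python rather than Spark. It relies on an assumption that holds for a maximum: keeping only each batch's own maximum is enough to answer the question for the whole window. The function name `rolling_max` and the sample data are hypothetical. In Spark Streaming, a comparable result would typically come from a windowed reduction such as `reduceByWindow`.

```python
from collections import deque

BATCH_INTERVAL_S = 5          # a new batch every five seconds
WINDOW_LENGTH_S = 30 * 60     # look back thirty minutes
BATCHES_PER_WINDOW = WINDOW_LENGTH_S // BATCH_INTERVAL_S  # 360 batches


def rolling_max(batches, batches_per_window=BATCHES_PER_WINDOW):
    """Yield the highest value in the current window after each batch."""
    # Store one summary value per batch instead of the raw data.
    recent_maxima = deque(maxlen=batches_per_window)
    for batch in batches:
        recent_maxima.append(max(batch, default=None))  # None marks an empty batch
        seen = [m for m in recent_maxima if m is not None]
        yield max(seen, default=None)


# Hypothetical input, with a tiny window of 3 batches to keep the demo short.
batches = [[3, 9], [4], [], [7, 2], [1]]
result = list(rolling_max(batches, batches_per_window=3))
print(result)
```

With a window of three batches, the 9 from the first batch stops being reported once that batch slides out of the window, and the highest value falls back to 7.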
from DZone.com Feed https://ift.tt/2CeeZTw