Spring Cloud Stream Kafka — The story of retry and dead letter queue 💀
What to do when your super rude complaint to Amazon about a late delivery fails to reach them?
What the hell is Spring Cloud Stream? aka A quick refresher
As I type out this incredibly awesome article on a super awesome feature that someone else built, I realised there are two ways I could give you a quick overview of what Spring Cloud Stream does:
- Literally copy and paste the textbook definition of the framework
- Provide a dumb (but quick) and okayish original explanation of what it is
And I'm gonna go with the dumb version. Last chance to turn back if you don't want your brain cells to die. Also, a quick reminder: this article contains a ton of out-of-context gifs from really bad shows. That's just a way for me to stay awake while I type this out. Kindly ignore them.
Still here? Let's go then.
So basically, Spring Cloud Stream is yet another framework in a long list of frameworks that helps us share event-driven messages (your aforementioned Amazon complaint) across multiple microservices, and it is highly scalable.
There are three main components that you NEED to know here:
- Binder — It helps us communicate with the message broker. In this case, we're going to be binding with Kafka. (We can also bind it with RabbitMQ 🐰)
- Binding — This is an interface that connects us with the topic.
- Message — Your Amazon late-shipping complaint (the event data)
That's it for the introduction. Now for the stuff we're all here for.
Why and how should I retry? Can't we just give up?
Remember the good old days, when the internet was slow and we had to wait a couple of minutes for a song to download? Remember that? And when the download is almost done, at 99%… it freaking fails!! We SCREAM. Ahhhh!!!!
But we persevere and never give up, because WE. NEED. THAT. Taylor Swift song (do not judge me) NO.MATTER.WHAT! Because it is critical.
Almost similarly, we need to retry because every message we send throughout our microservices is critical. Even if a dumb guy sends in dumb, bad data, it is critical, because we need to get to the bottom of WHY THE HELL that dumb guy sent those dumb details.
Now, how do we retry?
Just like how I find it difficult to speak with fellow human beings and have to figure that out on my own by testing multiple techniques (like staring at the person for more than 2 minutes without saying a word), the best way to implement a retry mechanism for your Amazon complaint queue is by experimenting in your own environment.
But the logic for retry is pretty much always the same. The stuff we can experiment with is the configuration: how many times do I retry? Do I take a smoke break in between retries?
The logic that we're gonna follow is this. Let's assume that an event failed to get consumed; maybe a fellow intern shut down the database server while trying to order pizza on the office computer.
The consumer in that case, with all of its glorious purpose, will throw a DatabaseException of some kind.
Once the consumer throws the exception, Spring Cloud Stream will attempt the message again, up to 3 attempts in total by default, with a delay in between if we specified one.
To analyse that message and figure out why it failed, we can configure Spring Cloud Stream to push the same message to a separate queue called the dead letter queue.
We can get creative here: use an AWS service to analyse the failed events and come up with ways to handle them, or set up an alarm that triggers whenever an event with the subject "LOKI_VARIANT" gets pushed into the queue, so that an email goes out to an organization called the "TVA" to handle the error.
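To make that last idea a little less hand-wavy, here's a rough sketch of what such a DLQ watcher could look like, using the same programming model we'll use for the main listener below. Everything in it (the DlqBind interface, the dlqAlerts channel name, the alerting logic) is hypothetical:

import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.SubscribableChannel;
import org.springframework.stereotype.Component;

// Hypothetical binding interface; point the "dlqAlerts" channel at your
// dead letter topic in application.properties, and register this interface
// with Spring Cloud Stream just like the main binding we'll create later.
interface DlqBind {
    String DLQ_CHANNEL = "dlqAlerts";

    @Input(DlqBind.DLQ_CHANNEL)
    SubscribableChannel processDeadLetter();
}

@Component
class DlqAlertListener {

    @StreamListener(DlqBind.DLQ_CHANNEL)
    public void onDeadLetter(String failedEvent) {
        // Naive string check; a real implementation would parse the payload
        if (failedEvent.contains("LOKI_VARIANT")) {
            // Swap in a real alert here: email, SNS, carrier pigeon...
            System.out.println("Paging the TVA about: " + failedEvent);
        }
    }
}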
Enough with this theory nonsense. How do I implement this?
As with any Spring Boot application, we'll first start with the dependencies we need to get this working. Add the dependencies below to your pom.xml file.
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream</artifactId>
    <version>3.1.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-kafka</artifactId>
    <version>3.1.0</version>
</dependency>
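A quick aside: instead of hard-coding the versions like I did above, you can let the Spring Cloud BOM manage them and drop the <version> tags from both dependencies. The release train below (2020.0.0, which I believe ships spring-cloud-stream 3.1.0) is an assumption on my part, so double-check it against your Spring Boot version:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <!-- Release train version is an assumption; match it to your Boot version -->
            <version>2020.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>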
Then we will need to write a subscriber, and for that we'll first create a binding interface to connect to the Kafka topic.
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.messaging.SubscribableChannel;

public interface ListenerBind {

    // Logical channel name; we map it to a real Kafka topic in application.properties
    String SAMPLE_CHANNEL = "poppyPants";

    @Input(ListenerBind.SAMPLE_CHANNEL)
    SubscribableChannel processSampleEvent();
}
Once we finish creating the binding interface, we can create the Listener class like the one below. As you can probably see, this listener just prints the event data to the console, but in real life it might contact a database or make a call to third-party APIs.
We will not be implementing that on account of me being too lazy, but for now we will explicitly throw an exception from the listener to test out the retry implementation.
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.stereotype.Component;

@Component
public class Listener {

    @StreamListener(ListenerBind.SAMPLE_CHANNEL)
    public void processSampleEvent(String sampleData) {
        System.out.println("Entering listener: " + sampleData);
        // Deliberately blow up so we can watch the retries kick in
        throw new NullPointerException();
    }
}
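One gotcha before we move on: the binding interface won't actually do anything until it's registered with Spring Cloud Stream via @EnableBinding. Stick the annotation on your main application class (the class name here is just a placeholder):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;

@SpringBootApplication
@EnableBinding(ListenerBind.class) // activates the poppyPants channel
public class RetryDemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(RetryDemoApplication.class, args);
    }
}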
If you noticed, the binding interface has a variable called SAMPLE_CHANNEL. This is not the Kafka topic; we point this channel to the correct Kafka topic in the application.properties file.
The basic configuration:
The sample topic that we are going to use to test out this retry implementation is sample-topic (pretty original, huh?).
spring.cloud.stream.bindings.poppyPants.destination=sample-topic
spring.cloud.stream.bindings.poppyPants.content-type=application/json
spring.cloud.stream.bindings.poppyPants.group=groupA
spring.kafka.bootstrap-servers=localhost:9093
The configuration for retry implementation:
We have specified maxAttempts as 3, which means that in case of an exception the consumer will attempt to process the same event up to three times in total (the original delivery plus two retries). The backOffInitialInterval denotes the delay in milliseconds before retrying an event; 900000 ms is a full 15 minutes, so you may want a smaller value while testing. Also note defaultRetryable: it controls whether exceptions that aren't explicitly listed as retryable get retried at all, so it has to stay true (its default) for our NullPointerException to trigger retries.
spring.cloud.stream.bindings.poppyPants.consumer.maxAttempts=3
spring.cloud.stream.bindings.poppyPants.consumer.backOffInitialInterval=900000
spring.cloud.stream.bindings.poppyPants.consumer.backOffMaxInterval=900000
spring.cloud.stream.bindings.poppyPants.consumer.backOffMultiplier=1.0
spring.cloud.stream.bindings.poppyPants.consumer.defaultRetryable=true
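Side note: with backOffMultiplier pinned at 1.0, every retry waits the same interval. If you'd rather have the delay grow between attempts, crank up the multiplier. With the hypothetical values below (mine, not part of this demo), the consumer would wait 1 second before the second attempt, 2 seconds before the third and 4 seconds before the fourth, never exceeding the 10-second cap:

# Hypothetical exponential backoff: 1s, 2s, 4s, capped at 10s
spring.cloud.stream.bindings.poppyPants.consumer.maxAttempts=4
spring.cloud.stream.bindings.poppyPants.consumer.backOffInitialInterval=1000
spring.cloud.stream.bindings.poppyPants.consumer.backOffMaxInterval=10000
spring.cloud.stream.bindings.poppyPants.consumer.backOffMultiplier=2.0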
The configuration for dead letter queue: 💀
The configuration for the DLQ (Dead Letter Queue) is pretty straightforward. We enable the DLQ by setting the Kafka binder property consumer.enableDlq to true. We also provide the dead letter queue name, which in our case is dlq-topic (if you leave dlqName out, the Kafka binder defaults to error.<destination>.<group>).
spring.cloud.stream.kafka.bindings.poppyPants.consumer.enableDlq=true
spring.cloud.stream.kafka.bindings.poppyPants.consumer.dlqName=dlq-topic
spring.cloud.stream.kafka.bindings.poppyPants.consumer.dlqPartitions=1
Almost done, people! Now, to test this out, make sure that you have a Kafka server running locally on port 9093.
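If your broker isn't configured to auto-create topics, create sample-topic and dlq-topic yourself first. A hypothetical one-liner per topic (assuming a Kafka version where kafka-topics.sh accepts --bootstrap-server):

sh bin/kafka-topics.sh --create --bootstrap-server localhost:9093 --topic sample-topic --partitions 1 --replication-factor 1
sh bin/kafka-topics.sh --create --bootstrap-server localhost:9093 --topic dlq-topic --partitions 1 --replication-factor 1

Once Kafka is up and the topics exist, run the following command to start a console consumer on the dead letter queue, so that we can see the failed message arrive after the three attempts.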
sh bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic dlq-topic
Now, let's publish an event message, "Wubba Lubba Dub Dub", to sample-topic by running the following command and typing the message into the producer prompt.
sh bin/kafka-console-producer.sh --broker-list localhost:9093 --topic sample-topic
We can check the logs to see that the event was consumed three times, with the listener throwing an exception on every attempt. After that, the message is sent to the dead letter queue.
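Bonus: when the Kafka binder dead-letters a record, it also attaches diagnostic headers such as x-original-topic, x-exception-message and x-exception-stacktrace. If your Kafka CLI is recent enough to support printing headers, you can inspect them straight from the console consumer:

sh bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic dlq-topic --property print.headers=true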
Conclusion
I’d love to hear your thoughts or if you have a different approach than the ones outlined above!