A common use case for Kafka and Pulsar is creating work queues. The two technologies offer different implementations for accomplishing this. I'll discuss how to implement work queues in Kafka and Pulsar, as well as the relative strengths of each approach.
A work queue uses a messaging technology to add a unit of work by publishing a message. Another process – ideally one of a cluster of processes – consumes that message and then does some kind of work on it.
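As a rough illustration, here is a minimal sketch of publishing a unit of work with Kafka's Java producer. The topic name, key, and payload are placeholders I've chosen for the example, not anything prescribed by Kafka or by this article:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class WorkPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Each message is one unit of work; the value carries whatever
                // the workers need to do the job (here, just an identifier).
                producer.send(new ProducerRecord<>("work-queue", "job-42", "unit of work payload"));
            }
        }
    }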
A work queue differs from other processing in the amount of time a unit of work takes. Most regular data processing – like ETL or simple transformations – is measured in milliseconds, or a second at the high end. Work in a work queue runs much longer: seconds, minutes, even hours.
This is also called a distributed work queue, because a single machine or process isn't enough to keep up with the demand. We have to distribute the processing across many different processes and computers. With this need for distributed technologies, the complexity of the task increases 10-15x.
To help you understand a work queue, let me give you a few simple examples I've seen in the real world. The common thread in all of them is the need to process for a longer period of time and to get the result as soon as possible.
Some use cases require that a video be uploaded by a user. The video is saved to bucket storage. Once the upload finishes, the web service publishes a message that is consumed by a cluster of consumers. The message contains the bucket URL of the video to be transcoded. The consumers transcode or otherwise prepare the video in a web-friendly format. The transcoding takes minutes to hours to finish. Once the video is done, the transcoding process should publish a message saying the video is ready.
Some use cases process phone call data for call centers. Once a call is completed, several processing steps need to happen. First, speech recognition converts the call's audio to text. Next, various NLP or sentiment analyses run on that text. The entire processing takes 1-60 minutes. Once the processing is done, a message needs to be published to flag or score the conversation.
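Both examples follow the same shape: consume a work message, run the long job, then publish a result message. Here is a hedged sketch of that flow with Kafka's Java clients. The topic names, consumer group, and the transcode() stub are hypothetical and only illustrate the pattern:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TranscodeWorker {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "transcoders");
            consumerProps.put("enable.auto.commit", "false");
            consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("videos-to-transcode"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        String bucketUrl = record.value();
                        transcode(bucketUrl); // the long-running work, minutes to hours
                        // Announce that this video is ready.
                        producer.send(new ProducerRecord<>("videos-ready", record.key(), bucketUrl));
                    }
                    // Only mark the work as consumed after it has finished.
                    consumer.commitSync();
                    // Caveat: work that runs longer than max.poll.interval.ms will make
                    // Kafka consider this consumer dead and rebalance its partitions.
                }
            }
        }

        private static void transcode(String bucketUrl) {
            // Placeholder for the real transcoding job.
        }
    }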
The initial difficulty is balancing the work. You need to make sure that one long-running unit of work doesn't back up the rest of the queue; the rest of the work needs to continue unabated. You also need to be able to scale the consumer cluster up or down as more or less work flows through.
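One way to get that kind of balancing is Pulsar's shared subscription, which dispatches messages round-robin across however many consumers are attached. The sketch below uses assumed topic and subscription names and a process() stub purely for illustration:

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionType;

    public class Worker {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

            // A Shared subscription hands messages out across every connected
            // consumer, so adding more workers spreads the load and a slow unit
            // of work only ties up the worker that holds it.
            Consumer<byte[]> consumer = client.newConsumer()
                .topic("work-queue")
                .subscriptionName("workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

            while (true) {
                Message<byte[]> msg = consumer.receive();
                try {
                    process(msg.getData());       // the long-running work
                    consumer.acknowledge(msg);    // done; don't redeliver
                } catch (Exception e) {
                    consumer.negativeAcknowledge(msg); // redeliver to another worker
                }
            }
        }

        private static void process(byte[] payload) {
            // Placeholder for the real work.
        }
    }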
The more difficult issue with work queues is dealing with errors: