Reducing Operational Overhead with Pulsar Functions | Jesse Anderson

https://www.jesse-anderson.com/2019/05/reducing-operational-overhead-with-pulsar-functions/

It’s been fascinating watching the operational world change over the years. We started out by racking and stacking anything that needed to run. We wisened up a bit and started using virtual machines. More recently we’ve moved into containerization. Surely, that’s the pinnacle of operational excellence.

As of now, that’s a yes, with a but. That “but” really comes into play when you’re dealing with messaging systems. This is when when you’re trying to operationalize producers, consumers, and consumer/producers.

Operationalizing Consumers and Producers

If you’ve ever operationalized a consumer, certain producers, or consumer/producers you know how tedious it is. If you haven’t let me briefly describe it.

Every single one of your consumers, producers, and consumer/producers needs to be a separate process. This just gets unwieldy on several levels. How do you deploy these processes? How do you update them with newer versions? How do you know each required process is running?

Maybe you’re lucky and you can use containers. That makes things a little easier. You have a better mechanism to spin up and monitor each process.

Even with containers, you still have a ton of duplicated code. Containers only solved part of the problem. Now you have to copy-and-paste Docker files and spin up YAMLs with minimal changes. Â You still have the copious boilerplate code that configures and starts the processing.

We’ve solved a few issues, but didn’t really solve the whole issue. This is because these consumers, producers, and consumer/producers all present a slightly different problem.

Boiling Down the Problem

Let’s boil our problem down the absolutely crucial components:

We need to receive data from one or more topics
We need to run our code
We need to output data to one or more topics

All of the other code that doesn’t support those 3 objectives is boilerplate code. This includes things like defining our main method or our Dockerfile. In an ideal world, we’d just focus on those 3 things.

You might have seen the same or similar concept called serverless computing. This is an approach that’s popular with the cloud providers. There are several parallels with Pulsar Functions. In both systems, you use a specific API and all underlying operations are taken care of for you.

Focusing with Pulsar Functions

Pulsar Functions is a way of running a custom function on data in Apache Pulsar. The functions run in a highly-available cluster that is separate from Pulsar, but is connected to it.