Message Queues: Deep Dive
In a previous article, Introduction to Message Queues, we looked at the fundamentals of message queues. The article introduced message queues, message producers and consumers, and the advantages of using message queues for asynchronous processing in our system architecture.
In this post, we will go further into the internals of message queues and look at the different strategies used to handle interaction between producers and consumers. We will also expand on producers and consumers, and on what application developers need to keep in mind when designing them to work with message queues.
The Operational Structure of Message Queues
A message queue does not work in isolation; it is mainly a broker of messages between message producers and message consumers. Each entity in the setup has its own responsibility, and the entities should be decoupled from each other as much as possible.
The only contract between the entities should be valid messages, with the message queue facilitating the movement of messages from producers to consumers.
In the following sections, we will take a deep dive into the responsibilities of each component and look at the different methods with which the message queue delivers a message to consumers.
We will use a hypothetical example of an invoicing service in an e-commerce application that will help highlight how these concepts work. Whenever a purchase is made, a request is sent to the invoicing service to prepare and send an invoice to the customer's email address.
Message producers initiate the asynchronous processing request, i.e. they are the source of the messages transferred in the message queue setup. In the invoicing service example described above, the message producer is the e-commerce store where the purchase is made.
Producers have a very simple responsibility in the message queue setup: generate a valid message and publish it to the message queue. Message producers are also referred to as message publishers, since they submit the message to the queue after producing it.
Application developers determine where a message is produced in an application, for example when a purchase is made in the e-commerce store. Messages can be produced from different parts of the application and sent to the queue. These messages can be meant for the same consumer or different types of consumers (for example, some messages can be produced to target the invoicing service while another message targets a payment service).
Messages submitted to the queue are then queued up and delivered to consumers to be processed asynchronously.
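To make the producer's responsibility concrete, here is a minimal sketch in Python. It assumes an in-process `queue.Queue` as a stand-in for a real broker connection, and the message fields (`message_id`, `type`, `payload`) and the `invoice.requested` type name are hypothetical choices for the invoicing example, not a standard format.

```python
import json
import queue
import uuid
from datetime import datetime, timezone

# Assumption: an in-process FIFO queue stands in for a real broker connection.
invoice_queue = queue.Queue()


def build_invoice_message(order_id, customer_email, amount_cents):
    """Build a message the (hypothetical) invoicing consumer would accept."""
    if "@" not in customer_email:
        raise ValueError("customer_email does not look like an email address")
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    return {
        "message_id": str(uuid.uuid4()),
        "type": "invoice.requested",  # hypothetical message type name
        "created_at": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "order_id": order_id,
            "customer_email": customer_email,
            "amount_cents": amount_cents,
        },
    }


def publish(target_queue, message):
    """Serialize the message and hand it to the queue; the producer's job ends here."""
    target_queue.put(json.dumps(message))


# Called from the part of the store that completes a purchase.
publish(invoice_queue, build_invoice_message("order-1001", "jane@example.com", 4999))
```

The producer validates what it can before publishing and then returns immediately; it never waits for the invoice to be generated.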
Producers typically communicate with the message queue over a messaging protocol. The Advanced Message Queuing Protocol (AMQP) is a common choice, although the protocol available to you depends on the broker. AMQP allows interaction in every direction between the three distinct entities in a message queue setup.
A message queue in its most basic form is a queue. If you implement a simple queue inside your application that buffers messages and lets consumers pick them up, you already have a message queue.
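As a rough illustration of that most basic form, the sketch below wraps a `collections.deque` in a lock so that several threads can publish and consume safely. The class name `SimpleMessageQueue` and its method names are made up for this example.

```python
import threading
from collections import deque


class SimpleMessageQueue:
    """A minimal in-process FIFO buffer shared by producers and consumers."""

    def __init__(self):
        self._messages = deque()
        self._lock = threading.Lock()

    def publish(self, message):
        with self._lock:
            self._messages.append(message)

    def consume(self):
        """Return the oldest message, or None if the queue is empty."""
        with self._lock:
            if self._messages:
                return self._messages.popleft()
            return None

    def __len__(self):
        with self._lock:
            return len(self._messages)


mq = SimpleMessageQueue()
mq.publish({"order_id": "order-1001"})
mq.publish({"order_id": "order-1002"})
first = mq.consume()  # the oldest message comes out first
```

Everything here lives in memory, so messages are lost if the process stops; that gap is what the more complex options add persistence for.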
A message queuing system can be as simple as the one described above or as complex as having additional functionality for routing rules, persistence, security, etc.
Below is a list, ordered from simple to complex, of different ways of implementing a message queue:
- A shared folder allowing you to read and write files from it
- A component backed by an SQL database for persistence (this is the strategy for most homegrown message queues)
- A dedicated broker with functionalities for handling ingestion, persistence, and delivery of messages
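As an illustration of the second option in the list, here is a small sketch of a queue backed by an SQL table, using Python's built-in `sqlite3` module. The table layout, the status values, and the `SqlMessageQueue` class are assumptions made for the example; a homegrown queue serving several consumer processes would also need row locking or an equivalent claim mechanism, which this single-connection sketch leaves out.

```python
import sqlite3


class SqlMessageQueue:
    """A tiny persistent queue stored in one SQL table (single-connection sketch)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " body TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )
        self._db.commit()

    def publish(self, body):
        with self._db:  # commits on success, rolls back on error
            self._db.execute("INSERT INTO messages (body) VALUES (?)", (body,))

    def claim_next(self):
        """Mark the oldest pending message as processing; return (id, body) or None."""
        with self._db:
            row = self._db.execute(
                "SELECT id, body FROM messages"
                " WHERE status = 'pending' ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._db.execute(
                "UPDATE messages SET status = 'processing' WHERE id = ?", (row[0],)
            )
            return row

    def acknowledge(self, message_id):
        """Remove a message once the consumer has finished processing it."""
        with self._db:
            self._db.execute("DELETE FROM messages WHERE id = ?", (message_id,))
```

Passing a file path instead of `":memory:"` keeps the messages across restarts, which is the main reason to choose this option over an in-memory queue.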
When message queues are implemented using the last strategy in the list (with advanced functionalities and as a stand-alone application), they are referred to as message brokers.
Message brokers are the actual decoupling elements in the setup, sitting between and managing the process of communication between producers and consumers.
Being an independent application, brokers can also contain features/responsibilities such as:
- Permissions control
- Message validation
- Failure recovery
- Custom message routing rules
- Protocol translation to support different types of producers and consumers
Brokers are also optimized for high concurrency and throughput. Creating queues and enqueuing messages quickly is one of their key responsibilities. They are also expected to be highly available, because communication between producers and consumers breaks down whenever the broker is down.
Because their data model is much simpler than that of a relational database, brokers can typically achieve higher throughput for this kind of workload.
Message brokers favor configuration over customization. Extending a broker's functionality typically requires no custom code; instead, its behavior is configured to suit the requirements of the system.
It is important to note that adding message brokers introduces an extra layer of complexity into your infrastructure and requires you to scale them as well. Brokers also have their specific requirements and limitations when it comes to scalability.
Message brokers also allow consumers to specify the type of messages they are interested in. This can be achieved by using a named queue or more advanced routing methods. The routing methods available to you depend on the type of message broker you decide to use in your infrastructure.
We will discuss routing methods later on in this post.
The main responsibility of consumers is to receive and process messages from the queue. Using our invoicing example above, the consumer is the invoicing service that will receive request messages from the queue, generate the invoice and send it to the customer's email address.
Most consumers are API services implemented by application developers, and they are the components that perform the actual asynchronous processing.
Consumers can be implemented in different programming languages and technologies, and maintained independently of the other components.
To achieve optimal decoupling, consumers should know nothing about the producers. The only contract that should exist between the two is valid messages from the queue. Messages can also be validated before being processed.
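Validation on the consumer side could look like the sketch below. It assumes messages arrive as JSON text with a hypothetical `type` field and a `payload` object; the field names are illustrative for the invoicing example and are not prescribed by any broker.

```python
import json

# Hypothetical contract for an invoice request: field name -> expected type.
REQUIRED_PAYLOAD_FIELDS = {"order_id": str, "customer_email": str, "amount_cents": int}


def parse_invoice_request(raw_message):
    """Return the validated payload, or raise ValueError for an invalid message."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError as exc:
        raise ValueError(f"message is not valid JSON: {exc}") from exc
    if not isinstance(message, dict) or message.get("type") != "invoice.requested":
        raise ValueError("unexpected message type")
    payload = message.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    for field, expected_type in REQUIRED_PAYLOAD_FIELDS.items():
        if not isinstance(payload.get(field), expected_type):
            raise ValueError(f"payload field {field!r} is missing or has the wrong type")
    return payload
```

The consumer checks only the shape of the message. It never needs to know which producer sent it, which is what keeps the two sides decoupled.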
When properly decoupled, consumers can serve as independent service layers that can be used by both the message queue setup and other components in your infrastructure.
Consumer Communication Strategies
Message queues need to communicate messages down to consumers, so how do consumers become aware that there are messages to consume?
Depending on how application developers implement consumers, message queues have two distinct ways of making consumers aware of messages and delivering them.
In the first model, the consumer periodically connects to the queue and checks its status. This polling is done at scheduled intervals configured on the consumer side.
If there are messages in the queue, the consumer consumes them until the queue is empty, or stops once a certain number of messages has been consumed. This limit can be configured on the message broker.
This method is often referred to as the pull method as the consumer is the one periodically checking to see if there are available messages to consume.
The following factors are common reasons for adopting this strategy:
- The consumer application is implemented in a scripting language that typically runs without a persistently running application container, e.g. PHP, Perl, or Ruby.
- Messages are rarely added to the queue.
- Network connectivity is unreliable.
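A pull-style consumer can be sketched as a loop that wakes up at a fixed interval and drains a bounded batch. This is only an illustration: it uses an in-process `queue.Queue` instead of a real broker, the function names are invented, and the batch limit is passed as a parameter (`max_messages`) rather than being configured on a broker.

```python
import queue
import time


def poll_once(source_queue, handler, max_messages=10):
    """Consume until the queue is empty or max_messages were handled; return the count."""
    handled = 0
    while handled < max_messages:
        try:
            message = source_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to consume in this poll
        handler(message)
        handled += 1
    return handled


def run_polling_consumer(source_queue, handler, interval_seconds=5.0, max_polls=None):
    """Poll at a fixed interval; max_polls=None keeps polling forever."""
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(source_queue, handler)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)  # wait until the next scheduled poll
```

In a scripting environment without a long-running process, `poll_once` is the part a scheduler such as cron would trigger on each run.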
In the second model, the push model, the queue does not wait for the consumer to actively read messages from it, unlike in the pull model.
Once a message is added, the idly waiting consumer is notified and the message is pushed down to it. Messages are pushed to consumers at a rate the consumer can handle, or at a rate configured on the broker.
This approach is mostly used when the consumer application runs in a persistent application container (it is always running). Examples include applications built with Java, C#, and Node.js.
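The push model can be sketched with a dispatcher thread that blocks until a message arrives and then hands it to a callback registered by the consumer. `PushQueue` and its methods are hypothetical names for this example, and the in-process thread stands in for a broker delivering messages over a persistent connection.

```python
import queue
import threading


class PushQueue:
    """Delivers each published message to a registered callback as soon as it arrives."""

    _STOP = object()  # sentinel used to shut the dispatcher down

    def __init__(self, on_message):
        self._messages = queue.Queue()
        self._on_message = on_message
        self._dispatcher = threading.Thread(target=self._dispatch, daemon=True)
        self._dispatcher.start()

    def publish(self, message):
        self._messages.put(message)

    def _dispatch(self):
        while True:
            message = self._messages.get()  # blocks idly until a message is available
            if message is self._STOP:
                break
            # The next message is pushed only after the callback returns,
            # so the consumer is never handed more than it can handle.
            self._on_message(message)

    def close(self):
        """Deliver everything already queued, then stop the dispatcher."""
        self._messages.put(self._STOP)
        self._dispatcher.join()
```

The consumer never asks whether messages exist. It only supplies `on_message` and stays running, which is why this model suits always-on processes.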
Consumer Subscription Methods
Aside from the different message transfer models, there are also ways in which consumers can subscribe to message queues to ensure that they are receiving the right messages. Messages need to be routed to the appropriate consumers to avoid confusion.
Let's take a look at the most used subscription strategies.
Direct worker queue method
This method helps to achieve load distribution amongst multiple instances of the same consumer.
In this delivery method, producers and consumers only need to know the name of the queue. This way, producers know where to publish to and consumers know where to consume from.
On one side of the named queue, you have multiple producers publishing messages to the queue. On the other side, you have consumers competing for messages. Each message arriving at the queue is routed to only one consumer; this way, one consumer sees only a subset of the total messages arriving at the queue.
Consumers in this setup are required to be uniform and stateless. One of the very common applications of this strategy is in distributing time-consuming tasks across multiple worker machines. Consumers can be easily scaled up by adding more instances.
This method can also be used when you need to send out multiple emails, process a lot of videos, or resize a lot of images.
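The competing-consumers behavior can be sketched with a few identical worker threads reading from one shared queue. The sketch assumes an in-process `queue.Queue` in place of a named broker queue, and `run_workers` is an invented helper; for simplicity the workers exit once the queue is drained instead of waiting for more work.

```python
import queue
import threading


def run_workers(task_queue, worker_count, process):
    """Start identical workers that compete for tasks; return results grouped by worker."""
    handled_by = {n: [] for n in range(worker_count)}

    def worker(worker_id):
        while True:
            try:
                task = task_queue.get_nowait()  # each task goes to exactly one worker
            except queue.Empty:
                return  # queue drained, this worker exits
            handled_by[worker_id].append(process(task))

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled_by


tasks = queue.Queue()
for image_id in range(20):
    tasks.put(image_id)

# Stand-in for a time-consuming job such as resizing an image.
results = run_workers(tasks, worker_count=3, process=lambda image_id: image_id * image_id)
```

Which worker handles which task varies from run to run, but every task is processed exactly once. Because the workers are stateless and interchangeable, scaling up is a matter of raising `worker_count`.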
The publish/subscribe method involves publishing a message to a topic rather than to a queue. Each consumer connected to the broker maintains its own private queue to receive messages from topics.
Unlike the direct worker queue method, consumers connected to the broker can perform different functionalities. For example, a consumer can be responsible for generating the PDF invoice while another consumer is responsible for sending a push notification of a completed purchase to the customer's mobile application.
Consumers subscribe to topics, and when a message is published to a topic, the message is cloned for each subscriber and added to that subscriber's private queue. This method follows the observer pattern.
If there are no consumers at the time of publishing, the messages are discarded. This behavior can, however, be configured.
One major benefit of this method is that new functionality that depends on a topic can easily be added without having to adjust the existing ones. For example, we can easily add additional functionality for a delivery notice to be sent to the delivery department in the e-commerce example above. All we need to do is subscribe the delivery notification service consumer to the same topic as the PDF and push notification consumers.
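The topic-based method described above can be sketched as a broker that keeps one private queue per subscriber and copies each published message into all of them. `TopicBroker`, the topic name `purchase.completed`, and the use of `deque` objects as private queues are assumptions made for illustration.

```python
import copy
from collections import defaultdict, deque


class TopicBroker:
    """Copies every message published to a topic into each subscriber's private queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of private queues

    def subscribe(self, topic):
        """Create and return a private queue that receives copies of the topic's messages."""
        private_queue = deque()
        self._subscribers[topic].append(private_queue)
        return private_queue

    def publish(self, topic, message):
        """Clone the message for every subscriber; return the number of deliveries."""
        queues = self._subscribers.get(topic, [])
        for private_queue in queues:
            private_queue.append(copy.deepcopy(message))
        return len(queues)  # 0 means nobody was subscribed and the message was discarded


broker = TopicBroker()
pdf_invoices = broker.subscribe("purchase.completed")
push_notifications = broker.subscribe("purchase.completed")
broker.publish("purchase.completed", {"order_id": "order-1001"})

# Adding the delivery notice later needs no change to the existing consumers.
delivery_notices = broker.subscribe("purchase.completed")
broker.publish("purchase.completed", {"order_id": "order-1002"})
```

Each subscriber works on its own copy, so one consumer modifying or failing on a message does not affect the others.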
Custom routing method
The methods above already cover a lot of the common use cases for routing messages in message queues. However, there are numerous peculiar scenarios encountered when building applications that may require a custom routing setup in brokers, rather than the ones you get out of the box.
Most enterprise brokers support different formats for the custom routing of messages to consumers. Consumers can decide in a flexible and configurable way the type of messages they want to be routed to their queues. For example, RabbitMQ allows the use of bindings to create customized routing rules based on text pattern matching.
This ability allows your system to adapt to new requirements using configuration, rather than adjusting the code base of producers and consumers.
Creating custom routing rules is subject to the type of message broker being used, and further information and implementation details can be found in the broker's documentation.
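To give a feel for pattern-based routing, the sketch below matches dot-separated routing keys against binding patterns, where `*` stands for exactly one word and `#` for zero or more words. These wildcard semantics are loosely modeled on AMQP topic exchanges; the `Router` class, the queue names, and the routing keys are invented for the example and are not RabbitMQ's API.

```python
def key_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a binding pattern."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words  # both must be exhausted at the same time
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # "#" may absorb zero or more words: try every possible split.
        return any(_match(rest, key_words[i:]) for i in range(len(key_words) + 1))
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        return _match(rest, key_words[1:])
    return False


class Router:
    """Holds (pattern, queue name) bindings and resolves a routing key to queues."""

    def __init__(self):
        self._bindings = []

    def bind(self, pattern, queue_name):
        self._bindings.append((pattern, queue_name))

    def route(self, routing_key):
        """Return the names of all queues whose binding pattern matches the key."""
        matched = []
        for pattern, queue_name in self._bindings:
            if key_matches(pattern, routing_key) and queue_name not in matched:
                matched.append(queue_name)
        return matched


router = Router()
router.bind("order.*.paid", "invoicing")
router.bind("order.#", "audit")
router.bind("order.eu.*", "eu-reporting")
```

With these bindings, a new consumer such as the audit queue starts receiving every order message through one added rule, without touching producer or consumer code.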
A deeper understanding of message queues helps you get more out of the technology and solve more problems in your infrastructure. That is what we have tried to provide in this post: a more detailed look at how message queues operate and at the different strategies you can adopt to make them work for you. This post is not a complete or standard reference for message queues, so you are encouraged to consult other material to keep building your knowledge of them.