Replicated Queues

AMPS provides a unique approach to replicating queues. This approach is designed to offer high performance in the most common cases, while continuing to provide delivery model guarantees, resilience, and failover when one of the replicated instances goes offline.

When a queue is replicated, AMPS replicates the publish commands to the underlying topic, the sow_delete commands that acknowledge messages, and the special queue management commands that are internal to AMPS.

Queue Message Ownership

To guarantee that no message is processed more than once, AMPS tracks ownership of each message within the network of replicated instances.

For a distributed queue (that is, a queue defined with the Queue configuration element), the instance that first receives the publish command from a client owns the message. All replicated instances downstream record the publish command in their transaction logs, but an instance does not provide the message to queue subscribers unless it owns the message.

For a group local queue (that is, a queue defined with the GroupLocalQueue configuration element), the instance specified in the InitialOwner element for the queue owns a message when the message first enters the queue, regardless of where the message was originally published.
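
For example, a configuration along these lines defines one queue of each kind. The topic names, message type, lease period, and owner value here are illustrative, and this sketch assumes the standard queue definition elements (Name, MessageType, UnderlyingTopic, LeasePeriod) from the AMPS configuration reference:

    <SOW>
        <!-- Distributed queue: the instance that first receives a
             publish for the underlying topic owns the message. -->
        <Queue>
            <Name>orders-queue</Name>
            <MessageType>json</MessageType>
            <UnderlyingTopic>orders</UnderlyingTopic>
            <LeasePeriod>30s</LeasePeriod>
        </Queue>

        <!-- Group local queue: the owner named in InitialOwner owns
             each message when it first enters the queue, no matter
             where the message was originally published. -->
        <GroupLocalQueue>
            <Name>work-queue</Name>
            <MessageType>json</MessageType>
            <UnderlyingTopic>work</UnderlyingTopic>
            <InitialOwner>instance-A</InitialOwner>
        </GroupLocalQueue>
    </SOW>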

Only one instance can own a message at any given time. Other instances can request that the current owner transfer ownership of a message.

To transfer ownership, an instance that does not currently own the message makes a request to the current message owner. The owning instance makes an explicit decision to transfer ownership, and replicates the transfer notification to all instances to which the queue topic is replicated.

The instance that owns a message will always deliver the message to a local subscriber if possible. This means that performance for local subscribers is unaffected by the number of downstream instances. However, this also means that if the local subscribers are keeping up with the message volume being published to the queue, the owning instance will never need to grant a transfer of ownership.

A downstream instance will request ownership transfer if:

  1. The downstream instance has subscriptions for that topic with available backlog, and

  2. The amount of time since the message arrived at the instance is greater than the typical time between the replicated message arriving and the replicated acknowledgment arriving.

Notice that this approach is intended to minimize ungranted transfer requests. In normal circumstances, the typical processing time reflects the speed at which the local processors consume messages at a steady state. Downstream instances only request ownership of messages that have exceeded that time, which indicates that the processors are not keeping up with the incoming message rate.

The instance that owns the message will grant ownership to a requesting instance if:

  1. The request is the first request received for this message, and

  2. There are no subscribers on the owning instance that can accept the message.

When the owning instance grants the request, it logs the transfer in its transaction log and sends the transfer of ownership to all instances that are receiving replicated messages for the queue. When the owning instance does not grant the transfer of ownership, it takes no action.

Notice that your replication topology must be able to replicate acknowledgments to all instances that receive messages for the queue. Otherwise, an instance that does not receive the acknowledgments will not consider the messages to be processed. Replication validation can help to identify topologies that do not meet this requirement.

A barrier message is delivered immediately when there are no unacknowledged messages ahead of the barrier message in the queue on a given instance, regardless of which instance owns the message. This means that for a distributed queue or group local queue, every instance that contains the barrier message will deliver the barrier message when all previous messages on that instance have been acknowledged.

Failover and Queue Message Ownership

When an instance that contains a queue fails or is shut down, that instance is no longer able to grant ownership requests for the messages that it owns. By default, those messages become unavailable for delivery, since there is no longer a central coordination point to ensure that each message is delivered only once.

AMPS provides a way to make those messages available. Through the admin console, you can enable proxied transfer (enable_proxied_transfer), which allows an instance to act as an ownership proxy for an instance that has gone offline. In this mode, the local instance can assume ownership of messages that are owned by the offline instance.

Use this setting with care: when active, it is possible for messages to be delivered twice if the instance that had previously owned the message comes back online, or if multiple instances have proxied transfer enabled for the same queue.

In general, enable proxied transfer as a temporary recovery step while one of the instances is offline, and then disable it when the instance comes back online, or when all of the messages owned by that instance have been processed.

Configuration for Queue Replication

To provide replication for a distributed queue, AMPS requires that the replication configuration meet the following requirements (a configuration sketch illustrating them follows this list):

  1. Provide bidirectional replication between the instances. In other words, if instance A replicates a queue to instance B, instance B must also replicate that queue to instance A.

  2. If the topic is a queue on one instance, it must be a queue on all replicated instances.

  3. On all replicated instances, the queue must use the same underlying topic definition and filters. For queues that use a regular expression as the topic definition, this means that the regular expression must be the same. For a GroupLocalQueue, the InitialOwner must be the same on all instances that contain the queue.

  4. The underlying topics must be replicated to all replicated instances (since this is where the messages for the queue are stored).

  5. Replicated instances must provide passthrough for instances that replicate queues. For example, consider the following replication topology: Instance A in group One replicates a queue to instance B in group Two. Instance B in group Two replicates the queue to instance C in group Three.

    For this configuration, instance B must provide passthrough for group Three to instance A, and must also provide passthrough for group One to instance C. The reason for this is to ensure that ownership transfer and acknowledgment messages can reach all instances that maintain a copy of the queue.

    Likewise, consider a topology where Instance X in GroupOne replicates a queue to Instance Y, also in GroupOne. Instance X must provide passthrough for GroupOne, since any incoming replication messages for the queue (for example, from Instance Z) that arrive at Instance X must be guaranteed to reach Instance Y. Otherwise, if Instance Z does not replicate directly to Instance Y (or the network connection between Instance Z and Instance Y fails), the queue on Instance Y could contain different messages than the queues on Instance X and Instance Z.
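
To make the passthrough requirement concrete, the sketch below shows a possible Replication section for instance B in the first topology above. The instance names, host names, port, and json message type are illustrative; Destination, Transport, SyncType, Topic, and PassThrough are standard AMPS replication configuration elements. Each destination carries both the queue topic and its underlying topic, and passes through messages from the group on the other side:

    <!-- Configuration sketch for instance B (group Two). -->
    <Name>instance-B</Name>
    <Group>Two</Group>

    <Replication>
        <!-- Destination: instance A in group One. -->
        <Destination>
            <Transport>
                <Type>amps-replication</Type>
                <InetAddr>instance-a.example.com:9006</InetAddr>
            </Transport>
            <SyncType>sync</SyncType>
            <Topic>
                <Name>orders-queue</Name>       <!-- queue topic -->
                <MessageType>json</MessageType>
            </Topic>
            <Topic>
                <Name>orders</Name>             <!-- underlying topic -->
                <MessageType>json</MessageType>
            </Topic>
            <!-- Pass through messages that arrive from group Three,
                 so that they reach instance A. -->
            <PassThrough>Three</PassThrough>
        </Destination>

        <!-- Destination: instance C in group Three. -->
        <Destination>
            <Transport>
                <Type>amps-replication</Type>
                <InetAddr>instance-c.example.com:9006</InetAddr>
            </Transport>
            <SyncType>sync</SyncType>
            <Topic>
                <Name>orders-queue</Name>
                <MessageType>json</MessageType>
            </Topic>
            <Topic>
                <Name>orders</Name>
                <MessageType>json</MessageType>
            </Topic>
            <!-- Pass through messages that arrive from group One,
                 so that they reach instance C. -->
            <PassThrough>One</PassThrough>
        </Destination>
    </Replication>

Instance A and instance C would each define a corresponding destination back to instance B, since queue replication must be bidirectional (requirement 1 above).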

Notice that the requirements above apply only to queue topics. If the underlying topic uses a different name than the queue topic, it is possible to replicate the messages from the underlying topic without replicating the queue itself. This approach can be convenient for simply recording and storing the messages provided to the queue on an archival or auditing instance. When only the underlying topic (or topics) are replicated, the requirements above do not apply, since AMPS does not provide queuing behavior for the underlying topics.
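
For contrast, a destination such as the following hypothetical sketch replicates only the underlying orders topic to an archival instance. Because the queue topic itself is not listed, no queuing behavior is replicated and the requirements above do not apply (the host name and async acknowledgment are illustrative):

    <Replication>
        <!-- Archive-only destination: replicates the underlying
             topic, not the queue topic. -->
        <Destination>
            <Transport>
                <Type>amps-replication</Type>
                <InetAddr>archive.example.com:9006</InetAddr>
            </Transport>
            <SyncType>async</SyncType>
            <Topic>
                <Name>orders</Name>
                <MessageType>json</MessageType>
            </Topic>
        </Destination>
    </Replication>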

A queue defined with LocalQueue cannot be replicated. The data from the underlying topics for the queue can be replicated without special restrictions. The queue topic itself, however, cannot be replicated. AMPS reports an error if any LocalQueue topic is replicated.
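
A minimal sketch of this pattern (names illustrative): the LocalQueue is defined over an underlying topic that may safely appear in a replication destination, while the queue topic itself never does.

    <SOW>
        <!-- The local-work-queue topic must not appear in any
             Replication destination; the work-items underlying
             topic may be replicated without restriction. -->
        <LocalQueue>
            <Name>local-work-queue</Name>
            <MessageType>json</MessageType>
            <UnderlyingTopic>work-items</UnderlyingTopic>
        </LocalQueue>
    </SOW>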
