Replicated Queues
AMPS provides a unique approach to replicating queues. This approach is designed to offer high performance in the most common cases, while continuing to provide delivery model guarantees, resilience and failover in the event that one of the replicated instances goes offline.
When a queue is replicated, AMPS replicates the publish
commands to the underlying topic, the sow_delete
commands that contain the acknowledgment messages, and special queue management commands that are internal to AMPS.
Queue Message Ownership
To guarantee that no message is processed more than once, AMPS tracks ownership of the message within the network of replicated instances.
For a distributed queue (that is, a queue defined with the Queue
configuration element), the instance that first receives the publish command from a client owns the message. Although all replicated instances downstream record the publish command in their transaction logs, they do not provide the message to queue subscribers unless that instance owns the message.
For a group local queue (that is, a queue defined with the GroupLocalQueue
tag), the instance specified in the InitialOwner
element for the queue owns a message when the message first enters the queue, regardless of where the message was originally published.
Only one instance can own a message at any given time. Other instances can request that the current owner transfer ownership of a message.
When a message is published, the instance that owns the message depends on the type of queue:
Queue
Instance where the message was published.
GroupLocalQueue
Instance specified in the InitialOwner
tag.
LocalQueue
cannot be replicated
N/A
Each instance owns its copy of the message. The queue is not replicated. Each instance will independently deliver its copy of the message.
To transfer ownership, an instance that does not currently own the message makes a request to the current message owner. The owning instance makes an explicit decision to transfer ownership, and replicates the transfer notification to all instances to which the queue topic is replicated.
The instance that owns a message will always deliver the message to a local subscriber if possible. This means that performance for local subscribers is unaffected by the number of downstream instances. However, this also means that if the local subscribers are keeping up with the message volume being published to the queue, the owning instance will never need to grant a transfer of ownership.
A downstream instance will request ownership transfer if:
The downstream instance has subscriptions for that topic with available backlog, and
The amount of time since the message arrived at the instance is greater than the typical time between the replicated message arriving and the replicated acknowledgment arriving.
Notice that this approach is intended to minimize ungranted transfer requests. In normal circumstances, the typical processing time reflects the speed at which the local processors are consuming messages at a steady state. Downstream instances will only request messages that have been seen to exceed that time, indicating that the processors are not keeping up with the incoming message rate.
The instance that owns the message will grant ownership to a requesting instance if:
The request is the first request received for this message, and
There are no subscribers on the owning instance that can accept the message
When the owning instance grants the request, it logs the transfer in its transaction log and sends the transfer of ownership to all instances that are receiving replicated messages for the queue. When the owning instance does not grant the transfer of ownership, it takes no action.
Notice that your replication topology must be able to replicate acknowledgments to all instances that receive messages for the queue. Otherwise, an instance that does not receive the acknowledgments will not consider the messages to be processed. Replication validation can help to identify topologies that do not meet this requirement.
A barrier message is delivered immediately when there are no unacknowledged messages ahead of the barrier message in the queue on this instance, regardless of which instance owns the message. This means that for a distributed queue or group local queue, every queue that contains the barrier message will deliver the barrier message when all previous messages on that instance have been acknowledged.
Disaster Recovery and Queue Message Ownership
When an instance that contains a queue fails or is shut down, that instance is no longer able to grant ownership requests for the messages that it owns. This means that those messages cannot be delivered to subscribers since the owner cannot transfer ownership.
AMPS provides a way to change the handling of transfer requests so those messages can be delivered. Through the admin console, you can choose to enable_proxied_transfer
, which allows an instance to act as an ownership proxy for an instance that has gone offline. That is, an instance with proxied transfer enabled can choose to process transfer requests itself rather than sending the requests to an instance that is known to be offline.
An instance may claim ownership of a message directly rather than sending a transfer request when:
Proxied transfer is enabled, and
An active subscription could receive the message, and
The current owner is not known to be reachable through replication
An instance may grant a transfer request for a message it does not own when:
Proxied transfer is enabled, and
A transfer request is received for a message currently in the queue, and
The current owner is not known to be reachable through replication
When an instance assumes ownership of a message that instance will write an ownership transfer to the transaction log (which will then be replicated). This means that other instances that are still available will know that this instance has taken ownership.
Use this setting with care
When proxied transfer is enabled it is possible for messages to be delivered twice or for queues to reach an inconsistent state if an instance that currently owns messages in the queue is online and has subscribers to the queue, or if multiple instances enable proxy transfer for the same queue.
Using Proxied Transfer for Disaster Recovery
Proxied transfer is used as a temporary recovery step while an instance that owns messages for a queue is offline and the level of service for the queue requires those messages to be delivered before that instance is recovered.
In this case, the recommended procedure is to:
Choose one of the remaining instances that hosts the queue
Fail over clients with subscriptions to that queue to that instance
Enable proxied transfer for that queue on that instance
Recover the offline instance without allowing client connections to that instance. Notice that, once replication reconnects, that instance will again be available to process transfer requests for messages that it owns.
Once the offline instance is recovered, disable proxied transfer on the queue on the instance where it was enabled.
Enable client connections to the recovered instance.
Notice that, as described earlier, enabling proxied transfer only affects how an instance will handle a transfer request for an offline instance. When this is enabled on a queue, it does not cause the queue to immediately take ownership of messages, nor does it immediately send transfer requests. The setting only affects messages that AMPS is trying to deliver to a subscription, which limits the risk of duplicate delivery to only those messages that will be consumed immediately.
Configuration for Queue Replication
To provide replication for a distributed queue, AMPS requires that the replication configuration meet the following requirements:
Provide bidirectional replication between the instances. In other words, if instance A replicates a queue to instance B, instance B must also replicate that queue to instance A.
If the topic is a queue on one instance, it must be a queue on all replicated instances.
On all replicated instances, the queue must use the same underlying topic definition and filters. For queues that use a regular expression as the topic definition, this means that the regular expression must be the same. For a
GroupLocalQueue
, theInitialOwner
must be the same on all instances that contain the queue.The underlying topics must be replicated to all replicated instances (since this is where the messages for the queue are stored).
Replicated instances must provide passthrough for instances that replicate queues. For example, consider the following replication topology: Instance A in group One replicates a queue to instance B in group Two. Instance B in group Two replicates the queue to instance C in group Three.
For this configuration, instance B must provide passthrough for group Three to instance A, and must also provide passthrough for group One to instance C. The reason for this is to ensure that ownership transfer and acknowledgment messages can reach all instances that maintain a copy of the queue.
Likewise, consider a topology where Instance X in GroupOne replicates a queue to Instance Y in GroupOne. Instance X must provide passthrough for GroupOne, since any incoming replication messages for the queue (for example, from Instance Z) that arrive at Instance X must be guaranteed to reach Instance Y. Otherwise, it would be possible for the queue on Instance Y to have different messages than Instance X and Instance Z if Instance Z does not replicate to Instance Y (or if the network connection between Instance Z and Instance Y fails).
Notice that the requirements above apply only to queue topics. If the underlying topic uses a different name than the queue topic, it is possible to replicate the messages from the underlying topic without replicating the queue itself. This approach can be convenient for simply recording and storing the messages provided to the queue on an archival or auditing instance. When only the underlying topic (or topics) are replicated, the requirements above do not apply, since AMPS does not provide queuing behavior for the underlying topics.
A queue defined with LocalQueue
cannot be replicated. The data from the underlying topics for the queue can be replicated without special restrictions. The queue topic itself, however, cannot be replicated. AMPS reports an error if any LocalQueue
topic is replicated.
Last updated