Replication Configuration Validation

Replication configuration validation helps to ensure that any configuration that could result in message loss or inconsistent message contents between two instances of AMPS is explicitly designed into the replication topology and not the result of accidental misconfiguration.

Replication can involve coordinating configuration among a large number of AMPS instances. It can be difficult to ensure that all of the instances are configured correctly, and that a configuration change for one instance is also made at the replication destinations. For example, if a high-availability pair replicates the topics ORDERS, INVENTORY, and CUSTOMERS to a remote disaster recovery site, but the disaster recovery site only replicates ORDERS and INVENTORY back to the high-availability pair, disaster recovery may not occur as planned. Likewise, if only one member of the HA pair replicates ORDERS to the other member of the pair, the two instances will contain different messages, so clients can see different results depending on which member they connect to.
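As a rough illustration of the first scenario, the fragments below sketch what a symmetric configuration could look like, using only the Destination and Topic elements that appear in the sample at the end of this section. The json message type is an assumption, and each ... stands for the rest of the Destination definition, which is not shown here.

```xml
<!-- Sketch only: on each member of the high-availability pair,
     the Destination that points at the disaster recovery site. -->
<Destination>
    ...
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
    </Topic>
    <Topic>
        <MessageType>json</MessageType>
        <Name>INVENTORY</Name>
    </Topic>
    <Topic>
        <MessageType>json</MessageType>
        <Name>CUSTOMERS</Name>
    </Topic>
    ...
</Destination>

<!-- Sketch only: on the disaster recovery instance, the Destination
     that points back at the pair. Omitting the CUSTOMERS topic here
     is the mismatch described in the paragraph above. -->
<Destination>
    ...
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
    </Topic>
    <Topic>
        <MessageType>json</MessageType>
        <Name>INVENTORY</Name>
    </Topic>
    <Topic>
        <MessageType>json</MessageType>
        <Name>CUSTOMERS</Name>
    </Topic>
    ...
</Destination>
```

A topic that is replicated out but not replicated back is the kind of difference that the replicate check, described later in this section, is intended to report.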

Starting in the 5.0 release, AMPS automatic replication configuration validation makes it easier to keep configuration items consistent across a replication fabric.

Replication configuration validation happens when a replication connection is made between two instances. The validation compares the configuration of those two instances. By default, any difference in configuration that could result in message loss, different behavior between the source instance and the destination instance, or different replication guarantees between the source instance and the destination instance is reported as an error.

When replication validation reports an error, the reason for the error is logged to the event log on the instance that detects the problem, and the connection is closed. To troubleshoot the issue, it is typically necessary to check the logs and configuration on both instances.

Automatic configuration validation is enabled for all elements of the replication configuration by default. You can turn off validation for specific elements of the configuration, as described below.

AMPS replication uses a leaderless, "all nodes hot" model. This means that no single AMPS instance has a view of the entire replication fabric, and a single AMPS instance will always assume that there are instances in the replication fabric that it is not aware of. The replication validation rules are designed with this assumption. The advantage of this assumption is that if instances are added to the replication fabric, it is typically only necessary to change configuration on the instances that they directly communicate with for replication to function as expected. The tradeoff, however, is that it is sometimes necessary to configure an instance as though it were part of a larger fabric (or exclude a validation rule) even in a case where the instance is part of a much simpler replication design.

Each Topic in a replication Destination can configure a unique set of validation checks. By default, all of the checks apply to all topics in the Destination.

When troubleshooting a configuration validation error, it is important to look at the AMPS logs on both sides of the connection. Typically, the AMPS instance that detects the error will log complete information on the part of validation that failed and the changes required for the connection to succeed, while the other side of the connection will simply note that the connection failed validation. This means that if a validation error is reported on one instance, but details are not present, the other side of the connection detected the error and will have logged relevant details.

Excluding a validation check directs AMPS to make a replication connection that could result in inconsistent state or data loss. Use caution when excluding validation checks. See the table below for details on each validation check.

The table below lists aspects of replication that AMPS validates. By default, replication validation treats the downstream instance as though it is intended to be a full highly-available failover partner for any topic that is replicated. For situations where that is not the case, many validation rules can be excluded. For example, if the downstream instance is a view server that does not accept publishes and, therefore, is not configured to replicate a particular topic back to this instance, the replicate validation check might need to be excluded.

Removing validation checks should be done with caution. Removing a validation check states that this configuration is intended to create instances that may differ in contents.

AMPS performs the following validation checks:

Each entry below gives the name of the check, followed by what the check validates.

txlog

The topic is contained in the transaction log of the remote instance.

An error on this validation check indicates that this instance is replicating a topic that is not in the transaction log on the downstream instance. This means that the downstream instance is not persisting the messages in a way that can be used for replication or replay, or as the basis for a queue.

replicate

The topic is replicated from the remote instance back to this instance.

An error on this validation check indicates that this instance is replicating a topic to the downstream instance that is not being replicated back to this instance. This means that any publishes or updates to the topic on the downstream instance are not replicated back to this instance.

sow

If the topic is a SOW topic in this instance, it must also be a SOW topic in the remote instance.

An error on this validation check indicates that this instance is replicating a topic to the downstream instance that is a SOW/Topic on this instance but is not a SOW/Topic on the downstream instance. This means that the topic behaves differently on the downstream instance, which does not maintain the current value of each record in the topic in a SOW.

cascade

The remote instance must enforce the same set of validation checks for this topic as this instance does.

When relaxing validation rules for a topic that the downstream instance itself replicates, adding an exclusion for cascade is often necessary as well.

An error on this validation check indicates that this instance enforces a validation check for a topic that the downstream instance does not enforce when that instance replicates the topic.

To understand the impact of this validation check, consider the validation checks that the downstream instance enforces. If the downstream instance enforces the appropriate validation checks, this instance can exclude the cascade check.

It is sometimes necessary to exclude this check as part of a rolling upgrade, and then to leave this exclusion in place until all instances can be taken offline at the same time. If the cascade check is the only check being excluded on any instance, the topology can be considered to meet validation rules (and the cascade exclusion can be safely removed during a maintenance window when all of the instances can be updated simultaneously).

queue

If the topic is a queue in this instance, it must also be a queue in the remote instance.

A distributed queue will not function correctly if one of the instances it is replicated to does not define the topic as a queue.

This check cannot be excluded.

keys

If the topic is a SOW topic in this instance, it must also be a SOW topic in the remote instance and the SOW in the remote instance must use the same Key definitions.

An error on this validation check indicates that this instance is replicating a topic to the downstream instance that is a SOW/Topic on both instances, but that the definition of message identity (the Key configuration for the topic) does not match on the two instances. This means that the contents of this topic may be different on these two instances for the same set of messages published.

replicate_filter

If this topic uses a replication filter, the remote instance must use the same replication filter for replication back to this instance.

An error on this validation check indicates that this instance uses a replication filter for a topic that the downstream instance does not use when it replicates the topic. This means that, given the same set of publishes, the downstream instance may replicate a different set of messages than are replicated to that instance. This would produce inconsistent data across the set of replicated instances.

queue_passthrough

If the topic is a queue in this instance, the remote instance must support passthrough from this group.

An error on this validation check indicates that this instance does not pass through messages for one or more groups that the queue is replicated from. This could lead to a situation where a queue message is undeliverable if a network connection is unavailable or if additional instances are added to the set of instances that contain the queue.

queue_underlying

If the topic is a queue in this instance, it must use the same underlying topic definition and filters in the remote instance.

This check cannot be excluded.

By default, all of these checks are applied.
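The txlog, sow, and keys checks all compare how this instance treats a topic with how the downstream instance defines it. The fragment below is a hedged sketch of a downstream definition that would be expected to satisfy those three checks for a hypothetical ORDERS topic. It assumes that the downstream instance declares SOW topics in a SOW element with Key children and lists logged topics in a TransactionLog element. Those element names do not appear in this section and should be confirmed against the AMPS configuration reference. The /orderId key is likewise hypothetical.

```xml
<!-- Sketch only: SOW, Key, and TransactionLog are assumed element
     names, and /orderId is a hypothetical key. -->
<SOW>
    <Topic>
        <Name>ORDERS</Name>
        <MessageType>json</MessageType>
        <!-- keys check: same Key definition as on the upstream instance -->
        <Key>/orderId</Key>
        ...
    </Topic>
</SOW>

<TransactionLog>
    ...
    <!-- txlog check: the replicated topic is recorded in the
         transaction log of the downstream instance -->
    <Topic>
        <Name>ORDERS</Name>
        <MessageType>json</MessageType>
    </Topic>
</TransactionLog>
```

If the upstream instance identified ORDERS records by a different key, the keys check would be expected to fail even though the sow and txlog checks pass.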

The sample below shows how to exclude validation checks for a replication destination. In this sample, the Topic does not require the remote destination to replicate back to this instance, and does not require that the remote destination enforce the same configuration checks for any downstream replication of this topic.

<Destination>
    ...
    <Topic>
        <MessageType>json</MessageType>
        <Name>MyStuff-VIEW</Name>
        <ExcludeValidation>replicate,cascade</ExcludeValidation>
    </Topic>
    ...
</Destination>
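Because each Topic in a Destination carries its own set of validation checks, an exclusion on one topic leaves the other topics in the same Destination untouched. The fragment below is a minimal sketch, assuming two hypothetical topics, ORDERS and METRICS, where only METRICS is intended as a one-way feed.

```xml
<Destination>
    ...
    <!-- No ExcludeValidation element: every check applies to ORDERS. -->
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
    </Topic>
    <!-- Only the replicate check is excluded, and only for METRICS. -->
    <Topic>
        <MessageType>json</MessageType>
        <Name>METRICS</Name>
        <ExcludeValidation>replicate</ExcludeValidation>
    </Topic>
    ...
</Destination>
```

If the downstream instance also replicates METRICS onward, an exclusion for cascade may be needed as well, as noted in the entry for that check. The queue and queue_underlying checks are not candidates for exclusion.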

Copyright 2013-2024 60East Technologies, Inc.