Replication Configuration Validation

Replication configuration validation helps to ensure that any configuration that could result in message loss or inconsistent message contents between two instances of AMPS is explicitly designed into the replication topology and not the result of accidental misconfiguration.

Replication can involve coordinating configuration among a large number of AMPS instances. It can be difficult to ensure that all of the instances are configured correctly, and that a configuration change for one instance is also made at the replication destinations. For example, if a high-availability pair replicates the topics ORDERS, INVENTORY, and CUSTOMERS to a downstream disaster recovery site, but the disaster recovery site only replicates ORDERS and INVENTORY back to the high-availability pair, disaster recovery may not occur as planned. Likewise, if only one member of the HA pair replicates ORDERS to the other member of the pair, the two instances will contain different messages, so a client may see different data depending on which instance it connects to.

Starting in the 5.0 release, AMPS automatic replication configuration validation makes it easier to keep configuration items consistent across a replication fabric.

Replication configuration validation happens when a replication connection is made between two instances. The validation compares the configuration of those two instances. By default, any difference in configuration that could result in message loss, different behavior between the source instance and the destination instance, or different replication guarantees between the source instance and the destination instance is reported as an error.

When replication validation reports an error, the reason for the error is logged to the event log on the instance that detects the problem, and the connection is closed. To troubleshoot the issue, it is typically necessary to check the logs and configuration on both instances.

Automatic configuration validation is enabled for all elements of the replication configuration by default. You can turn off validation for specific elements of the configuration, as described below.

AMPS replication uses a leaderless, "all nodes hot" model. This means that no single AMPS instance has a view of the entire replication fabric, and a single AMPS instance will always assume that there are instances in the replication fabric that it is not aware of. The replication validation rules are designed with this assumption. The advantage of this assumption is that if instances are added to the replication fabric, it is typically only necessary to change configuration on the instances that they directly communicate with for replication to function as expected. The tradeoff, however, is that it is sometimes necessary to configure an instance as though it were part of a larger fabric (or exclude a validation rule) even in a case where the instance is part of a much simpler replication design.

Each Topic in a replication Destination can configure a unique set of validation checks. By default, all of the checks apply to all topics in the Destination.
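As an illustration, per-Topic validation settings might look like the following sketch. It reuses the ExcludeValidation element shown in the example at the end of this section; the topic names (ORDERS, ORDERS-VIEW) are hypothetical, and the rest of the Destination configuration is omitted.

```xml
<Destination>
    <!-- other Destination settings omitted -->
    <Topic>
        <!-- No ExcludeValidation element: all checks apply (the default) -->
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
    </Topic>
    <Topic>
        <!-- Only the replicate check is excluded for this Topic -->
        <MessageType>json</MessageType>
        <Name>ORDERS-VIEW</Name>
        <ExcludeValidation>replicate</ExcludeValidation>
    </Topic>
</Destination>
```

Keeping exclusions on the individual Topic, rather than relaxing checks for every Topic in the Destination, limits the relaxed guarantees to the topics where the asymmetry is intended.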

When troubleshooting a configuration validation error, it is important to look at the AMPS logs on both sides of the connection. Typically, the AMPS instance that detects the error will log complete information on the part of validation that failed and the changes required for the connection to succeed, while the other side of the connection will simply note that the connection failed validation. This means that if a validation error is reported on one instance, but details are not present, the other side of the connection detected the error and will have logged relevant details.

The table below lists the aspects of replication that AMPS validates. By default, replication validation treats the downstream instance as though it is intended to be a full, highly available failover partner for any topic that is replicated. For situations where that is not the case, many validation rules can be excluded. For example, if the downstream instance is a view server that does not accept publishes and is therefore not configured to replicate a particular topic back to this instance, the replicate validation check might need to be excluded.

Use caution when excluding validation checks. Excluding a check directs AMPS to make a replication connection that could result in inconsistent state or data loss, and states that the configuration is intended to create instances that may differ in contents.

AMPS performs the following validation checks. In this discussion "this instance" refers to the instance sending messages via replication and the "downstream instance" refers to the instance receiving messages via replication.

| Check | Validates |
| --- | --- |
| txlog | The topic is contained in the transaction log of the downstream instance. |
| replicate | The topic is replicated from the downstream instance back to this instance. |
| sow | If the topic is a SOW topic in this instance, it must also be a SOW topic in the downstream instance. |
| cascade | The downstream instance must enforce the same set of validation checks for this Topic as this instance does. When relaxing validation rules for a Topic that the downstream instance itself replicates, adding an exclusion for cascade is often necessary as well. |
| queue | If the topic is a queue in this instance, it must also be a queue in the downstream instance. This option cannot be excluded. |
| keys | If the topic is a SOW topic in this instance, it must also be a SOW topic in the downstream instance and the SOW in the downstream instance must use the same Key definitions. |
| replicate_filter | If the topic uses a replication filter, the downstream instance must use the same replication filter for replication back to this instance. |
| queue_passthrough | If the topic is a queue in this instance, the downstream instance must support passthrough from this group. |
| queue_underlying | If the topic is a queue in this instance, it must use the same underlying topic definition and filters in the downstream instance. This option cannot be excluded. |

By default, all of these checks are applied.

txlog validation check

Validates that the topic is contained in the transaction log of the downstream instance.

An error on this validation check indicates that the instance is replicating a topic that is not in the transaction log on the downstream instance. This means that the downstream instance is not persisting messages in a way that can be used for replication, for replay, or as the basis for a queue.

replicate validation check

Validates that the topic is replicated from the downstream instance back to this instance.

An error on this validation check indicates that this instance is replicating a topic to the downstream instance that is not being replicated back to this instance. This means that any publishes or updates to the topic on the downstream instance are not replicated back to this instance. If this is intentional (for example, replicating messages to a read-only view server), the upstream instance can exclude this validation check.

sow validation check

Validates that if the topic is a SOW/Topic on this instance, it must also be a SOW/Topic on the downstream instance.

An error on this validation check indicates that this instance is replicating a topic to the downstream instance that is a SOW/Topic on this instance but is not a SOW/Topic on the downstream instance. This means that the topic has different behavior on the downstream instance, and does not maintain the current value of records in the topic in the SOW.

cascade validation check

Validates that the downstream instance enforces the same set of validation checks for this Topic as this instance does.

When relaxing validation rules for a Topic that the downstream instance itself replicates, adding a cascade exclusion for this instance is usually necessary as well.

An error on this validation check indicates that this instance enforces a validation check for a Topic that the downstream instance does not enforce when that instance replicates the Topic. These validation checks include the cascade validation check itself.

To understand the impact of this validation check, consider the validation checks that the downstream instance enforces. If the downstream instance enforces validation checks that are correct for the intended behavior, this instance can exclude the cascade check. For example, if the downstream instance replicates to a view server that does not replicate back, the downstream instance may exclude the replicate check, and this instance would need to exclude the cascade check to indicate that replicating to an instance that excludes checks is intentional.

Replication topologies that intentionally have asymmetrical replication typically require this exclusion (for example, replication to a read-only view server as mentioned earlier).

This exclusion can also become necessary as part of a rolling update where topics are being added or changed, even if the final state of the replication fabric would not require this exclusion. In this case, it is typically necessary to leave the exclusion in place until all instances can be taken offline at the same time (so the cascade exclusion can be removed from all of the configurations at once). If the cascade validation check is the only check that is excluded throughout a replication fabric, the topology can be considered to fully meet validation rules.
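The view-server scenario described above could be configured along these lines. This is a sketch rather than a complete configuration: the topic name ORDERS is hypothetical, and only the elements relevant to validation are shown.

```xml
<!-- On the downstream instance: Destination for the view server.
     The view server does not replicate back, so replicate is excluded. -->
<Destination>
    <!-- other Destination settings omitted -->
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
        <ExcludeValidation>replicate</ExcludeValidation>
    </Topic>
</Destination>

<!-- On this instance: Destination for the downstream instance.
     The downstream instance excludes a check for this Topic, so this
     instance excludes cascade to state that this is intentional. -->
<Destination>
    <!-- other Destination settings omitted -->
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
        <ExcludeValidation>cascade</ExcludeValidation>
    </Topic>
</Destination>
```

In this sketch, this instance still enforces the replicate check against the downstream instance. Only the requirement that the downstream instance enforce the same checks onward is relaxed.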

queue validation check

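Validates that if the topic is a queue in this instance, it must also be a queue in the downstream instance.

This is a mandatory validation check, and cannot be excluded.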

A distributed queue (defined with the SOW/Queue or SOW/GroupLocalQueue tags) will not function correctly if one of the instances it is replicated to does not define the topic as a queue. An error in this validation check means that the queue will not function correctly, and the appropriate queue definition must be added to the downstream instance.

keys validation check

Validates that if the topic is a SOW/Topic on this instance, it must also be a SOW/Topic on the downstream instance and the SOW/Topic on the downstream instance must use the same Key definitions.

An error on this validation check indicates that the instance is replicating a topic to the downstream instance that is a SOW/Topic on both instances, but that the definition of message identity (the Key configuration for the topic) does not match on the two instances. This means that the contents of this topic may be different on these two instances for the same set of messages published.

replicate_filter validation check

Validates that if a replication filter is used, the downstream instance must use the same replication filter for replication back to this instance.

An error on this validation check indicates that this instance uses a replication filter for a topic that the downstream instance does not use when it replicates the topic. The result is that, for a given set of published messages, the downstream instance may replicate a different set of messages than the set it received, which would leave the instances with inconsistent data.

In some cases (for example, partitioning a global stream of messages into particular regions), this is the intended result.
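Where partitioning of this kind is intentional, the exclusion might be written as in the sketch below. The topic name and filter expression are hypothetical, and the Filter element is an assumption about how the replication filter is configured for the Topic (it is not shown elsewhere in this section); check the configuration reference for your AMPS version before relying on it.

```xml
<Destination>
    <!-- other Destination settings omitted -->
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS</Name>
        <!-- Assumed element and hypothetical expression:
             replicate only the messages for one region -->
        <Filter>/region = 'EMEA'</Filter>
        <!-- The downstream instance is not expected to use the same filter -->
        <ExcludeValidation>replicate_filter</ExcludeValidation>
    </Topic>
</Destination>
```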

queue_passthrough validation check

Validates that if the topic is a queue in this instance, the downstream instance must support passthrough from this group to its replication destinations.

An error on this validation check indicates that this instance does not pass through messages for one or more groups that the queue is replicated from. This could lead to a situation where a queue message is undeliverable if a network connection is unavailable or if additional instances are added to the set of instances that contain the queue.
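On the downstream instance, passthrough toward its own replication destinations might be sketched as follows. The Group and PassThrough elements, the group names (DataCenter-NYC, DataCenter-LON), and the topic name are assumptions for illustration and do not appear elsewhere in this section; consult the AMPS configuration reference for the exact syntax in your version.

```xml
<!-- On the downstream instance: a Destination it replicates the queue to -->
<Destination>
    <!-- other Destination settings omitted -->
    <!-- Hypothetical group of the instance this Destination points to -->
    <Group>DataCenter-LON</Group>
    <!-- Assumed element: pass through messages that arrived from the
         upstream instance's group (hypothetical name) -->
    <PassThrough>DataCenter-NYC</PassThrough>
    <Topic>
        <MessageType>json</MessageType>
        <Name>ORDERS-QUEUE</Name>
    </Topic>
</Destination>
```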

queue_underlying validation check

Validates that if a topic is a queue in this instance, it must use the same underlying topic definition and filters in the downstream instance.

This is a mandatory validation check, and cannot be excluded.

A distributed queue (defined with the SOW/Queue or SOW/GroupLocalQueue tags) will not function correctly if one of the instances it is replicated to does not contain the same messages as the other instances that host the queue. An error in this validation check means that the queue definitions will not contain the same messages, so the queue will not function correctly. The underlying topics in the queue definition must be identical on this instance and the downstream instance.

Example

The sample below shows how to exclude validation checks for a replication destination. In this sample, the Topic does not require the downstream destination to replicate back to this instance, and does not require that the downstream destination enforce the same configuration checks for any downstream replication of this topic.

<Destination>
    ...
    <Topic>
        <MessageType>json</MessageType>
        <Name>MyStuff-VIEW</Name>
        <ExcludeValidation>replicate,cascade</ExcludeValidation>
    </Topic>
    ...
</Destination>

Copyright 2013-2024 60East Technologies, Inc.