Configuring Topics in a SOW

This section outlines the configuration options for recording a Topic in the State of the World (SOW).

All SOW topics require a basic definition of the messages to be recorded.

Described below are the required configuration items for a Topic in a SOW.

Name (required)

The name of the SOW topic.

By default, unique messages published to this topic will be stored in a topic-specific SOW database.

Every SOW requires a method of determining which messages are unique. Several methods are provided within AMPS.

See the Understanding SOW Keys section for information on generating SOW Keys, and the following sections for relevant configuration items.

If no Name is provided, AMPS accepts Topic as a synonym for Name, for compatibility with versions of AMPS earlier than 5.0.

Notice that if the topic uses a Pattern tag (described below) to record multiple logical topics in a single physical topic, the Name defines the physical topic only.

MessageType (required)

The type of messages to be stored.

To use AMPS generated SOW keys, the message type specified must support content filtering so that AMPS can determine the SOW key for the message. All of the default message types, except binary, support content filtering. Since the binary message type does not support content filtering, that type can only be used for a SOW when publishers use explicit keys.

See the Message Types section for a discussion of the message types that AMPS loads by default. Some message types (such as Google Protocol Buffers) require additional configuration, and must be configured before using the message type in a SOW topic.

Below is an example of a simple SOW topic configuration:

<SOW>
    <Topic>
        <Name>orders</Name>
        <Key>/orderId</Key>
        <MessageType>nvfix</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
</SOW>
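
By contrast, the following is a sketch of a topic that stores binary messages. Because the binary message type does not support content filtering, the configuration omits the Key element, and publishers are expected to supply the SOW key with each publish command. The topic name is hypothetical.

<SOW>
    <Topic>
        <Name>attachments</Name>
        <!-- No Key element: publishers provide an explicit SOW key -->
        <MessageType>binary</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
</SOW>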

Required Considerations

The configuration for a SOW topic must also address the options outlined in the sections below.

Storage and Recovery

A SOW topic must either specify the FileName to be used for persisting the topic, or declare that the topic will be in-memory only by setting the Durability to transient.

Described below are the configuration options that specify how the topic is stored, and how AMPS recovers the topic when it is not durably persisted or when the SOW file is not present.

FileName (required if Durability is persistent)

The file where the SOW data will be stored.

This element is required for SOW topics with a Durability of persistent (the default) because those topics are persisted to the filesystem.

This is not required for SOW topics with a Durability of transient.

This element should contain the path and the file name. The path can be either an absolute path or a path relative to the current working directory of the AMPS process.

Within this element, the escape %n will be replaced with the Name and MessageType of the topic. This can be a convenient way to avoid having to retype the topic name in this element.

Two different topics must not share the same file. Two instances of AMPS must not share the same file.

Durability

Defines the data durability of a SOW topic.

SOW databases specified as persistent are stored to the file system and retain their data across instance restarts. Those specified as transient are not persisted to the file system and are recreated each time the AMPS instance restarts.

Notice that when the Durability is transient, and the topic is recorded in the transaction log, each time AMPS starts, AMPS will recover the state of the topic from the transaction log.

The recovery begins at the RecoveryPoint specified for the topic, which defaults to epoch, the beginning of the transaction log.

(For persistent topics, AMPS recovers from the last message written to the SOW topic, or from the RecoveryPoint if the SOW file is removed.)

Valid values: persistent or transient

When a value of persistent is specified, the FileName element must be present.

Synonyms: Duration is also accepted for this parameter for backward compatibility with configuration prior to 4.0.0.1.

Default: persistent

RecoveryPoint

For SOW topics that are covered by the transaction log, the point from which to recover the SOW if the SOW file is removed, or if the SOW topic has a Durability of transient.

This configuration item allows two values:

  • epoch - Recovers the SOW from the beginning of the transaction log.

  • now - Recovers the SOW from the current point in the transaction log.

Default: epoch
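
As an illustration of these options used together, the sketch below declares an in-memory topic that AMPS rebuilds from the current point in the transaction log rather than from the beginning. It assumes the topic is recorded in a transaction log configured elsewhere in the file; the topic name and key field are hypothetical.

<SOW>
    <Topic>
        <Name>quotes</Name>
        <Key>/symbol</Key>
        <MessageType>json</MessageType>
        <!-- No FileName: a transient topic is not persisted to the file system -->
        <Durability>transient</Durability>
        <!-- Recover from the current point in the transaction log -->
        <RecoveryPoint>now</RecoveryPoint>
    </Topic>
</SOW>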

Expiration

Defines the length of time a record should remain in the SOW database for this topic.

The expiration time is stored on each message, so changing the expiration time in the configuration file will not affect the expiration of messages currently in the SOW.

AMPS accepts interval values for Expiration, using the interval format described at the start of this guide in the section on Units, or one of the following special values:

  • A value of disabled specifies that AMPS will not process SOW expiration for this topic. In this case, AMPS saves any expiration value set on a message by the publisher, but does not process expiration.

    This value must be set to disabled (the default) if History is enabled for this topic.

  • A value of enabled specifies that AMPS will process SOW expiration for this topic, with no expiration set by default. Instead, AMPS uses the value set on the individual messages (with no expiration set for messages that do not contain an expiration value).

Expiration must be disabled if History is enabled.

Default: disabled (messages never expire)
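
A minimal sketch of a topic with an expiration interval, assuming that the interval is written in the same style as the Window and Granularity values used elsewhere in this section (for example, 1w or 15m). The topic name and key field are hypothetical. The topic does not include a History element, since Expiration must be disabled when History is enabled.

<SOW>
    <Topic>
        <Name>sessions</Name>
        <Key>/sessionId</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <!-- Length of time a record should remain in the SOW for this topic -->
        <Expiration>30m</Expiration>
    </Topic>
</SOW>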

Record Identity Definition

Each SOW topic must define how AMPS will determine which messages are unique. Typically, record identity is based on the content of one or more fields within the message.

An application can either have AMPS determine the key by specifying one or more Key fields, or provide a SOW key with the publish command each time a message is published to AMPS. AMPS also supports custom SOW key generators, provided as a plugin module.

See the Understanding SOW Keys section for a full discussion.

Described below are the configuration options for specifying how AMPS determines the SowKey for a message.

Key

Specifies an XPath-based identifier within each message that AMPS will use to generate a SOW key, which determines whether a message is unique. This element can be specified multiple times to create a composite key from the combined value of the specified Key elements.

When one or more Key elements are specified for the SOW, AMPS generates the SOW key for each message. When no Key fields are specified and no KeyGenerator is specified, publishers must explicitly provide the SOW key for each message when the message is published.

60East recommends configuring a Key and having AMPS generate the SOW key for a message unless your application has specific needs that make this impractical.

AMPS automatically creates a hash index for the set of fields specified in the Key elements.

There is no default for this element.
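
For example, a composite key might be sketched as follows. The two Key elements together identify a record, so two messages are treated as the same record only when both fields match. The topic and field names are hypothetical.

<SOW>
    <Topic>
        <Name>positions</Name>
        <!-- Composite key: the combined value of both fields identifies a record -->
        <Key>/accountId</Key>
        <Key>/symbol</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
</SOW>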

KeyDomain

The seed value for SowKeys used within the topic when AMPS generates the SOW key. The default is the topic name, but it can be changed to a string value to unify SowKey values between different topics.

For example, if your application has a ShippingAddress SOW and a CreditRating SOW that both use /customerId as the SOW key, you can use a KeyDomain to ensure that the generated SowKey for a given /customerId is identical for both SOW topics. This does not affect how AMPS processes the SOW topics, but can make correlating information from different SOW topics easier in your application.

This option can only be specified when one or more Key fields are specified. When a SOW key generator module is used, or the publisher must send a SOW key, this option is not valid.

Default: Name of the SOW topic.
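
The ShippingAddress and CreditRating case above could be configured along these lines. The KeyDomain value customer is an arbitrary string chosen for illustration, and the message type is an assumption.

<SOW>
    <Topic>
        <Name>ShippingAddress</Name>
        <Key>/customerId</Key>
        <!-- Same KeyDomain in both topics, so the same /customerId yields the same SowKey -->
        <KeyDomain>customer</KeyDomain>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
    <Topic>
        <Name>CreditRating</Name>
        <Key>/customerId</Key>
        <KeyDomain>customer</KeyDomain>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
</SOW>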

KeyGenerator

Specifies the SOW key generator module to use for this topic. When this configuration element is present, AMPS calls the specified module to generate a SOW key for each incoming message.

A KeyGenerator element contains the following elements:

  • Module (required within a KeyGenerator element) - The name of the module. This module must be loaded elsewhere in the configuration file.

  • Options - Contains one or more XML elements. These elements are provided to the key generator module as options. The options provided depend on the key generator. The creator of the key generator module must document the options for that module.

Default: Unset (no SOW key generator module). When there is no SOW key generator module specified, AMPS uses the specified Key fields if the Key fields are provided. If no generator is specified and no Key fields are specified, AMPS requires publishers to set a SOW key on each message published.
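
The structure of a KeyGenerator element can be sketched as below. The module name my-key-generator and the Separator option are hypothetical placeholders: the module would need to be loaded elsewhere in the configuration file, and the actual option names are whatever the module's author documents. The topic name is also hypothetical.

<SOW>
    <Topic>
        <Name>trades</Name>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <!-- No Key elements: the module generates the SOW key -->
        <KeyGenerator>
            <Module>my-key-generator</Module>
            <Options>
                <!-- Hypothetical option, passed through to the module -->
                <Separator>|</Separator>
            </Options>
        </KeyGenerator>
    </Topic>
</SOW>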

Memory and File Growth

If the SOW topic will contain a large number of records, or if an individual record will exceed the default allocation size, the topic must define the allocation size.

The SOW topic configuration also specifies how the SOW file is allowed to grow. See SOW Parameters in the Operations Best Practices section for detailed recommendations.

Described below are the configuration options for controlling how the file is allocated and how the file grows.

SlabSize

The size of each allocation for the SOW file, as a number of bytes. When AMPS needs more space for the SOW, it requests this amount of space from the operating system. This effectively sets the maximum message size that AMPS guarantees can be stored in the SOW. This size includes headers set by AMPS on the message.

60East recommends setting this value only if you will be storing messages larger than the default SlabSize or if performance or capacity testing indicates a need to tune SOW performance. If you plan to store messages larger than the default setting, 60East recommends a starting value of several times the maximum message size. For example, if your maximum message size is 2MB, a good starting point for SlabSize would be 8MB.

If it becomes necessary to tune the SlabSize, see SOW Parameters for a full discussion about tuning this setting.

Default: 5MB

Maximum: 1GB

InitialSlabCount

The number of SOW slabs that AMPS will allocate on startup.

Default: 1

Maximum: 1024
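
A sketch for the 2MB example above: the topic requests 8MB slabs, written here as a number of bytes (8 × 1024 × 1024 = 8388608), and allocates four slabs at startup. The topic name, key field, and slab count are illustrative choices, not recommendations.

<SOW>
    <Topic>
        <Name>documents</Name>
        <Key>/documentId</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <!-- 8MB per allocation, four times the 2MB maximum message size -->
        <SlabSize>8388608</SlabSize>
        <InitialSlabCount>4</InitialSlabCount>
    </Topic>
</SOW>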

Optional Considerations

A SOW Topic may also provide the following options (notice, though, that there are restrictions on how some of these options are used with other options).

Indexing Options

A SOW topic can declare additional hash indexes or direct AMPS to create memo indexes before a field is queried by an application to improve performance. AMPS automatically creates a memo index for a field within a SOW topic when that field is used (for example, is used in a view or is queried by an application).

In addition, AMPS automatically creates a hash index (the primary key index) for the combination of fields used to define the SOW key. Indexing is described in more detail in the Indexing SOW Topics section.

Described below are the configuration options that allow you to manage index creation for a SOW topic.

HashIndex

AMPS provides the ability to do fast lookup for SOW records based on specific fields.

When one or more HashIndex elements are provided, AMPS creates a hash index for the fields specified in the element. These indexes are created on startup and are kept up to date as records are added, removed, and updated.

The HashIndex element contains a Key element for each field in the hash index.

AMPS uses a hash index when a query uses an exact string match for all of the fields in the index. AMPS does not use hash indexes for range queries or regular expressions.

AMPS automatically creates a hash index for the set of fields specified in the set of Key fields for the SOW, if those fields are specified.

Index

AMPS automatically creates memo indexes as needed, for example the first time a particular field is used in a query. The Index configuration option lets you create memo indexes for specific fields during startup instead.

When one or more Index elements are provided, AMPS creates memo indexes for any field specified in an Index element on startup, prior to executing a query that uses that field.

Otherwise, AMPS indexes each field the first time a query uses the field. Adding one or more Index configurations to a SOW/Topic can improve retrieval performance the first time a query that contains the indexed fields runs for large SOW topics.

ExpectedKeyCountHint

For SOW topics that will contain a large number of distinct keys, providing an expected key count allows AMPS to pre-size the data structure that holds the key. This can provide a performance improvement for publishers by avoiding cases where AMPS has to resize the data structure.

On startup, AMPS will size the internal data structures to hold the number of keys provided. AMPS does this by presizing the structure to hold a number of keys that is a power of 2 equal to or greater than the hint provided. This hint does not limit the number of keys in the topic. This hint sets the number of keys that the topic can hold without resizing the data structure.

There is no default for this value. When no value is provided, AMPS does not pre-size data structures for the SOW.
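
For instance, a topic expected to hold about one million distinct keys might be sketched as below. Following the power-of-2 rule described above, a hint of 1000000 would lead AMPS to presize the structure for 1048576 (2^20) keys. The topic name and key field are hypothetical.

<SOW>
    <Topic>
        <Name>accounts</Name>
        <Key>/accountId</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <!-- A hint, not a limit: the topic can still grow beyond this count -->
        <ExpectedKeyCountHint>1000000</ExpectedKeyCountHint>
    </Topic>
</SOW>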

Below is an example of a SOW with hash indexes.

<SOW>
    <Topic>
        <Name>customers</Name>
        <Key>/customerId</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <HashIndex>
            <Key>/customerName</Key>
        </HashIndex>
        <HashIndex>
            <Key>/zipCode</Key>
            <Key>/customerType</Key>
        </HashIndex>
    </Topic>
</SOW>

Historical Query

A SOW topic can keep message state to allow "point in time" historical queries of current values. Notice that this option is not required for message-by-message replay; recording the topic in a transaction log provides full replay. Instead, this option provides the ability to determine what the current value was for a message at a particular point in time (even if that value was set, and remained unchanged, long before the point in time that is being queried).

To enable historical queries for a topic, include a History element in the topic configuration.

Described below are the configuration options the History element must include.

Window (required if History is present)

For a historical SOW, the length of time to store history.

For example, when the value is 1w, AMPS will store one week of history for this SOW.

Used within the History element.

Granularity (required if History is present)

For a historical SOW, the granularity of the history to store.

For many applications, it is not necessary for AMPS to store all of the updates to the SOW. This parameter sets the resolution at which AMPS will save the state of a message. A value of 0s or equivalent specifies that AMPS will preserve every update within the Window.

For example, when you set a granularity of 1m, AMPS will save the state of the message no more frequently than once per minute, even when the state of the message is updated several times a minute.

Used within the History element.

Below is an example of a historical SOW configuration that stores 7 days of history for the catalog topic, saving the state of each message no more frequently than once every 15 minutes.

<SOW>
    <Topic>
        <Name>catalog</Name>
        <Key>/sku</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <History>
            <Window>7d</Window>
            <Granularity>15m</Granularity>
        </History>
    </Topic>
</SOW>

Message Enrichment

AMPS can modify a message’s content as it is published to a SOW topic from an application. See the State of the World Message Enrichment section for details.

Described below are the configuration options that allow you to perform message enrichment.

Preprocessing

When present, specifies the message enrichment to be performed before AMPS determines the SOW key for the message.

The Preprocessing element must contain one or more Field elements that specify the enrichment to perform.
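
Because preprocessing runs before AMPS determines the SOW key, a field constructed during preprocessing can presumably serve as the Key. The sketch below assumes the Field syntax is the same as for the Enrichment element, and uses hypothetical topic and field names.

<SOW>
    <Topic>
        <Name>regional-accounts</Name>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <Preprocessing>
            <!-- Build a single field from two source fields before key determination -->
            <Field>CONCAT(/region, "-", /accountNumber) AS /accountKey</Field>
        </Preprocessing>
        <Key>/accountKey</Key>
    </Topic>
</SOW>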

Enrichment

When present, specifies the message enrichment to be performed after AMPS determines the SOW key for the message.

The Enrichment element must contain one or more Field elements that specify the enrichment to perform.

Below is an example of a SOW with enrichment. This configuration adds a /fullName field that is constructed from the /firstName and /lastName fields.

<SOW>
    <Topic>
        <Name>sales-reps</Name>
        <Key>/employeeId</Key>
        <MessageType>bflat</MessageType>
        <FileName>./sow/%n.sow</FileName>
        <Enrichment>
            <Field>CONCAT(/firstName, " ", /lastName) AS /fullName</Field>
        </Enrichment>
    </Topic>
</SOW>

Multiple Logical Topics in One Physical Topic

A SOW topic can be declared as a regular expression topic, where multiple topic names use the same definition and are stored in the same physical file. This can be useful when converting a system from topic-based routing, or any situation where a set of topics use the same message structure, the same method for determining the key, and each topic has a relatively small set of messages.

AMPS can store the last values for a set of topics that match the same naming pattern and use the same configuration in a single set of SOW data structures and a single physical SOW file. See Storing Multiple Logical Topics in One Physical Topic for details.

Described below is the configuration option used to define a set of related SOW topics.

Pattern

When present, declares that this topic will record multiple logical topics into one physical data structure and file, and specifies the pattern to use to determine if the topic that a message is published to will be captured in this topic.

Physical topics that include multiple logical topics have the benefits and limitations described in the Storing Multiple Logical Topics in One Physical Topic section of the AMPS User Guide.

When this element is present, the Topic cannot specify History, Preprocessing, or Enrichment.

There is no default for this element.

Legacy Protocols Note: The legacy header formats do not include support for subscribing to or querying from topics that use this element.
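
A sketch of a regular expression topic, assuming that Pattern takes a regular expression matched against the name of the topic a message is published to. The physical topic name, the pattern, and the key field are hypothetical. Consistent with the restrictions above, the topic does not specify History, Preprocessing, or Enrichment.

<SOW>
    <Topic>
        <!-- Name identifies the physical topic only -->
        <Name>regional-orders</Name>
        <!-- Logical topics such as orders-us and orders-eu are stored together -->
        <Pattern>^orders-(us|eu|apac)$</Pattern>
        <Key>/orderId</Key>
        <MessageType>json</MessageType>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
</SOW>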
