Defining Views and Aggregations

Multiple topic aggregation creates a view using more than one topic as a data source. This allows you to enrich messages as they are processed by AMPS, enabling aggregate calculations using information published to more than one topic. You can combine messages from multiple topics and use filtered subscriptions to determine which messages are of interest. For example, you can set up a topic that contains orders from high-priority customers.

You can join topics of different message types, and you can project messages of a different type than the underlying topic.

To create an aggregate using multiple topics, each topic needs to maintain a SOW. Since views maintain an underlying SOW, you can create views from views.

To define an aggregate, you decide:

The topic, or topics, that contain the source for the aggregation
If the aggregation uses more than one topic, how those topics relate to each other
What messages to publish, or project, from the aggregation
How to group messages for aggregation
The message type of the aggregation

Message types provided with AMPS fully support views, with the following exceptions:

binary message types cannot be the underlying topic for a view or the type of a view.
protobuf message types can be the underlying topic for a view, but cannot be the type of a view.
composite-global message types can be the underlying topic for a view, but cannot be the type of a view.
struct message types can be the underlying topic for a view, but cannot be the type of a view.

If you are using a custom message type, check with the message type developer as to whether that message type supports aggregation.

Single Topic Aggregation: UnderlyingTopic

For aggregations based on a single topic, use the UnderlyingTopic element to tell AMPS which topic to use. All messages from the UnderlyingTopic will appear in the aggregation.

<UnderlyingTopic>MyOriginalTopic</UnderlyingTopic>

Multiple Topic Aggregation: Join

Join definitions tell AMPS how to relate underlying topics to each other. You use a separate Join element for each relationship in the view. Most often, the Join definition describes a relationship between topics:

[topic].[field]=[topic].[field]

The topics specified must be previously defined in the AMPS configuration file. The square brackets [] are optional. If they are omitted, AMPS uses the first / in the expression as the start of the field definition. You can use any number of Join expressions to define a multiple topic aggregation.

A Join definition is an equality comparison between the values of two fields. The Join definition is not evaluated as an AMPS expression, so functions, operators (other than =) and so forth are not evaluated in these definitions.

Within a Join definition, values are always compared as strings. This means that values such as 12345, 12345.00, and 1.2345E+04 are treated as different values by the Join, because they are different strings, even though they represent the same numeric value.
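The two rules above (a Join definition is a single equality test, and the values are compared as strings) can be modeled in a few lines of Python. This is a sketch for illustration only: parse_join and join_values_match are hypothetical names, not AMPS APIs, and the parser handles only the bracketed [topic].[field] form.

```python
import re

# Hypothetical parser for the bracketed form [topic].[field]=[topic].[field].
SIDE = re.compile(r"\[([^\]]+)\]\.\[([^\]]+)\]")


def parse_join(definition: str):
    """Split a Join definition into ((topic, field), (topic, field))."""
    left, right = definition.split("=", 1)
    sides = []
    for side in (left, right):
        match = SIDE.fullmatch(side.strip())
        if match is None:
            raise ValueError(f"unsupported join side: {side!r}")
        sides.append((match.group(1), match.group(2)))
    return tuple(sides)


def join_values_match(left_value, right_value) -> bool:
    # Values are compared as strings: no numeric normalization takes place.
    return str(left_value) == str(right_value)


print(parse_join("[Orders].[/CustomerID]=[Addresses].[/CustomerID]"))
print(join_values_match("12345", "12345"))       # True
print(join_values_match("12345", "12345.00"))    # False
print(join_values_match("12345", "1.2345E+04"))  # False
```

Parsed numerically, float("12345.00") and float("1.2345E+04") are equal, which is exactly the normalization a string comparison does not perform.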

If your aggregation will join messages of different types, or produce messages of a different type than the underlying topics, you add message type specifiers to the Join definition:

[messagetype].[topic].[field]=[messagetype].[topic].[field]

In this case, the square brackets [] around the messagetype are mandatory. AMPS creates a projection in the aggregation that combines the messages from each topic where the expression is true. In other words, for the expression:

<UnderlyingTopic>
   <Join>[Orders].[/CustomerID]=[Addresses].[/CustomerID]</Join>
</UnderlyingTopic>

AMPS projects every message where the same CustomerID appears in both the Addresses topic and the Orders topic. If a CustomerID value appears in only the Addresses topic, AMPS does not create a projection for the message. If a CustomerID value appears in only the Orders topic, AMPS projects the message with NULL values for the Addresses topic. In database terms, this is equivalent to a LEFT OUTER JOIN.
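As a conceptual model only (not how AMPS implements joins), the following Python sketch reproduces the LEFT OUTER JOIN behavior described above. The sample orders and addresses lists and the join_orders_addresses function are hypothetical; each list stands in for the SOW contents of a topic, and None stands in for NULL.

```python
# Hypothetical stand-ins for the SOW contents of the Orders and Addresses topics.
orders = [
    {"OrderID": 1, "CustomerID": "C1", "TotalPrice": 10.0},
    {"OrderID": 2, "CustomerID": "C2", "TotalPrice": 25.0},
]
addresses = [
    {"CustomerID": "C1", "ShippingAddress": "1 Main St"},
    {"CustomerID": "C3", "ShippingAddress": "9 Elm St"},  # no order for C3
]


def join_orders_addresses(orders, addresses, field):
    """Model [Orders].[/field]=[Addresses].[/field] as a LEFT OUTER JOIN."""
    # Index Addresses by the join value; values are compared as strings.
    index = {}
    for address in addresses:
        index.setdefault(str(address[field]), []).append(address)

    rows = []
    for order in orders:
        matches = index.get(str(order[field]), [])
        if matches:
            rows.extend({"Orders": order, "Addresses": a} for a in matches)
        else:
            # Order with no matching address: the Addresses side is NULL (None).
            rows.append({"Orders": order, "Addresses": None})
    return rows


for row in join_orders_addresses(orders, addresses, "CustomerID"):
    print(row["Orders"]["CustomerID"], row["Addresses"])
# C1 {'CustomerID': 'C1', 'ShippingAddress': '1 Main St'}
# C2 None
```

Customer C3, which appears only in Addresses, produces no row, while C2, which appears only in Orders, is projected with a NULL Addresses side.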


In a Join expression, AMPS does not treat NULL values, including empty strings, as equal to each other, so a NULL on one side of the Join never matches a NULL on the other side. This behavior matches ANSI SQL, but differs from previous releases of AMPS.

You can disable this behavior and cause NULL values to match by including the JoinNullEquivalency option in the View definition and setting that option to enabled.

60East does not recommend setting this option unless it is necessary to preserve backward compatibility with previous versions of AMPS.
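A minimal sketch of the NULL comparison rule, assuming (as the text states) that a missing value and an empty string both count as NULL. The null_equivalency parameter is a hypothetical stand-in for the JoinNullEquivalency option, and treating a missing value and an empty string as matching each other when the option is enabled is an assumption of this sketch.

```python
def is_null(value) -> bool:
    # Per the text above, a missing value and an empty string both count as NULL.
    return value is None or value == ""


def join_keys_match(left, right, null_equivalency: bool = False) -> bool:
    """Model how a Join compares one pair of key values."""
    if is_null(left) or is_null(right):
        # Default: NULL never matches, not even another NULL (ANSI SQL style).
        # With equivalency enabled (assumed), two NULLs are treated as a match.
        return null_equivalency and is_null(left) and is_null(right)
    return str(left) == str(right)


print(join_keys_match("C1", "C1"))                       # True
print(join_keys_match("", None))                         # False
print(join_keys_match("", None, null_equivalency=True))  # True
```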

You can use any number of Join definitions within an UnderlyingTopic element:

<Join>[nvfix].[Orders].[/CustomerID]=[json].[Addresses].[/CustomerID]</Join>
<Join>[nvfix].[Orders].[/ItemID]=[nvfix].[Catalog].[/ItemID]</Join>

In this case, AMPS creates a projection that combines messages from the Orders, Addresses, and Catalog topics. When matching messages are present in all three topics, the projection includes values from each of them. When an Orders message has no matching message in the Addresses or Catalog topic, AMPS projects the values from that topic as NULL.
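The following hypothetical Python sketch models the two Join definitions above. It ignores the message type specifiers and assumes that Addresses and Catalog each hold at most one message per join value (for example, because that field is the topic's key). None stands in for NULL.

```python
# Hypothetical topic contents. Addresses and Catalog are indexed by their join field.
orders = [
    {"OrderID": 1, "CustomerID": "C1", "ItemID": "I1"},
    {"OrderID": 2, "CustomerID": "C2", "ItemID": "I9"},
]
addresses = {"C1": {"CustomerID": "C1", "ShippingAddress": "1 Main St"}}
catalog = {"I1": {"ItemID": "I1", "Description": "Widget"}}


def project_order(order):
    """Combine one Orders message with its Addresses and Catalog matches."""
    return {
        "Orders": order,
        # [Orders].[/CustomerID]=[Addresses].[/CustomerID]; None stands for NULL.
        "Addresses": addresses.get(str(order["CustomerID"])),
        # [Orders].[/ItemID]=[Catalog].[/ItemID]; None stands for NULL.
        "Catalog": catalog.get(str(order["ItemID"])),
    }


for row in map(project_order, orders):
    print(row["Orders"]["OrderID"], row["Addresses"] is None, row["Catalog"] is None)
# 1 False False
# 2 True True
```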

Setting the Message Type

The MessageType element of the definition sets the type of the outgoing messages. The message type of the aggregation does not need to be the same as the message type of the topics used to create the aggregation. However, if the MessageType differs from the type of the topics used to produce the aggregation, you must explicitly specify the message type of the underlying topics.

For example, to produce JSON messages regardless of the types of the topics in the aggregation, you would use the following element:

<MessageType>json</MessageType>

Defining Projections

All fields from the messages that match the Join specification are available for projection. You specify which fields AMPS projects and how it projects them.

To tell AMPS how to project a message, you specify each field to include in the projection. The specification provides a name for the projected field and one or more source fields to use for the projected field. The data can be projected as-is, or aggregated using one of the AMPS aggregation functions, as described in the section on Aggregate Functions in the Constructing Fields topic.

You refer to source fields by their XPath-like expressions. To name a projected field, you give an XPath-like expression for the new field (for example, after AS), and AMPS uses that expression as the name of the new field.

<Projection>
    <Field>[Orders].[/CustomerID]</Field>
    <Field>[Addresses].[/ShippingAddress] AS /DestinationAddress</Field>
    <Field>SUM([Orders].[/TotalPrice]) AS /AccountTotal</Field>
</Projection>

The sample above projects the CustomerID from the Orders topic and, from the Addresses topic, the shipping address for that customer, renaming the ShippingAddress field to DestinationAddress in the projected message. The sample also calculates the sum of all of the orders for that customer as AccountTotal.
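The naming rule can be sketched as follows. Both functions are hypothetical helpers, not AMPS APIs. output_name assumes that a plain field reference without an AS clause keeps its source path as its name, and set_path shows how an XPath-like name such as /detail/qty maps to a nested field in a JSON-style message.

```python
import re


def output_name(field_spec: str) -> str:
    """Return the XPath-like name a projected field is written to (hypothetical)."""
    # An explicit "AS /Name" clause names the field; AS is matched case-insensitively.
    parts = re.split(r"\s+AS\s+", field_spec, flags=re.IGNORECASE)
    if len(parts) == 2:
        return parts[1].strip()
    # Assumption: a plain reference such as [Orders].[/CustomerID] keeps /CustomerID.
    return field_spec.strip().split(".")[-1].strip("[]")


def set_path(message: dict, path: str, value) -> None:
    """Write value at an XPath-like path such as /detail/qty, creating nested objects."""
    keys = [k for k in path.split("/") if k]
    for key in keys[:-1]:
        message = message.setdefault(key, {})
    message[keys[-1]] = value


projected = {}
for spec, value in [("[Orders].[/CustomerID]", "C1"),
                    ("SUM([Orders].[/TotalPrice]) AS /AccountTotal", 42.5)]:
    set_path(projected, output_name(spec), value)
print(projected)  # {'CustomerID': 'C1', 'AccountTotal': 42.5}
```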

For more information on constructing fields in a view, see the Constructing Fields topic.

Data Types and Projections

When projecting views, AMPS converts the original values into the AMPS internal type system and serializes those values into a new message. This approach allows AMPS to efficiently aggregate messages of different types and produce predictable results. The data type of the serialization is determined by the message type of the projected message: the message types provided by 60East in this release project the AMPS internal type.

This means that, for message types that rely on type markers to identify the type (such as bson), the type of the field in the projected message may reflect the AMPS internal type rather than the original type. This conversion is typically a widening conversion for numeric types (for example, input typed as a 32-bit integer will typically be widened to a 64-bit integer). For other types, the most common conversion is from a specific data type (such as regular expression) to a string type.

Each projected field is evaluated as a single value in the AMPS type system. This means that complex or nested data types are typically projected as their string equivalent. For example, a nested set of XML elements could be projected as an empty string (the text value of the containing element), or an array could be projected as the first value in the array.

If necessary, and if the destination data type supports nested data structures, you can project the individual fields of a complex type. For example, given a set of messages like the following (in a Topic with keys of /orderId and /line):

{"orderId":42, "line":1, "detail":{"product":"AAPL", "qty":40}}

{"orderId":42, "line":2, "detail":{"product":"AAPL", "qty":60}}

You could produce summaries for each detailed product by using a projection like:

<Projection>
    <Field>/orderId</Field>
    <Field>/detail/product</Field>
    <Field>SUM(/detail/qty) as /detail/qty</Field>
</Projection>
<Grouping>
    <Field>/orderId</Field>
    <Field>/detail/product</Field>
</Grouping>

For the messages above, this would produce the following summary record:

{"detail":{"product":"AAPL","qty":100.0},"orderId":42}
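For illustration, the same grouping and SUM can be reproduced in plain Python over the two sample messages. This models the result, not how AMPS computes it; accumulating into a float mirrors the 100.0 shown in the summary record above.

```python
import json
from collections import defaultdict

messages = [
    {"orderId": 42, "line": 1, "detail": {"product": "AAPL", "qty": 40}},
    {"orderId": 42, "line": 2, "detail": {"product": "AAPL", "qty": 60}},
]

# Group by (/orderId, /detail/product) and SUM /detail/qty, as in the projection above.
totals = defaultdict(float)
for msg in messages:
    totals[(msg["orderId"], msg["detail"]["product"])] += msg["detail"]["qty"]

for (order_id, product), qty in totals.items():
    print(json.dumps({"detail": {"product": product, "qty": qty}, "orderId": order_id}))
# {"detail": {"product": "AAPL", "qty": 100.0}, "orderId": 42}
```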

For details on the AMPS data types, see the section that describes the AMPS Data Types.

Grouping

Use Grouping statements to tell AMPS how to aggregate data across messages and generate projected messages.

For example, an Orders topic that contains messages for incoming orders could be used to calculate aggregates for each customer, or aggregates for each symbol ordered. The Grouping statement tells AMPS which way to group messages for aggregation.

<Grouping>
    <Field>[Orders].[/CustomerID]</Field>
</Grouping>

The sample above groups and aggregates the projected messages by CustomerID. Because this statement groups by CustomerID, AMPS projects one message for each distinct CustomerID value. Each message published to the Orders topic produces an outgoing message containing the aggregates for that message's CustomerID.


Fields used in the Grouping element must be fields in the underlying topics.

Each field in the projection should either be an aggregate or be specified in the Grouping element. Otherwise, AMPS returns the last processed value for the field.


Unlike ANSI SQL, AMPS allows you to include fields in the projection that are not included in the Grouping or used within the aggregate functions.

In this case, AMPS uses the value present in the last message inserted or updated within the grouping as the value for these fields. The value of the field in this case depends entirely on the order in which this instance of AMPS processes inserts and updates to the underlying topic (or topics). Deleting the last message processed does not update this value. Unlike an aggregation function (which processes deletes), a non-aggregated field is not changed by a delete.

Upon recovery, AMPS enforces a consistent order of updates when rebuilding the view from the SOW topic to ensure that the value of the field is consistent across recovery and restart.
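The following hypothetical Python model illustrates how an aggregated field and a non-aggregated field differ when a record is deleted. CustomerView and the Status field are invented for illustration, and the sketch does not remove a group when its last record is deleted.

```python
from collections import defaultdict


class CustomerView:
    """Hypothetical model: group by /CustomerID, SUM(/TotalPrice), plus a
    non-aggregated /Status field holding the last inserted or updated value."""

    def __init__(self):
        self.records = {}                 # underlying SOW, keyed by /OrderID
        self.totals = defaultdict(float)  # SUM(/TotalPrice) per group
        self.last_status = {}             # non-aggregated field per group

    def publish(self, msg):
        old = self.records.get(msg["OrderID"])
        if old is not None:
            # An update first removes the old record's contribution.
            self.totals[old["CustomerID"]] -= old["TotalPrice"]
        self.records[msg["OrderID"]] = msg
        self.totals[msg["CustomerID"]] += msg["TotalPrice"]
        self.last_status[msg["CustomerID"]] = msg["Status"]
        return self.row(msg["CustomerID"])

    def delete(self, order_id):
        old = self.records.pop(order_id)
        # The aggregate processes the delete...
        self.totals[old["CustomerID"]] -= old["TotalPrice"]
        # ...but the non-aggregated field keeps its last value.
        return self.row(old["CustomerID"])

    def row(self, customer_id):
        return {"CustomerID": customer_id,
                "AccountTotal": self.totals[customer_id],
                "Status": self.last_status[customer_id]}


view = CustomerView()
view.publish({"OrderID": 1, "CustomerID": "C1", "TotalPrice": 10.0, "Status": "open"})
view.publish({"OrderID": 2, "CustomerID": "C1", "TotalPrice": 5.0, "Status": "shipped"})
print(view.delete(2))  # {'CustomerID': 'C1', 'AccountTotal': 10.0, 'Status': 'shipped'}
```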

Inline Update Conflation

AMPS has the ability to conflate updates to a view. Conflation is particularly useful when a view receives a high rate of updates and subscribers to the view have no need to track every update, but instead want to see the current state of the view as quickly as possible. For applications that have a high update rate and relatively complicated view processing, inline conflation can significantly reduce the total number of updates processed for the view and increase overall throughput.

Inline conflation changes how AMPS manages pending updates for a view. Without inline conflation enabled for a view, AMPS processes all messages for a view strictly in the order in which those messages were published. Even if there are multiple pending updates to the same record, AMPS processes each of those messages in turn and updates the view for each message.

When inline conflation is enabled and a message arrives with the same Grouping value as a message waiting to be processed, AMPS replaces the pending message with the new message, and only processes the new message. Inline conflation does not cause AMPS to slow down the rate at which AMPS processes updates for a view. AMPS continues to process updates for the view as fast as possible, and makes no guarantees as to the number of updates to a view produced by a given set of updates to an underlying topic.

The diagram below shows a simplified representation of inline conflation for a view where the underlying SOW uses the id field of the message as the Key. With conflation set to none (the default for a view), each message is added to the end of the messages waiting to be processed, whether or not an update for that group is already waiting. Both updates are processed. By contrast, when conflation is set to inline, if there is an existing update waiting, the new update replaces the existing update, and only the new update is processed.

Diagram showing inline conflation for a view

Given that inline conflation replaces messages while processing is pending, the following considerations apply to views that enable inline conflation:

Not every update to the underlying topic will produce an individual update to the view: when multiple updates occur to the same record in a short period of time, AMPS may only process the last update.
Updates to the view may be produced in an order different than the order in which the messages were published to the underlying topic, since AMPS replaces messages waiting to be processed.
The final state of the view will be exactly as if each update were processed, since it will be based on the latest values in the underlying topic (or topics).
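The following Python sketch models the difference between no conflation and inline conflation for a burst of updates that are all pending at the same time. The process function and the (key, value) update format are hypothetical. In AMPS, how many updates are actually pending at once depends on processing speed, and keeping the replaced update's position in the queue is an assumption of this sketch.

```python
def process(updates, conflation="none"):
    """Return the updates a view would process from a burst of pending updates.

    Each update is a (key, value) pair. With conflation="inline", an update
    replaces a pending update for the same key and takes over its place in
    the queue. With conflation="none", every update is processed in order.
    """
    if conflation == "none":
        return list(updates)
    pending = {}
    for key, value in updates:
        # Assigning to an existing dict key keeps its original position.
        pending[key] = value
    return list(pending.items())


burst = [("id1", "a"), ("id2", "x"), ("id1", "b")]
print(process(burst))            # [('id1', 'a'), ('id2', 'x'), ('id1', 'b')]
print(process(burst, "inline"))  # [('id1', 'b'), ('id2', 'x')]
```

With inline conflation, only two updates are processed, and the update for id1 is processed before the update for id2 even though it was published later. The final state (id1 is b, id2 is x) is the same in both modes.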

To enable inline conflation, add the Conflation element to the configuration for the View, as shown below:

<SOW>
    <View>
        ...
        <Conflation>inline</Conflation>
        ...
    </View>
</SOW>

Filtering Single Topic Aggregations

When a view aggregates a single topic, you can use a Filter element in the view definition to limit the messages included in the view to only those messages that match the filter. For example, to aggregate only messages from an underlying topic where the /status is complete, you could define your view as follows:

<SOW>
    ...

    <Topic>
        <Name>orders</Name>
        <MessageType>json</MessageType>
        <Key>/orderId</Key>
        <FileName>./sow/%n.sow</FileName>
    </Topic>
    <View>
        <Name>CompleteByRegion</Name>
        <UnderlyingTopic>orders</UnderlyingTopic>
        <MessageType>json</MessageType>
        <Projection>
            <Field>COUNT(/orderId) AS /completedOrders</Field>
            <Field>/region AS /region</Field>
        </Projection>
        <Grouping>
            <Field>/region</Field>
        </Grouping>
        <Filter>/status = 'complete'</Filter>
    </View>

    ...
</SOW>
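As a rough Python model with hypothetical sample data, the CompleteByRegion view above filters the orders first and then groups and counts the messages that remain.

```python
from collections import Counter

# Hypothetical contents of the orders topic.
orders = [
    {"orderId": 1, "region": "east", "status": "complete"},
    {"orderId": 2, "region": "east", "status": "pending"},
    {"orderId": 3, "region": "west", "status": "complete"},
    {"orderId": 4, "region": "east", "status": "complete"},
]

# Filter first (/status = 'complete'), then group by /region and COUNT(/orderId).
completed = Counter(o["region"] for o in orders if o["status"] == "complete")
view = [{"completedOrders": n, "region": region} for region, n in completed.items()]
print(view)  # [{'completedOrders': 2, 'region': 'east'}, {'completedOrders': 1, 'region': 'west'}]
```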

The Filter element is not supported for multiple topic aggregation.