Defining Views and Aggregations
Multiple topic aggregation creates a view using more than one topic as a data source. This allows you to enrich messages as they are processed by AMPS, enabling aggregate calculations using information published to more than one topic. You can combine messages from multiple topics and use filtered subscriptions to determine which messages are of interest. For example, you can set up a topic that contains orders from high-priority customers.
You can join topics of different message types, and you can project messages of a different type than the underlying topic.
To create an aggregate using multiple topics, each topic needs to maintain a SOW. Since views maintain an underlying SOW, you can create views from views.
To define an aggregate, you decide:
The topic, or topics, that contain the source for the aggregation
If the aggregation uses more than one topic, how those topics relate to each other
What messages to publish, or project, from the aggregation
How to group messages for aggregation
The message type of the aggregation
Message types provided with AMPS fully support views, with the following exceptions:
binary message types cannot be the underlying topic for a view or the type of a view.
protobuf message types can be the underlying topic for a view, but cannot be the type of a view.
composite-global message types can be the underlying topic for a view, but cannot be the type of a view.
struct message types can be the underlying topic for a view, but cannot be the type of a view.
If you are using a custom message type, check with the message type developer as to whether that message type supports aggregation.
Single Topic Aggregation: UnderlyingTopic
For aggregations based on a single topic, use the UnderlyingTopic element to tell AMPS which topic to use. All messages from the UnderlyingTopic will appear in the aggregation.
Multiple Topic Aggregation: Join
Join definitions tell AMPS how to relate underlying topics to each other. You use a separate Join element for each relationship in the view. Most often, the Join definition describes a relationship between topics:
The topics specified must be previously defined in the AMPS configuration file. The square brackets [] are optional. If they are omitted, AMPS uses the first / in the expression as the start of the field definition. You can use any number of Join expressions to define a multiple topic aggregation.
A Join definition is an equality comparison between the values of two fields. The Join definition is not evaluated as an AMPS expression, so functions, operators (other than =), and so forth are not evaluated in these definitions.
Within a Join definition, values are always compared as strings. This means that values such as 12345, 12345.00, and 1.2345E+04 are treated as different values by the Join expression, because they are different strings even though they represent the same numeric value.
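The effect of string comparison can be illustrated with a small sketch. This is plain Python rather than AMPS code, and the helper name join_keys_match is hypothetical; it only models the rule that join values are compared as strings.

```python
def join_keys_match(left_value, right_value):
    """Model of a Join comparison: both values are compared as strings."""
    return str(left_value) == str(right_value)

values = ["12345", "12345.00", "1.2345E+04"]

# All three strings parse to the same number...
numeric = {float(v) for v in values}

# ...but no two of them are equal as strings, so none of them would join.
string_matches = [
    (a, b)
    for i, a in enumerate(values)
    for b in values[i + 1:]
    if join_keys_match(a, b)
]
```

In practice, this suggests publishing join fields in one consistent textual format across the topics being joined.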
If your aggregation will join messages of different types, or produce messages of a different type than the underlying topics, you add message type specifiers to the Join definition:
In this case, the square brackets [] around the messagetype are mandatory. AMPS creates a projection in the aggregation that combines the messages from each topic where the expression is true. In other words, for the expression:
AMPS projects every message where the same CustomerID appears in both the Addresses topic and the Orders topic. If a CustomerID value appears in only the Addresses topic, AMPS does not create a projection for the message. If a CustomerID value appears in only the Orders topic, AMPS projects the message with NULL values for the Addresses topic. In database terms, this is equivalent to a LEFT OUTER JOIN.
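The left outer join behavior described above can be sketched in plain Python. This is an illustrative model, not AMPS code: the function name project_join, the sample fields Total and City, and the use of None to stand in for NULL are all assumptions made for the example.

```python
def project_join(orders, addresses):
    """Left outer join of orders to addresses on CustomerID (compared as strings)."""
    by_customer = {}
    for address in addresses:
        by_customer.setdefault(str(address["CustomerID"]), []).append(address)

    projected = []
    for order in orders:
        matches = by_customer.get(str(order["CustomerID"]))
        if matches:
            for address in matches:
                projected.append({"order": order, "address": address})
        else:
            # No address for this customer: the order is still projected,
            # with None standing in for the NULL address values.
            projected.append({"order": order, "address": None})
    return projected

orders = [{"CustomerID": "1", "Total": 10}, {"CustomerID": "2", "Total": 5}]
addresses = [
    {"CustomerID": "1", "City": "Boston"},
    {"CustomerID": "3", "City": "Austin"},  # no order: never projected
]
rows = project_join(orders, addresses)
```

Here customer 1 is projected with its address, customer 2 is projected with a NULL address, and customer 3, which appears only in the addresses, produces no projection.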
In a Join expression, AMPS does not consider NULL values, including empty strings, to be equivalent. This behavior matches ANSI SQL behavior, but differs from previous releases of AMPS.
You can disable this behavior and cause NULL values to match by including the JoinNullEquivalency option in the View definition and setting that option to enabled.
60East does not recommend setting this option unless it is necessary to preserve backward compatibility with previous versions of AMPS.
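As a rough model of the two NULL behaviors, the sketch below treats both a missing value and an empty string as NULL. It is plain Python, not AMPS code; the null_equivalency parameter is a hypothetical stand-in for the effect of the JoinNullEquivalency option.

```python
def is_null(value):
    # In this model, a missing value (None) and an empty string both count as NULL.
    return value is None or value == ""

def join_keys_match(left, right, null_equivalency=False):
    """Compare two join values, with optional NULL-to-NULL matching."""
    if is_null(left) or is_null(right):
        # Default: NULL never matches anything, including another NULL.
        return null_equivalency and is_null(left) and is_null(right)
    return str(left) == str(right)
```

With the default setting, join_keys_match(None, "") is false; with null_equivalency=True, the same call is true, while a NULL still does not match a non-NULL value.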
You can use any number of Join definitions in an underlying topic:
In this case, AMPS creates a projection that combines messages from the Orders, Addresses, and Catalog topics for any published message where matching messages are present in all three topics. Where there are no matching messages in the Catalog and Addresses topics, AMPS projects those values as NULL.
Setting the Message Type
The MessageType element of the definition sets the type of the outgoing messages. The message type of the aggregation does not need to be the same as the message type of the topics used to create the aggregation. However, if the MessageType differs from the type of the topics used to produce the aggregation, you must explicitly specify the message type of the underlying topics.
For example, to produce JSON messages regardless of the types of the topics in the aggregation, you would use the following element:
Defining Projections
AMPS makes available all fields from matching messages in the join specification. You specify the fields that you want AMPS to project and how to project them.
To tell AMPS how to project a message, you specify each field to include in the projection. The specification provides a name for the projected field and one or more source fields to use for the projected field. The data can be projected as-is, or aggregated using one of the AMPS aggregation functions, as described in the section on Aggregate Functions in the Constructing Fields topic.
You refer to source fields using the XPath-like expression for the field. You name projected fields by creating an XPath-like expression for the new field. AMPS uses this expression to name the new field.
The sample above uses the CustomerID from the Orders topic and the shipping address for that customer from the Addresses topic. The sample calculates the sum of all of the orders for that customer as the AccountTotal. The sample also renames the ShippingAddress field as DestinationAddress in the projected message.
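The shape of such a projection can be sketched in plain Python. This is not AMPS configuration: the function name project_customer and the Amount field on orders are assumptions for illustration, since the name of the order value field is not given here.

```python
def project_customer(order_rows, address):
    """Build one projected message for a customer from joined source messages."""
    return {
        # Projected as-is from the orders.
        "CustomerID": order_rows[0]["CustomerID"],
        # Aggregated: the sum over every order for this customer.
        "AccountTotal": sum(order["Amount"] for order in order_rows),
        # Renamed: ShippingAddress in the source, DestinationAddress in the output.
        "DestinationAddress": address["ShippingAddress"],
    }

orders = [
    {"CustomerID": "42", "Amount": 10.0},
    {"CustomerID": "42", "Amount": 2.5},
]
address = {"CustomerID": "42", "ShippingAddress": "1 Main St"}
message = project_customer(orders, address)
```

The projected message carries only the three fields named in the projection, regardless of what other fields the source messages contain.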
For more information on constructing fields in a view, see the Constructing Fields topic.
Data Types and Projections
When projecting views, AMPS converts the original values into the AMPS internal type system and serializes those values into a new message. This approach allows AMPS to efficiently aggregate messages of different types and produce predictable results. The data type of the serialization is determined by the message type of the projected message: the message types provided by 60East in this release project the AMPS internal type.
This means that, for message types that rely on type markers to identify the type (such as bson), the type of the field in the projected message may reflect the AMPS internal type rather than the original type. This conversion is typically a widening conversion for numeric types (for example, input typed as a 32-bit integer will typically be widened to a 64-bit integer). For other types, the most common conversion is from a specific data type (such as regular expression) to a string type.
A projection is evaluated as projecting a single value in the AMPS type system. This means that complex or nested data types are typically projected as the string equivalent. For example, a nested set of XML elements could be projected as an empty string (the text value of the containing element), or an array could be projected as the first value in the array.
If necessary, and if the destination data type supports nested data structures, you can project the individual fields of a complex type. For example, given a set of messages like the following (in a Topic with keys of /orderId and /line):
You could produce summaries for each detailed product by using a projection like:
For the messages above, this would produce the following summary record:
For details on the AMPS data types, see the section that describes the AMPS Data Types.
Grouping
Use Grouping statements to tell AMPS how to aggregate data across messages and generate projected messages.
For example, an Orders topic that contains messages for incoming orders could be used to calculate aggregates for each customer, or aggregates for each symbol ordered. The Grouping statement tells AMPS which way to group messages for aggregation.
The sample above groups and aggregates the projected messages by CustomerId. Since this statement tells AMPS to group by CustomerId, AMPS projects a message for each distinct CustomerId value. A message to the Orders topic will create an outgoing message with data aggregated over the CustomerId.
Fields used in the Grouping element must be fields in the underlying topics.
Each field in the projection should either be an aggregate or be specified in the Grouping element. Otherwise, AMPS returns the last processed value for the field.
Unlike ANSI SQL, AMPS allows you to include fields in the projection that are not included in the Grouping or used within the aggregate functions.
In this case, AMPS uses the value present in the last message inserted or updated within the grouping as the value for these fields. The value of the field in this case depends entirely on the order in which this instance of AMPS processes inserts and updates to the underlying topic (or topics). Deleting the last message processed does not update this value. Unlike an aggregation function (which processes deletes), a non-aggregated field is not changed by a delete.
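The difference between an aggregated field and a non-aggregated field can be modeled with a short sketch. This is plain Python rather than AMPS code; the class GroupedView and the fields Symbol and Amount are hypothetical, and the model is simplified (for instance, it does not remove a group when its last message is deleted).

```python
class GroupedView:
    """Toy model of a view grouped by CustomerId."""

    def __init__(self):
        self.rows = {}    # source key -> source message
        self.groups = {}  # CustomerId -> projected message

    def _recompute_total(self, customer):
        total = sum(
            row["Amount"] for row in self.rows.values()
            if row["CustomerId"] == customer
        )
        self.groups[customer]["Total"] = total

    def upsert(self, key, message):
        self.rows[key] = message
        customer = message["CustomerId"]
        group = self.groups.setdefault(customer, {"CustomerId": customer})
        # Non-aggregated field: takes the value from the last processed message.
        group["Symbol"] = message["Symbol"]
        self._recompute_total(customer)

    def delete(self, key):
        message = self.rows.pop(key)
        # The aggregate reflects the delete; the non-aggregated field is left alone.
        self._recompute_total(message["CustomerId"])

view = GroupedView()
view.upsert("o1", {"CustomerId": "c1", "Symbol": "IBM", "Amount": 100})
view.upsert("o2", {"CustomerId": "c1", "Symbol": "MSFT", "Amount": 50})
view.delete("o2")
summary = view.groups["c1"]
```

After the delete, the total drops back to 100, but Symbol still holds MSFT, the value from the last message processed, even though that message is no longer in the underlying topic.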
Upon recovery, AMPS enforces a consistent order of updates when rebuilding the view from the SOW topic to ensure that the value of the field is consistent across recovery and restart.
Inline Update Conflation
AMPS has the ability to conflate updates to a view. Conflation is particularly useful when a view receives a high velocity of updates and subscribers to the view have no need to track every update, but instead want to see the current state of the view as quickly as possible. For applications that have a high update rate and relatively complicated view processing, inline conflation can significantly reduce the total number of updates processed for the view and increase overall throughput.
Inline conflation changes how AMPS manages pending updates for a view. Without inline conflation enabled for a view, AMPS processes all messages for a view strictly in the order in which those messages were published. Even if there are multiple pending updates to the same record, AMPS processes each of those messages in turn and updates the view for each message.
When inline conflation is enabled and a message arrives with the same Grouping value as a message waiting to be processed, AMPS replaces the pending message with the new message, and only processes the new message. Inline conflation does not cause AMPS to slow down the rate at which AMPS processes updates for a view. AMPS continues to process updates for the view as fast as possible, and makes no guarantees as to the number of updates to a view produced by a given set of updates to an underlying topic.
The diagram below shows a simplified representation of inline conflation for a view where the underlying SOW uses the id field of the message as the Key. With conflation set to none (the default for a view), each message is added to the end of the messages waiting to be processed, whether or not an update for that group is already waiting. Both updates are processed. By contrast, when conflation is set to inline, if there is an existing update waiting, the new update replaces the existing update, and only the new update is processed.
Given that inline conflation replaces messages while processing is pending, the following considerations apply to views that enable inline conflation:
Not every update to the underlying topic will produce an individual update to the view: when multiple updates occur to the same record in a short period of time, AMPS may only process the last update.
Updates to the view may be produced in an order different than the order in which the messages were published to the underlying topic, since AMPS replaces messages waiting to be processed.
The final state of the view will be exactly as if each update were processed, since it will be based on the latest values in the underlying topic (or topics).
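These considerations can be illustrated with a simplified model. The sketch below is plain Python, not AMPS code, and it assumes that every update in the batch is already pending before processing starts, which real workloads do not guarantee; the function name process and the (group, value) update shape are hypothetical.

```python
def process(updates, conflation="none"):
    """Return (final view state, updates emitted) for a batch of pending updates."""
    if conflation == "inline":
        pending = {}
        for group, value in updates:
            # A newer update replaces the pending one for the same group, in place.
            pending[group] = value
        queue = list(pending.items())
    else:
        # Every update waits its turn and is processed individually.
        queue = list(updates)

    view, emitted = {}, []
    for group, value in queue:
        view[group] = value
        emitted.append((group, value))
    return view, emitted

updates = [("A", 1), ("B", 1), ("A", 2), ("A", 3)]
view_none, emitted_none = process(updates, conflation="none")
view_inline, emitted_inline = process(updates, conflation="inline")
```

In this model, the unconflated run emits four updates and the conflated run emits two. The latest update for A is emitted before the update for B even though it was published later, and both runs end with the same final view state.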
To enable inline conflation, add the Conflation element to the configuration for the View, as shown below:
Filtering Single Topic Aggregations
When a view aggregates a single topic, you can use a Filter element in the view definition to limit the messages included in the view to only those messages that match the filter. For example, to aggregate only messages from an underlying topic where the /status is complete, you could define your view as follows:
The Filter element is not supported for multiple topic aggregation.
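Conceptually, a filtered single topic aggregation behaves as if non-matching messages were never in the underlying topic. The sketch below models this in plain Python; it is not AMPS code, and the customer and amount fields are hypothetical, with only the status check corresponding to the filter described above.

```python
def aggregate_completed(messages):
    """Sum amounts per customer, counting only messages whose status is complete."""
    totals = {}
    for message in messages:
        if message.get("status") != "complete":
            continue  # excluded by the filter, so it never reaches the aggregate
        customer = message["customer"]
        totals[customer] = totals.get(customer, 0) + message["amount"]
    return totals

messages = [
    {"customer": "c1", "status": "complete", "amount": 10},
    {"customer": "c1", "status": "pending", "amount": 99},
    {"customer": "c2", "status": "complete", "amount": 5},
]
totals = aggregate_completed(messages)
```

The pending order for c1 contributes nothing to the totals, because the filter is applied before aggregation rather than to the aggregated result.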