Regular Expressions
Regular expression matching provides precision, power, and flexibility for matching patterns. AMPS supports regular expression matching on topics and within content filters. Regular expressions are implemented in AMPS using the Perl-Compatible Regular Expressions (PCRE) library. For a complete definition of the supported regular expression syntax, please refer to:
http://perldoc.perl.org/perlre.html
To use regular expressions for topic matching, provide a regular expression pattern where you would normally provide a topic name.
To use regular expressions in content filtering, compare strings to regular expressions using the LIKE operator. The syntax of the LIKE operator is:
string LIKE pattern
In this context, a string is any expression that provides a string and pattern is a literal regular expression pattern.
This chapter presents a brief, non-exhaustive overview of regular expressions in AMPS. For more information on regular expression matching, see the Perl regular expression documentation linked above.
Examples
The following example describes a content filter that matches any message meeting all of these criteria:
Regular expression match of symbols of 2 or 3 characters starting with “IB”
Regular expression match of prices starting with “90”
Numeric comparison of prices less than 91
The corresponding content filter would be:
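As a rough illustration of the three criteria above, the following Python sketch evaluates them locally with the standard re module, which behaves like PCRE for patterns this simple. The field paths /symbol and /price, the filter text, and the matches helper are hypothetical names chosen for the sketch, not taken from the AMPS documentation.

```python
import re

# Hypothetical filter text: /symbol and /price are illustrative field paths.
AMPS_FILTER = "(/symbol LIKE '^IB.?$') AND (/price LIKE '^90') AND (/price < 91)"

def matches(message):
    """Emulate the three criteria locally with Python's re module."""
    symbol = message["symbol"]
    price = message["price"]  # kept as text, as it might appear in a message
    return (
        re.search(r"^IB.?$", symbol) is not None   # 2 or 3 characters starting with "IB"
        and re.search(r"^90", price) is not None   # price text starts with "90"
        and float(price) < 91                       # numeric comparison
    )

print(matches({"symbol": "IBM", "price": "90.25"}))   # True
print(matches({"symbol": "IBMX", "price": "90.25"}))  # False: symbol has 4 characters
print(matches({"symbol": "IB", "price": "900"}))      # False: 900 is not less than 91
```

The last case shows why the numeric comparison is needed alongside the regular expression: "900" starts with "90" but is not less than 91.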
The tables below contain a brief summary of special characters and constructs available within regular expressions.
Here are more examples of using regular expressions within AMPS:
Use (?i) to enable case-insensitive regular expression matching. For example, the following filter is true whether /client/country contains “US” or “us”.
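A minimal sketch of the case-insensitive behavior, using Python's re module as a stand-in for PCRE. The pattern (?i)^us$ is an assumption about how such a filter might be written (it presumes the field holds only the country code), and like is a hypothetical helper that mimics an unanchored LIKE comparison.

```python
import re

def like(value, pattern):
    """Hypothetical stand-in for LIKE: true if the pattern matches anywhere."""
    return re.search(pattern, value) is not None

# Assumed pattern for a case-insensitive match of the country code.
pattern = r"(?i)^us$"

print(like("US", pattern))   # True
print(like("us", pattern))   # True
print(like("USA", pattern))  # False: the $ anchor rejects the extra character
```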
To match messages where tag 55 has a TRADE suffix, use the following filter:
To match messages where tag 109 has a US prefix and a TRADE suffix, with case-insensitive matching, use the following filter:
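The suffix and prefix matches described above can be sketched with the patterns below. These patterns are one plausible way to express the two conditions, not the filters from the AMPS documentation, and like is a hypothetical helper evaluated with Python's re module rather than by AMPS.

```python
import re

def like(value, pattern):
    """Hypothetical stand-in for LIKE: true if the pattern matches anywhere."""
    return re.search(pattern, value) is not None

TRADE_SUFFIX = r"TRADE$"                      # value ends with TRADE
US_PREFIX_TRADE_SUFFIX = r"(?i)^US.*TRADE$"   # starts with US, ends with TRADE, any case

print(like("BLOCK_TRADE", TRADE_SUFFIX))               # True
print(like("TRADE_BLOCK", TRADE_SUFFIX))               # False
print(like("us_block_trade", US_PREFIX_TRADE_SUFFIX))  # True
print(like("EU_BLOCK_TRADE", US_PREFIX_TRADE_SUFFIX))  # False
```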
AMPS recognizes the following regular expression metacharacters:
Character | Meaning |
^ | Beginning of string |
$ | End of string |
. | Any character except a newline |
* | Match previous 0 or more times |
+ | Match previous 1 or more times |
? | Match previous 0 or 1 times |
() | Grouping of expression |
[] | Set of characters |
{} | Repetition modifier |
\ | Escape for special characters |
AMPS recognizes the following repetition constructs:
Construct | Meaning |
a* | Zero or more a's |
a+ | One or more a's |
a? | Zero or one a's |
a{m} | Exactly m a's |
a{m,} | At least m a's |
a{m,n} | At least m, but no more than n a's |
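To make the repetition constructs concrete, the sketch below checks a few inputs against them using Python's re module, which follows the same rules as PCRE for these quantifiers. The helper full is a hypothetical name; it requires the whole text to match so that the counts are easy to see.

```python
import re

def full(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"a*", ""))           # True: zero a's is allowed
print(full(r"a?", "aa"))         # False: at most one a
print(full(r"a{3}", "aaa"))      # True: exactly three a's
print(full(r"a{2,}", "a"))       # False: fewer than two a's
print(full(r"a{2,4}", "aaaaa"))  # False: more than four a's
```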
The table below lists some of the modifiers AMPS recognizes:
Modifier | Meaning |
i | Case insensitive search |
m | Multi-line search |
s | Any character (including newlines) can be matched by a . character |
x | Unescaped white space is ignored in the pattern |
A | Constrain the pattern to only match the beginning of a string |
U | Make the quantifiers non-greedy by default (the quantifiers are greedy and try to match as much as possible by default) |
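The effect of several modifiers can be seen in the short sketch below. It assumes the modifiers are applied inline in the (?i) style shown earlier in this chapter and uses Python's re module for evaluation. Only i, m, s and x are shown, because the A and U modifiers have no direct counterpart in Python's re module.

```python
import re

def found(pattern, text):
    """Hypothetical helper: true if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(found(r"(?i)trade", "Block TRADE"))  # True: case is ignored
print(found(r"(?m)^b$", "a\nb"))           # True: ^ and $ also match at line breaks
print(found(r"^b$", "a\nb"))               # False: without m, ^ matches only at the start
print(found(r"(?s)a.b", "a\nb"))           # True: . matches the newline
print(found(r"a.b", "a\nb"))               # False: without s, . skips newlines
print(found(r"(?x) a b c", "abc"))         # True: unescaped white space is ignored
```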
Raw Strings
AMPS additionally provides support for raw strings, which are strings prefixed by an 'r' or 'R' character. Raw strings use different rules for how a backslash escape sequence is interpreted by the parser. When a string literal is provided as a raw string, the characters in the raw string are matched exactly, even when those characters are special characters for a regular expression.
In the example below, the raw string, indicated by the r prefix on the string literal in the second operand of the LIKE predicate, causes AMPS to search for the literal characters ++ without requiring those characters to be escaped. The example queries for a string that contains the name of the programming language C++. In a regular string, the '+' character must be escaped, since a regular expression otherwise treats it as the “match previous 1 or more times” metacharacter. In a raw string, r'C++' matches the string directly, with no escaping of the '+' character.
An expression using the raw string capability would look like the following:
This can be simpler and easier to read than the escaped equivalent, shown below:
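The two forms can be compared in the sketch below. The filter strings and the /language field path are hypothetical, written only to show the shape of each form. The matching itself is done with Python's re module, where re.escape plays the role the source describes for an AMPS raw string; note that Python's own r'' prefix means something different (it only controls backslash handling in the Python literal).

```python
import re

# Hypothetical filter strings: /language is an illustrative field path.
RAW_FILTER = "/language LIKE r'C++'"
ESCAPED_FILTER = r"/language LIKE 'C\+\+'"

escaped_pattern = r"C\+\+"         # hand-escaped regular expression
auto_escaped = re.escape("C++")    # Python's way to treat the text literally

print(auto_escaped == escaped_pattern)                        # True
print(bool(re.search(escaped_pattern, "Modern C++ Design")))  # True
print(bool(re.search(escaped_pattern, "CCC")))                # False: no literal ++
```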
Subscribing to a Set of Topics Using Regular Expressions
As mentioned previously, AMPS supports regular expression filtering for topics, in addition to content filters. Regular expressions use the same grammar described in content filtering. Regular expression matching for topics is enabled in an AMPS instance by default.
Subscriptions or queries that use a regular expression for the topic name return records from every AMPS topic whose name matches the regular expression. For example, if your AMPS configuration has three SOW topics, Topic_A, Topic_B and Topic_C, and you want to find every record in those topics where the Name field is equal to “Bob”, you could use a sow command with a topic of ^Topic_.* and a filter of /FIXML/@Name='Bob'. The command returns the messages that match the filter from all of the topics that match the topic regular expression.
Notice that, as with the LIKE operator, a regular expression will match at any position in the topic name. To anchor the match to the beginning of the topic name, use the ^ anchor at the beginning of the regular expression. To anchor the match to the end of the topic name, use the $ anchor at the end of the regular expression.
For example, to match a topic with "order" anywhere in the topic name, you could use the regular expression order.* (the trailing .* matches zero or more characters, and it lets AMPS know to interpret the topic as a regular expression). To match only topics that start with order, you would use the regular expression ^order. To match topics that end with order, you would use the regular expression order$.
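The anchoring rules above can be checked against a handful of topic names. The topic names and the matching_topics helper are hypothetical, and the matching is performed by Python's re module rather than by an AMPS instance.

```python
import re

# Hypothetical topic names, used only to illustrate the anchoring rules.
topics = ["orders", "order_history", "new_order", "reorder_queue", "prices"]

def matching_topics(pattern):
    """Return the topics whose name matches the pattern at any position."""
    return [t for t in topics if re.search(pattern, t)]

print(matching_topics(r"order.*"))  # ['orders', 'order_history', 'new_order', 'reorder_queue']
print(matching_topics(r"^order"))   # ['orders', 'order_history']
print(matching_topics(r"order$"))   # ['new_order']
```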
Results returned by a topic regular expression query follow “configuration order”, meaning that the topics are searched in the order in which they appear in your AMPS configuration file. Using the query example above with Topic_A, Topic_B and Topic_C, if the configuration file lists these topics in that exact order, results are returned first from Topic_A, then from Topic_B, and finally from Topic_C. As with other queries, AMPS does not make any guarantees about the ordering of results within any given topic.
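A small simulation can illustrate configuration order. The sketch below is not AMPS client code: configured_topics is a hypothetical in-memory stand-in for SOW topics listed in configuration-file order, and sow_query is an illustrative helper that applies a topic regular expression and a Name filter.

```python
import re

# Hypothetical SOW contents, keyed by topic name in configuration-file order.
configured_topics = {
    "Topic_A": [{"Name": "Bob", "id": 1}, {"Name": "Alice", "id": 2}],
    "Topic_B": [{"Name": "Bob", "id": 3}],
    "Other":   [{"Name": "Bob", "id": 4}],
    "Topic_C": [{"Name": "Bob", "id": 5}],
}

def sow_query(topic_pattern, name):
    """Collect (topic, id) pairs for matching records, topic by topic."""
    results = []
    # Python dicts preserve insertion order, which models configuration order here.
    for topic, records in configured_topics.items():
        if re.search(topic_pattern, topic):
            results.extend((topic, r["id"]) for r in records if r["Name"] == name)
    return results

print(sow_query(r"^Topic_.*", "Bob"))
# [('Topic_A', 1), ('Topic_B', 3), ('Topic_C', 5)]
```

In this simulation the records within a topic come back in list order, but that is an artifact of the sketch; the text above states that AMPS makes no such guarantee within a topic.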