Operations Best Practices
This section covers a selection of best practices for deploying AMPS.
Monitoring
AMPS exposes statistics for monitoring via a RESTful interface, described in the Monitoring AMPS section and the AMPS Monitoring Guide. The interface is available at the address specified in the Admin section of the configuration, and allows developers, administrators, and standard monitoring tools to easily inspect many aspects of AMPS performance and resource consumption.
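For example, a minimal sketch of the Admin section (the address, port, and statistics file name are placeholders to adapt for your deployment):

```xml
<AMPSConfig>
  <!-- ... -->
  <!-- Expose the monitoring interface on port 8085 and persist
       statistics to stats.db (placeholder values). -->
  <Admin>
    <InetAddr>localhost:8085</InetAddr>
    <FileName>stats.db</FileName>
  </Admin>
  <!-- ... -->
</AMPSConfig>
```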
At times, AMPS will emit log messages indicating that a thread has encountered a deadlock or a long-running, resource-intensive operation. These messages repeat and contain the word "stuck". AMPS attempts to resolve these conditions itself; however, after 60 seconds of a single thread being stuck, AMPS automatically emits a minidump to the configured minidump directory. 60East support can use this minidump to help locate the stuck thread and identify the operation that caused the problem.
Monitor the contents of dmesg on the instance for errors that affect the AMPS process. For example, if the operating system runs low on memory and begins shutting down processes, this information is recorded in dmesg. Likewise, system events such as hardware failures that can affect AMPS are most likely to be recorded in the dmesg output.
Another area to examine when monitoring AMPS is the last_active monitor for the processors, available at /amps/instance/processors/all/last_active in the monitoring interface. If the last_active value continually increases for more than one minute and there is a noticeable decline in the quality of service, it may be best to fail over and restart the AMPS instance.
Logging
60East recommends that an instance of AMPS used for production log at info level (at a minimum). This provides a basic record of the operations requested in AMPS, and is the minimum level of logging needed to troubleshoot most issues. Further, a production instance should have the capacity available to log at a more verbose level when necessary for troubleshooting and diagnostic purposes.
An instance used for development or UAT purposes should typically log at trace level so that the interaction between an application and AMPS is captured.
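As an illustration, a minimal sketch of a file logging target at info level (the file name pattern and rotation threshold are placeholder values):

```xml
<Logging>
  <Target>
    <Protocol>file</Protocol>
    <!-- Placeholder file name pattern; %n is replaced on rotation. -->
    <FileName>./logs/amps-%n.log</FileName>
    <Level>info</Level>
    <RotationThreshold>2GB</RotationThreshold>
  </Target>
</Logging>
```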
60East also recommends capturing stdout and stderr for the AMPS process. This can provide information about operating system or runtime errors in the event that a problem occurs outside of the control of AMPS (and, therefore, cannot be recorded in the AMPS event log).
Stopping AMPS
To stop AMPS, ensure that AMPS runs the amps-action-do-shutdown action. By default, this action runs when AMPS receives SIGHUP, SIGINT, or SIGTERM. However, you can also configure an action to shut down AMPS in response to other conditions. For example, if your company policy is to reboot servers every Saturday night, and AMPS is not running as a system service (or daemon), you could schedule an AMPS shutdown every Saturday before the system reboot.
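A sketch of that scheduled shutdown (assuming the amps-action-on-schedule module; the schedule and action name are placeholders):

```xml
<Actions>
  <Action>
    <!-- Shut down every Saturday evening, ahead of the
         scheduled system reboot (placeholder time). -->
    <On>
      <Module>amps-action-on-schedule</Module>
      <Options>
        <Schedule>every Saturday at 22:00</Schedule>
        <Name>weekly-shutdown</Name>
      </Options>
    </On>
    <Do>
      <Module>amps-action-do-shutdown</Module>
    </Do>
  </Action>
</Actions>
```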
When AMPS is installed to run as a system service (or daemon), AMPS installs shutdown scripts that will cleanly stop AMPS during a system shutdown or reboot.
SOW Parameters
Choosing the ideal SlabSize for your SOW topic is a balance between the frequency of SOW expansion and storage space efficiency. A large SlabSize preallocates more space for records when AMPS begins writing to the SOW.
If detailed tuning is not necessary and your messages are smaller than the default SlabSize, 60East recommends leaving the SlabSize at the default. If your messages are larger than the default SlabSize, a good starting point is to set the SlabSize to several times the maximum message size you expect to store in the SOW.
There are three considerations when setting the optimal SlabSize:
Frequency of allocations
Overall size of the SOW
Efficient use of space
A small SlabSize results in frequent extensions of your SOW topic. These frequent extensions can reduce throughput in a heavily loaded system and, in extreme cases, can exhaust the kernel limit on the number of memory regions that a process can map. Increasing the SlabSize reduces the number of allocations.
When the SlabSize is large, the risk of a SOW resize affecting performance is reduced. Since each slab is larger, however, more space is consumed when you are storing only a small number of messages: this cost amortizes as the number of messages in the SOW exceeds the number of cores in the system multiplied by the number of messages that fit into a slab.
To use space most efficiently, set a SlabSize that minimizes the amount of unused space in a slab. For example, if your messages average 512KB but can reach a maximum of 1.2MB, one approach would be to set a SlabSize of 2.5MB, enough to hold approximately five average-sized messages or two of the larger messages. Examining the actual distribution of message sizes in the SOW (which can be done with the amps_sow_dump utility) can help you determine how best to size slabs for maximum space efficiency.
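Continuing that example, a sketch of a SOW topic configured with this slab size (the topic name, key, and file name are placeholders):

```xml
<SOW>
  <Topic>
    <Name>orders</Name>
    <MessageType>json</MessageType>
    <Key>/orderId</Key>
    <FileName>./sow/%n.sow</FileName>
    <!-- 2560KB = 2.5MB: roughly five 512KB messages
         or two 1.2MB messages per slab. -->
    <SlabSize>2560KB</SlabSize>
  </Topic>
</SOW>
```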
When optimizing the SlabSize, determine how important each aspect of SOW tuning is for your application, and adjust the configuration to balance allocation frequency, overall SOW size, and efficient use of space to meet the needs of your application.
Because AMPS is highly parallelized, it operates more efficiently when it is able to run tasks in parallel. When considering options for SlabSize, be sure that the value you choose results in a number of slabs at least equal to the number of cores in the system. A SlabSize setting that results in only a few slabs can reduce query performance. For example, on a system with a single publisher and a SlabSize large enough to hold all of the records produced by that publisher, queries cannot be parallelized, since all of the records reside in a single slab.
Slow Clients
As described in Slow Client Management, AMPS provides capacity limits to reduce the memory resources consumed by slow clients. This section discusses tuning slow client handling to achieve your availability goals.
Slow Client Offlining for Large Result Sets
The default settings for AMPS work well in a wide variety of applications with minimal tuning.
If you have particularly large SOW topics and your application disconnects clients for exceeding the offlining threshold while they retrieve large SOW query result sets, 60East recommends increasing the slow client capacity settings.
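The sketch below shows where those settings live in an instance configuration; the values are illustrative placeholders rather than recommended figures:

```xml
<AMPSConfig>
  <!-- ... -->
  <!-- Illustrative placeholders: tune for your deployment. -->
  <!-- Memory AMPS may use to buffer messages for slow clients
       before offlining them to disk. -->
  <MessageMemoryLimit>10GB</MessageMemoryLimit>
  <!-- Disk space available for offlined messages. -->
  <MessageDiskLimit>100GB</MessageDiskLimit>
  <!-- Directory used for offline message storage. -->
  <MessageDiskPath>/mnt/fast-storage/amps/offline</MessageDiskPath>
  <!-- ... -->
</AMPSConfig>
```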
Use these settings as a baseline for further tuning, bearing in mind the needs and expected messaging patterns of your application.
WAN Traffic and Slow Client Settings
In some installations, a single AMPS instance will serve both applications that are local to the instance and applications that retrieve data over a higher-latency network. For example, applications in a small regional office may use a server in another region over a WAN.
In these situations, either adjust the slow client settings so that those clients can complete operations such as large SOW queries successfully, or create a separate transport with higher capacity settings to be used only by the small number of clients that require them due to network limitations. In particular, if you set a ClientMessageAgeLimit for an instance or transport, ensure that the limit is large enough that the network can consume the results of the SOW queries that clients are expected to make within the allotted time.
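A sketch of the separate-transport approach (ports, names, and the age limit are placeholders):

```xml
<Transports>
  <!-- Transport for local, low-latency clients. -->
  <Transport>
    <Name>lan-tcp</Name>
    <Type>tcp</Type>
    <InetAddr>9007</InetAddr>
    <Protocol>amps</Protocol>
  </Transport>
  <!-- Dedicated transport for WAN clients: allow buffered
       messages to age longer before the client is
       considered slow (placeholder limit). -->
  <Transport>
    <Name>wan-tcp</Name>
    <Type>tcp</Type>
    <InetAddr>9008</InetAddr>
    <Protocol>amps</Protocol>
    <ClientMessageAgeLimit>120s</ClientMessageAgeLimit>
  </Transport>
</Transports>
```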
Minidump
AMPS includes the ability to generate a minidump file, which 60East support can use to help troubleshoot a problematic instance.
The minidump captures thread state information: a snapshot of where in the source code each thread is, the call stack for each thread, and the register information for each frame of the call stack. A minidump also contains basic information about the system AMPS was running on, such as the processor type and number of sockets. Minidumps do not contain other internal AMPS state, the contents of application memory, or detailed information about the state of the host or operating system. Instead, minidumps identify the point of failure, helping 60East quickly narrow down an issue without generating large files or potentially capturing sensitive data.
Minidumps can be produced much faster than a standard core dump, and use significantly less space since the minidump contains only a small subset of the information a core dump would contain (see the ulimit section in Linux OS Settings for more configuration options). Because minidumps are relatively inexpensive, the AMPS server may produce minidumps for temporary conditions that the server subsequently recovers from. AMPS also allows creation of a minidump on demand.
Generation of a minidump file occurs in the following ways:
When AMPS detects a crash internally, a minidump file will automatically be generated. This includes cases where an AMPS thread or critical internal component has not reported progress for an extended period of time (typically 300 seconds).
When a user clicks the minidump link on the amps/instance/administrator page of the administrator console (see the AMPS Monitoring Reference for more information).
By sending the running AMPS process the SIGQUIT signal.
In response to a configured action (see the sketch following this list).
If a thread fails to report progress with the AMPS thread monitor for approximately 60 seconds, a minidump is automatically generated. Send the minidump to 60East support for evaluation, along with a description of the operations taking place at the time (and, ideally, logging at info level or more verbose).
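As an example of generating a minidump from a configured action, the hypothetical sketch below triggers one on SIGUSR1 (this assumes the amps-action-on-signal and amps-action-do-minidump modules are available in your AMPS version):

```xml
<Actions>
  <Action>
    <!-- Hypothetical example: generate a minidump on demand
         when the process receives SIGUSR1. -->
    <On>
      <Module>amps-action-on-signal</Module>
      <Options>
        <Signal>SIGUSR1</Signal>
      </Options>
    </On>
    <Do>
      <Module>amps-action-do-minidump</Module>
    </Do>
  </Action>
</Actions>
```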
By default, minidumps are written to /tmp, but this location can be changed in the AMPS configuration by setting the MiniDumpDirectory element. 60East recommends monitoring the minidump directory for new files.
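For example (the path is a placeholder):

```xml
<AMPSConfig>
  <!-- ... -->
  <!-- Write minidumps to a dedicated, monitored directory
       instead of the default /tmp (placeholder path). -->
  <MiniDumpDirectory>/var/tmp/amps/minidumps</MiniDumpDirectory>
  <!-- ... -->
</AMPSConfig>
```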
If minidumps occur, contact 60East support for diagnosis and troubleshooting. Bear in mind that minidumps are often a symptom of a slowdown in the server due to resource constraints rather than an indication that the server has exited.
Once a minidump is submitted to 60East (and acknowledged as received), there is no further need to retain that minidump. 60East recommends removing minidumps when they are no longer needed.
Deployment and Upgrade Plan
60East offers a deployment checklist for use when planning or upgrading an installation of AMPS. The checklist covers recommendations for operational considerations such as:
Capacity Planning
Operating System Configuration
AMPS Configuration
Developing and Configuring Maintenance Plans
Creating a Monitoring Strategy
Creating a Patching and Upgrade Plan
Creating a Support Plan and Verifying the Support Process
The checklist may not cover every aspect of deployment in a particular environment, but it can serve as the basis for a checklist and deployment plan tailored to your environment.