High Availability

Planning for High Availability

08.05.2003
By Bob Zimmerman

Operations management integration: Monitoring and management tools may adequately manage watcher and heartbeat functions, but operations integration can go much deeper. Applications may incorporate management APIs to raise alerts (e.g., SNMP traps), enable full monitoring and management (e.g., SNMP MIBs) and write errors to logs that are monitored by a management tool.
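As a minimal sketch of the log-based integration described above, the following Python example routes error-level log records to a handler that a management tool could watch. The `AlertBuffer` class and the `orders` logger name are hypothetical; a real deployment would replace the in-memory list with whatever alerting hook the management tool offers (for example, an SNMP trap sender).

```python
import logging


class AlertBuffer(logging.Handler):
    """Hypothetical stand-in for a management-tool hook such as a trap sender."""

    def __init__(self):
        # Only ERROR and above are forwarded as alerts.
        super().__init__(level=logging.ERROR)
        self.alerts = []

    def emit(self, record):
        self.alerts.append(f"{record.name}: {record.getMessage()}")


def build_logger(name, alert_handler):
    """Create an application logger whose errors reach the alert handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep the sketch independent of root config
    logger.handlers.clear()
    logger.addHandler(alert_handler)
    return logger


alerts = AlertBuffer()
log = build_logger("orders", alerts)
log.info("service started")               # below ERROR, not forwarded
log.error("database connection lost")     # forwarded as an alert
```

Only the error record reaches the alert handler; routine informational messages stay out of the operations channel.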

Automatic restart: When a watcher or management tool detects a failure, the restart procedure must perform any necessary application cleanup, reinitiate the application processes, reconnect them as appropriate and reregister them with application naming services.
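The restart sequence above (clean up, reinitiate, reregister) can be sketched roughly as follows. Everything here is an assumption made for illustration: `ManagedApp`, its `start` and `cleanup` hooks, and the plain dictionary standing in for an application naming service.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ManagedApp:
    """Hypothetical description of a restartable application instance."""
    name: str
    address: str
    start: Callable[[], None]     # raises OSError if the start fails
    cleanup: Callable[[], None]   # releases locks, temp files, stale sockets


def restart(app: ManagedApp, naming_service: Dict[str, str],
            max_attempts: int = 3) -> int:
    """Clean up, restart and reregister; return the number of attempts used."""
    # Deregister first so clients stop being routed to the failed instance.
    naming_service.pop(app.name, None)
    for attempt in range(1, max_attempts + 1):
        app.cleanup()
        try:
            app.start()
        except OSError:
            continue              # try again after another cleanup
        naming_service[app.name] = app.address
        return attempt
    raise RuntimeError(f"{app.name} did not restart after {max_attempts} attempts")


events = []
failures = iter([True, False])    # first start fails, second succeeds


def flaky_start():
    if next(failures):
        raise OSError("port still in use")
    events.append("started")


app = ManagedApp("billing", "10.0.0.5:9000", flaky_start,
                 lambda: events.append("cleanup"))
registry = {"billing": "10.0.0.4:9000"}
attempts = restart(app, registry)
```

In this run the first start fails, so cleanup is performed twice before the instance comes up and is registered under its new address.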

Version migration: The highest levels of availability require eliminating planned downtime, which may involve upgrading application versions while the application is running. The two basic approaches are (1) parallel operation of multiple versions and (2) a "flash cut" to a hot standby (in-flight transactions complete on the old version; all new transactions go to the new version). Supplemental approaches include auto-update clients and version awareness within application interfaces (or within the infrastructure, as in .NET's version management). The biggest issue arises when a new version changes data structures: without a downtime window in which to perform the conversion, the application must be written to handle data conversion on the fly.
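One way to picture on-the-fly data conversion is to tag each stored record with a schema version and upgrade it when it is read. The sketch below assumes a hypothetical change in which version 2 splits a single `name` field into `first_name` and `last_name`; the field names and the `UPGRADES` table are illustrative only.

```python
CURRENT_VERSION = 2


def _v1_to_v2(record):
    """Hypothetical schema change: split 'name' into first and last name."""
    first, _, last = record["name"].partition(" ")
    return {"version": 2, "first_name": first, "last_name": last}


# Maps a schema version to the function that upgrades it by one step.
UPGRADES = {1: _v1_to_v2}


def load(record):
    """Return the record in the current schema, converting old versions lazily."""
    rec = dict(record)            # never mutate the caller's copy
    while rec.get("version", 1) < CURRENT_VERSION:
        rec = UPGRADES[rec.get("version", 1)](rec)
    return rec


old = {"version": 1, "name": "Ada Lovelace"}
new = load(old)
```

Because conversion happens per record at read time, old and new records can coexist in the same store while both application versions are running.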

Connection management: The application must be designed to handle connection failures (e.g., network, DBMS) by recognizing connection timeouts and re-establishing connections to alternate providers, most likely found via an application naming service.
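A rough Python sketch of this failover behavior is shown below. It assumes the list of alternate providers has already been obtained (for instance from a naming service) and simply tries each one in order with a connection timeout. The `connect_with_failover` name and the injectable `connect` parameter are assumptions of this sketch; by default it uses the standard library's `socket.create_connection`.

```python
import socket


def connect_with_failover(endpoints, timeout=2.0,
                          connect=socket.create_connection):
    """Try each (host, port) in order; return the first successful connection.

    OSError covers refused connections as well as timeouts, since
    socket timeouts are a subclass of OSError.
    """
    errors = []
    for host, port in endpoints:
        try:
            return connect((host, port), timeout)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all providers failed: " + "; ".join(errors))
```

A caller would pass the providers in order of preference, for example `connect_with_failover([("db-a", 5432), ("db-b", 5432)])`, where the host names are placeholders. The same function can be called again when an established connection later fails.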

Multi-threaded resource requests: For resource requests that may time out, an HA application may spawn separate threads to make them. The calling thread then stays free to respond to users when a request times out because of a resource failure.
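As an illustration, the sketch below runs a resource request on a worker thread and gives up waiting after a deadline, returning a fallback value so the user still gets a response. The `request_with_timeout` helper, the `slow_lookup` function and the `"cached"` fallback are hypothetical names chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def request_with_timeout(fetch, timeout, fallback):
    """Run fetch() on a worker thread; return fallback if it takes too long."""
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # cancel() only helps if the task has not started yet; a running
        # request keeps its worker thread busy until it finishes on its own.
        future.cancel()
        return fallback


def slow_lookup():
    time.sleep(0.3)               # simulates an unresponsive resource
    return "fresh"


fast = request_with_timeout(lambda: "fresh", timeout=1.0, fallback="cached")
slow = request_with_timeout(slow_lookup, timeout=0.05, fallback="cached")
```

The calling thread is released at the deadline, but the worker thread is not interrupted, so the pool size bounds how many stalled requests can pile up.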
