APJ - ISV - Database - O'Reilly eBook: An Introduction to Cloud Databases

Logs and system monitoring
Determine the events that can indicate imminent problems as well as problems that have already occurred, and incorporate them into automated tracking. The tracking should convey enough information to tell you the source of a failure: for instance, whether it stems from a user action such as rebooting a service, from an attack, or from some other change in the environment. Some failures can be considered normal and can be handled by your automated tools; these should be recorded, but they do not need to trigger alerts to the administrator.

Change monitoring
Administrators should always know which changes to database configuration, instance sizing, or cluster topology can affect availability. Modern development environments use robust processes for change tracking and version control, so that every change goes through a vetting process and can be reversed.

System testing
Try to determine the weak points in your system and anticipate failures. Some teams go through "pre-mortem" exercises to identify and remove potential sources of failure. Large sites can afford to bring down systems deliberately and watch whether recovery is adequate; this kind of testing, called chaos engineering, was popularized by Netflix's Chaos Monkey (https://netflix.github.io/chaosmonkey/). Just as you perform regular restores to make sure your backups are working, you should regularly test your recovery procedures.

Performance Optimization
Performance can benefit from the practices in the preceding section, notably monitoring. Performance monitoring should let you determine the relationships between events and changes in database metrics, and spot discrepancies between predicted and actual performance trends. Performance can also be maintained and improved through the following processes:

Workload testing
Growth in the size and complexity of data, as well as changes in application behavior, affects performance.
Test performance regularly so that you learn of degradation before your customers tell you about it; you can then scale or make other changes to adapt. It can take a while for the database cache and table statistics to

36 | Chapter 3: Moving Your Databases to the Cloud