Disaster Recovery System Testing
Most small to mid-sized businesses will experience at least one instance of system downtime a year, at an average cost of $74,000 an hour, according to Aberdeen Group. Unfortunately, only a handful of small to mid-sized businesses have ever tested their backups to find out whether complete recovery of their data, applications and systems after a disaster is possible. Those that do test often do so infrequently, a result of outdated thinking that DR testing is time- and cost-prohibitive or too complex an endeavor. Of course, if an organization relies on tape backup, disk backup or cloud backup alone for disaster recovery, these complaints are not altogether unfounded. Hybrid cloud solutions are more reliable and less cumbersome, with on-demand and automatic testing that can be performed in minutes.
Regardless of your DR solution of choice, the fact remains that frequent changes in your environment make the case for “Always Be Testing”: weekly, full DR tests that account for every change made to a company’s systems.
Common Pitfalls Emphasize Importance of Weekly Testing
Many organizations simply aren’t aware of everything that can go wrong when recovering from backups in an emergency, and if you never actually try to restore a file, application or server, you don’t really know if you can. But because of the time and cost issues associated with tape, disk and cloud backup, IT professionals resort to workarounds, such as performing a scaled-down version of a test in either a partial environment or a partial format. The problem here is that an organization’s infrastructure is constantly changing. Servers, applications and systems are added, modified and removed, and your backup and recovery system must take these changes into account.
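To make the point about constant change concrete, here is a minimal sketch in Python of a check that flags when a full DR test is due. It assumes a hypothetical change log (a list of records with a `what` and a `when` field) and a recorded date for the last full test; the function names and the seven-day threshold are illustrative assumptions, not part of any particular DR product.

```python
from datetime import datetime, timedelta


def changes_since_last_test(change_log, last_full_test):
    """Return the changes recorded after the last full DR test."""
    return [c for c in change_log if c["when"] > last_full_test]


def full_test_due(change_log, last_full_test, now, max_age=timedelta(days=7)):
    """A full test is due if anything changed since the last one,
    or if the last one is older than max_age (assumed: one week)."""
    untested = changes_since_last_test(change_log, last_full_test)
    return bool(untested) or (now - last_full_test) > max_age


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    last_test = datetime(2024, 1, 1)
    log = [
        {"what": "monthly security patches", "when": datetime(2023, 12, 30)},
        {"what": "new RAID controller", "when": datetime(2024, 1, 3)},
    ]
    for change in changes_since_last_test(log, last_test):
        print("Not yet covered by a full DR test:", change["what"])
    print("Full test due:", full_test_due(log, last_test, datetime(2024, 1, 4)))
```

The design choice is deliberate: any single untested change is enough to make a test due, which mirrors the argument that a scaled-down or one-time test says nothing about the environment as it exists today.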
Microsoft’s Patch Tuesdays also introduce changes that could spell trouble if regular testing is ignored. These security patches are released once a month, but Microsoft also releases daily updates, as well as extra patches on “extraordinary Patch Tuesdays.” And it doesn’t stop with Microsoft: SAP advises users to install security updates on “Security Patch Days,” which coincide with Microsoft’s Patch Tuesdays. Adobe Systems’ update schedule for its Flash Player joins the fray on Patch Tuesdays as well.
Changes in hardware, too, can be overlooked. When installed, new hardware is tested once. Regular testing after that seems superfluous, the argument being that the main DR software hasn’t changed, so the test results won’t change from the first time. But if any new hardware has been installed or upgraded and the DR software hasn’t been tested on the new platform, cracks can emerge.
Unfortunately, a common practice is to test a tape to ensure the data on it is good and that files can be recovered. After this type of test, IT professionals have the false sense of security that an entire server can be restored. But if the server itself hasn’t been tested regularly (let alone the drivers, RAID controllers or NICs), there is no way to know that everything will perform as expected.
Backup corruption is especially prevalent in tape environments. Tape media spins at high speed and, over time, becomes unusable. Because tapes are rarely switched out, there’s no way to know a tape is corrupt until a data restoration is attempted. Corruption can also occur if a backup device has been damaged, if a backup has been processed incorrectly, or if the storage device has some sort of physical defect. Worse yet is when invalid data has been unknowingly backed up repeatedly over a long period of time, which means every backup copy is corrupt.
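Since corruption only shows up when a restore is attempted, the check has to run against restored data. The following is a minimal sketch in Python, assuming a hypothetical manifest of SHA-256 checksums recorded at backup time (a mapping of relative path to hex digest); the function names and the manifest format are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_dir):
    """Compare restored files against checksums recorded at backup time.

    manifest: dict mapping relative path -> expected SHA-256 hex digest
    Returns the relative paths that are missing or do not match.
    """
    failures = []
    for rel_path, expected in manifest.items():
        restored = Path(restore_dir) / rel_path
        if not restored.is_file() or sha256_of(restored) != expected:
            failures.append(rel_path)
    return failures
```

A check like this covers file-level integrity only. It says nothing about whether a whole server, with its drivers, RAID controllers and NICs, will come back up, and it cannot catch data that was already invalid when the checksum was recorded.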
While corruption is far less frequent with disk backup, disk has drawbacks of its own that should be mentioned. Organizations are still limited in how much data they can retain for a long time. Disks may be added, but this approach still only allows for backup and recovery, not full DR testing.
Human error and flaws in the execution of backups occur frequently, especially in tape environments. If backups are set up correctly, tapes will be “locked” for a specific amount of time, preventing data from being overwritten. But if the backup was set up improperly and the tape was somehow not locked, data on that tape will be overwritten.
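The locking rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular backup product implements retention: each tape is a hypothetical record with a `written_at` timestamp and a `retention` period, and a tape with no retention configured is treated as a setup error rather than as free to reuse.

```python
from datetime import datetime, timedelta


def may_overwrite(written_at, retention, now):
    """A tape may be reused only once its retention (lock) period has elapsed."""
    return now >= written_at + retention


def safe_to_reuse(tape, now):
    """Refuse to reuse a tape whose lock is still active or was never set."""
    retention = tape.get("retention")
    if retention is None:
        # Assumption: a missing lock is a misconfiguration, so fail safe.
        return False
    return may_overwrite(tape["written_at"], retention, now)
```

Failing safe on a missing lock addresses the exact scenario in the paragraph above: a backup that was set up improperly should stop the job, not silently overwrite data.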
Clearly, even a tiny change to your IT system increases the odds that something might go wrong later on. And if you wouldn’t feel comfortable erasing your hard disk right now and restoring it from your backups, it’s time to commit to a new weekly routine and adopt the “Always Be Testing” approach.
By Larry Lang, Chief Executive Officer
Larry brings more than twenty years of global business-building experience to Quorum, which offers appliance and hybrid cloud solutions for one-click backup, recovery and continuity. His innovative views on business demands for rapid assured disaster recovery have been shared through industry forums like the International Legal Technology Association and the World Conference on Disaster Management and quoted in publications like the Wall Street Journal and Forbes. He has led Quorum through substantial revenue growth and several rounds of financing.