
Disaster Recovery, Business Continuity and Contingency Planning
NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity Programs was published by NFPA, the National Fire Protection Association.
Disaster Recovery Institute (DRI) International, a not-for-profit contingency planning organization
originating two decades ago, offers training, certification and advice for disaster recovery as well as other aspects of information security. They have published a best practice guide to business continuity and a comprehensive suite of professional practices in BC. In conjunction with Disaster Recovery Journal, they
maintain a useful glossary of DR-related terms.
The Be Ready Campaign advises “Prepare. Plan. Stay informed.” Part of the site covers readiness for small businesses with an excellent summary of the things you need to do to plan for continuity of essential
business processes, plus a more detailed glossy. If you have yet to make a start on your contingency
planning, or if it’s time for a back-to-basics review of the approach your organization is using, the Be Ready website is recommended reading.
A special-issue boardroom briefing on business continuity and disaster recovery from Directors and Boards magazine goes into some depth on issues such as DCP audit, governance and leadership for senior management.
FEMA, the US Federal Emergency Management Agency (part of the DHS, the Department of Homeland Security), offers contingency planning advice including a helpful checklist for items needed in your emergency
kit (grab bag).
Citizen Corps and PandemicFlu.gov concentrate on bird flu, the feared global pandemic of a mutant form of
avian influenza that attacks humans.
The US Environmental Protection Agency is also concerned about bird flu.
If a bird flu pandemic strikes, many more people than normal will opt or be asked to work from home. The
effect this will have on the capacity and reliability of home or mobile broadband and telephone services is uncertain but the DHS suggests that organizations should pre-register for priority services for vital home workers. Jump the queue if your job is judged vital.
The US National Oceanic and Atmospheric Administration and National Weather Service provide severe
weather warnings and information. Perhaps you’ll have a few hours to evacuate? Better still, use their statistics, along with those of the US Geological Survey to pick a safer place to live and work!
The Nuclear Regulatory Commission governs the US nuclear industry, doing all they can to avoid nuclear disasters.
The US Fire Administration has a kids page - a useful way to draw little’uns into the process of fire
avoidance/prevention, fire escape and contingency planning.
The US Department of Agriculture Forest Service Southern Research Station is a great place to find out why
wild fires are so dangerous in rural areas.
A CERT project describing various working definitions of “incident” and related terms in an IT context hints at
the complexity in this area.
The UK-based Business Continuity Institute is aligned to DRI. In conjunction with the British Standards Institution,
it produced PAS 56 which became BS 25999-1:2006 Business Continuity Part 1 - Code of Practice.
The Oops List is a collection of images of (mostly) aircraft disasters. Warning: these are truly graphic
images - not much blood and gore as such but undoubtedly passengers or crew were injured or killed in at
least some of them. A few look like fakes or set-ups but, subject to any copyright restrictions, they would make captivating slides for contingency planning/DR presentations.

Ross Campbell & Associates is a crisis management consultancy providing planning, training,
resources, risk and threat analysis to control and manage the worst case scenario. Ross is the author of Crisis control - preventing and managing corporate crises (available
secondhand via Amazon).
Contingency testing was the subject of an ISACA briefing.
Read the UK Department of Trade and Industry’s guide to Business continuity management - preventing
chaos in a crisis. UK resilience advises on disaster preparedness. Even Britain’s secret service MI5 is now offering public advice on business continuity.
The Definitive Guide to Exchange Disaster Recovery and Availability is a free eBook by Paul Robichaux
explaining resilience and DR techniques for Microsoft Exchange servers (you are supposed to register to download).
Business Continuity Guideline: A Practical Approach for Emergency Preparedness, Crisis Management, and
Disaster Recovery has been published by ASIS. ASIS focuses primarily on physical security, complementing many of the other resources listed here covering the information security aspects (such as ISSA) and governance (such as ISACA).
The Rothstein catalog on disaster recovery lists over 1,000 books, software tools, videos and reports on the topic.
Resilience engineering
Keeping the computer room working is the core role for IT Operations, requiring a level of resilience and contingency planning. If you’ve ever had to plan a computer room DR test, or if you are preparing to do so,
take a look at the article.
Disk drive manufacturers quote MTBF (Mean Time Between Failures) of around a million hours under ideal
conditions, suggesting a failure rate of less than 1% per year, but some studies show significantly worse performance (2-10% p.a. failure rate) in the real world. It seems the “bathtub” reliability curve has a sharply upward sloping or even stepped bottom, not the long flat period of stability often assumed. If your data are vital and their availability is critical, monitor drive age, error rates and temperatures carefully.
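The arithmetic behind those figures can be sketched as follows. This is a rough illustration only, assuming a constant failure rate (the flat part of the bathtub curve) and drives that run continuously; the function names and the fleet of 1,000 drives are hypothetical, chosen purely for the example.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365, ignoring leap years


def annual_failure_rate(mtbf_hours: float) -> float:
    """Chance of a drive failing within one year of continuous use.

    Assumes a constant failure rate (exponential model), which is the
    idealized assumption behind a quoted MTBF, not real-world behavior.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(drive_count: int, afr: float) -> float:
    """Average number of drive failures per year across a fleet."""
    return drive_count * afr


if __name__ == "__main__":
    quoted = annual_failure_rate(1_000_000)
    print(f"AFR implied by a 1,000,000 hour MTBF: {quoted:.2%}")
    # Compare the quoted figure with the 2-10% p.a. range seen in the field
    # for a hypothetical fleet of 1,000 drives.
    for label, afr in [("quoted", quoted), ("field low", 0.02), ("field high", 0.10)]:
        failures = expected_failures_per_year(1000, afr)
        print(f"{label:>10}: about {failures:.0f} failures per year")
```

On these assumptions a million-hour MTBF implies just under 1% of drives failing per year, whereas the field figures imply somewhere between 20 and 100 failures per year in the same 1,000-drive fleet, which is a very different spares and backup planning problem.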
FMEA (Failure Modes and Effects Analysis) is an engineering discipline with potential applications in systems
engineering and information security.
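One common FMEA technique is to score each failure mode for severity, likelihood of occurrence and difficulty of detection (typically on 1 to 10 scales) and multiply the three into a Risk Priority Number (RPN) used to rank the failure modes. A minimal sketch follows; the failure modes and their scores are invented for illustration and are not taken from any real assessment.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost always detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted with the highest RPN first."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Hypothetical scores, for illustration only.
EXAMPLE_MODES = [
    FailureMode("Single disk failure in RAID array", 4, 6, 2),
    FailureMode("Backup tape unreadable at restore time", 9, 4, 8),
    FailureMode("Computer room cooling failure", 7, 3, 3),
]

if __name__ == "__main__":
    for mode in rank_by_rpn(EXAMPLE_MODES):
        print(f"{mode.rpn:4d}  {mode.description}")
```

The scores are subjective judgments, so the ranking is best treated as a prompt for discussion about where to spend effort rather than a precise measurement.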
Surviving natural & unnatural disasters
The US Agency for Toxic Substances and Disease Registry advises on chemical and biological incidents.
Among other things, US CDC (Centers for Disease Control [and Prevention]) advises on preparing for “climate change” - not “global warming” as such, oh no. It wouldn’t do to confuse those two, would it?
Anyway, whatever the (alleged) cause, extreme weather events are (allegedly) more (or less) likely so storm and tornado preparations are (potentially, if you believe the “hype”) A Jolly Good Idea.
The offices of Aeneas Internet and Telephone, a Tennessee-based ISP, were totally destroyed by a tornado.
The company had made contingency plans ... but unfortunately had neglected to take copies of critical backup media off-site. The on-site tapes proved unusable, although fortunately a specialist data recovery
firm was able to retrieve the customer database from hard drives rescued from the rubble ... several days later (hard lesson learned).
A space shuttle crew completed the ‘delicate task’ of removing ceramic fabric spacer strips protruding
between the shuttle’s tiles by pulling them away rather than using a makeshift tool. The tool, the
cutting/pulling instructions and indeed the whole Boy Scout response to this incident were themselves the product of a well-rehearsed contingency process that prepared those involved to deal positively with
whatever comes up (the Apollo 13 film with Tom Hanks is a popular case study on contingency and teamworking).
It is sometimes suggested that you should imagine all the possible disasters in order to prepare
comprehensive contingency plans ... but although this might be a fascinating exercise, it’s not actually very
helpful in practice. The word “contingency” implies “regardless of the situations we’ve planned for and the
controls we have implemented, something else will happen and/or something unexpected will go wrong”. Similarly, some people advise very detailed contingency plans: “Have a DR plan that is so detailed an average person could recover your systems without the IT staff” says Julian Morris, IT Director at
DraftWorldwide. Julian’s other tips include having pre-prepared templates and checklists, reliable communications mechanisms and structured plans (it may be useful to tell users “We are at step X, your
system will be ready at step Z”).
A checklist from HHS and CDC covers just 35 items relating to pandemic planning and risk management.
Such a high level assessment makes a good starting point for management to review an organization’s state of preparedness.
The World Health Organization published reliable advice on the anticipated course and effects of a bird flu
pandemic, including a page with news on the latest outbreaks. Another definitive source of scientific, medical
and epidemiological information is the US Centers for Disease Control and Prevention. NERC explains the anticipated features
of a pandemic that should be taken into account in contingency/business continuity planning in a critical infrastructure context.
Contingency plans for terrorism should prepare the organization to deal with conventional attacks such as bombings and probably others such as chemical, biological or radioactive attacks.
The repercussions of 9/11 on contingency planning are explored in this article. A consultant specializing in
high-availability database systems recounts in Security Administrator eZine how he was called upon to help 5
of 12 clients in the World Trade Center recover after September 11th 2001. The other 7 clients no longer
existed as viable businesses. In the special circumstances of 9/11, stakeholders were impressed that any recovery was possible. Would you be able to access suitable contractors to help rebuild your organization if
you had to? Would you have access to contingency funds to start from scratch?
It appears that the increased terrorist threat, post 9/11, has prompted more organizations to make
contingency plans. Just over a third of firms surveyed said that the threat of terrorism was the biggest
reason for boards to assume responsibility for business continuity, whereas a quarter cited growing reliance on IT systems and 23 percent cited forthcoming industry regulations such as the Combined Code of
Corporate Governance. “The responses fly in the face of SunGard’s own research, which suggests that
hardware failure is the main cause of business interruption, at 17 percent, followed by power outages at 13
percent.” [Regardless of the reasons cited, we would argue that contingency planning is a legitimate investment in risk reduction.]
In Disaster Recovery and Business Continuity Planning, the SEC specifies that financial institutions should
resume vital clearing and settlement operations on the same day as a major incident such as 9/11, ideally within 2 hours. This implies highly resilient systems with dual-live/multiply-redundant or hot standby
arrangements and significant investment in IT by the entire [US] financial services industry.
Other availability resources
The ISA website notes an ongoing project to develop ANSI/ISA security standards for SCADA (Supervisory
Control And Data Acquisition) systems used to control industrial machinery including large chunks of the critical infrastructure (e.g. power plants, water treatment works). Many old-fashioned SCADA systems pre-date
modern thinking on information security controls other than availability, perhaps: the reason old SCADA systems remain a problem is that many of them have continued running more or less unchanged for
decades. True information security requires a balance between confidentiality, integrity and availability.
What appears at first to be a simple news story about a systems overload caused by people downloading a
large video looks somewhat odd on closer inspection. It is reported that the systems concerned belonged
to the British armed forces. “Computer screens controlling British air defences and warplanes around the
world are reported to have gone blank for five hours” says the London Evening Standard. Um. Well maybe.
The BBC reported that two fake banking websites, only one of which was protected by a firewall, were put
on the web as a honeypot to attract and monitor hacker attacks. They were both attacked, of course.
Apparently, “more than a third of the attacks on the protected website were so severe that they crashed
the site and could have resulted in the loss of data”. [Call me a cynic but the fact that the ‘experiment’ was
funded by an ISP and a security firm hardly inspires me with confidence in the validity of their scientific methods ...]
The US State Department’s Consular Lookout and Support System (CLASS) for checking visa applications was apparently taken out of service “by a virus” (actually the Welchia worm). Malware could hit any of us but it
would appear that contingency arrangements were simply not adequate to keep the service running or get it back in operation before the media picked up the news story. Any problems caused by the unplanned
service outage were compounded by the media interest.
For practical advice about performance monitoring on Windows systems, see the Computer Performance
website, which covers a range of Windows management topics including some interesting “Litmus test” best practice suggestions relating to information security.
Here is some simple advice on backing-up software and data. Backups are the primary corrective control
against system or data corruption and loss. Don’t forget to verify that you can restore successfully from
your backups, especially if you change equipment, software or configuration settings. [That little gem of advice comes courtesy of the School Of Hard Knocks, faculty of Once Bitten, Twice Shy].
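One simple way to check a test restore is to compare checksums of the original files against the restored copies. The sketch below is only an illustration of that idea, not a substitute for your backup software's own verification: the function names are hypothetical, and it assumes the source files have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """List files that are missing or different after a test restore.

    An empty list means every file under source_dir has an identical
    copy at the same relative path under restored_dir.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling verify_restore(Path("/data"), Path("/restore-test/data")) (both paths are placeholders) after a trial restore gives a quick list of anything that did not come back intact. Restoring to a separate location, ideally on different equipment, is what makes the test meaningful.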
Safety-critical systems are specifically designed for high-availability, but the risk of unplanned downtime cannot be entirely eliminated, even at a nuclear plant. The Register reported that the US Nuclear Regulatory Commission issued a formal information notice to the nuclear industry in relation to the Slammer worm incident.
Organizations in the throes of merging usually look to consolidate redundant or duplicate systems to cut
costs - but of course availability could be seen as a driver to retain the parallel systems, albeit with data interfaces and some means to reconcile differences.
Related NoticeBored links collections
Incident management, accountability, information security management, physical security, IT Operations, Bugs!, hacking, IT fraud and Internet security.
NB: we do not necessarily endorse or agree with the third party websites accessible through the links. Use at your own risk. Please let us know about new or broken links.