Disaster-Recovery Planning for Telecommunications Security
Disaster-recovery planning is a corporate necessity in any location.
The recent Kobe earthquake has brought the matter to the forefront,
but other disasters could have an equal if not greater impact on
your business: fire, flooding, or wind damage from a typhoon, for
example. Massive earthquakes are attention grabbers, however, so with dire
predictions that "the Big One" will hit the Kanto area sometime
during the next decade, Tokyo-based IT managers are probably in a better
position than ever before to make the case for disaster-recovery planning
and expenditures to their management.
by Thomas Giuffre
Advancing technologies and new services have changed the disaster-recovery
landscape. New building designs can better withstand the vibration and shock
of earthquakes or the spread of fire, and improved telecommunications infrastructures
provide the cautious information technology (IT) manager with options for
circuit diversity and alternate route backups. On the other hand, today's
businesses more than ever depend on information systems and telecommunications.
The risk to businesses that do not plan effectively for a disaster can be
enormous. A once-thriving business can utterly fail during the days or weeks
required to recover normal operations.
Business managers often fail to recognize the extent to which their business
depends on reliably functioning technology. As the complexity and transaction
rate of a business process increase, so too do the risk and potential
impact of an extended outage. In the stocks and securities business, for
example, where systems are designed to give traders an edge in terms of
minutes or seconds, an outage of an hour or more is serious. An outage for
a day could be catastrophic.
It is only after conducting a thorough business impact analysis that the
full picture of the interrelationships and complex dependencies between
the business and the technology becomes clear. But the planning process
is complex. For medium- and large-scale enterprises, it requires consideration
of myriad elements of vital business processes and the technology systems
that support or execute them. Because disaster-recovery planning is by necessity
a complex and detailed process, it should involve all of an organization's
management.
An important part of the process is education. With information technology
becoming increasingly widespread, business managers are often unpleasantly
surprised by the degree and extent to which their operations could be impacted
by a technology system failure. And because modern businesses must stay
in close communication with other organizations, customers, and associates,
the dependency on telecommunications is critical.
The difficult task of planning
For a foreign enterprise, the difficult process of disaster-recovery planning
becomes even more complex. Foreign enterprises necessarily must view the
problem in a global context, because national and local standards, regulatory
policies, and matters of national defense differ from country to country.
In this sense, local IT managers need to understand the particulars of Japan
as well as their corporate headquarters' policies. Being a foreign company
brings a further disadvantage: during a national disaster, there will
be a general procedure for bringing services back on-line,
and your enterprise is probably not at the top of the list. The local MIS
(management information systems) manager must be prepared to deal with the
situation.
Disaster-recovery planning for telecommunications is a component of the
entire business recovery planning process. It requires special attention,
though, because of dependencies on technology systems that are beyond the
immediate control of the enterprise. Leased-line circuits, satellite links,
store-and-forward data services, data banks, and real-time data feeds: all
of these are technology services for which your company is dependent on
outside vendors.
You have probably never visited these vendors' operation facilities to see
how they are set up, yet your business success is contingent on the reliable
delivery of their services. When preparing your own telecommunications disaster-recovery
plan, it is a valuable exercise to learn how your carriers and data
service vendors are set up to operate, what contingency plans they
have in place, and to what extent your service agreements cover events at
their facilities that could affect your business.
This is the area where most contingency plans fall short. Inexperienced
planners will do the obvious by backing up a leased line and requesting
circuit diversity, but they may not bother to find out that the backup line
is part of the same cable, or that it terminates into the same central office.
If that cable is cut by a careless construction crew, for example, both
the primary and the backup link are out of service. And if both circuits
of your mission-critical link go through the same central office, a fire
there could knock out your service. While carriers are generally eager to
help their clients draw up viable contingency plans, it is your responsibility
to lay out your needs and ask the right questions.
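Once the carrier has told you how each circuit is routed, the shared-cable and shared-office check described above becomes mechanical. The following Python sketch is illustrative only: the `CircuitRoute` class, the cable and central-office labels, and the sample routes are hypothetical stand-ins for whatever you transcribe from your carrier's route documentation. It flags every cable or central office that the primary and backup circuits have in common, because each shared element is a single point of failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CircuitRoute:
    """Physical path of one circuit (hypothetical structure for illustration)."""
    name: str
    cables: set[str] = field(default_factory=set)           # cable segments traversed
    central_offices: set[str] = field(default_factory=set)  # offices the circuit passes through


def shared_failure_points(primary: CircuitRoute, backup: CircuitRoute) -> dict[str, set[str]]:
    """Return the cables and central offices common to both routes.

    Each shared element is a single point of failure: one cable cut or
    one central-office fire would take down both circuits.
    """
    return {
        "cables": primary.cables & backup.cables,
        "central_offices": primary.central_offices & backup.central_offices,
    }


def is_diverse(primary: CircuitRoute, backup: CircuitRoute) -> bool:
    """True only if the two routes share no cable and no central office."""
    shared = shared_failure_points(primary, backup)
    return not shared["cables"] and not shared["central_offices"]


# Hypothetical routes transcribed from a carrier's schematic.
primary = CircuitRoute("primary", {"cable-A12", "cable-B07"}, {"CO-east"})
backup = CircuitRoute("backup", {"cable-A12", "cable-C33"}, {"CO-west"})

for kind, items in shared_failure_points(primary, backup).items():
    for item in sorted(items):
        print(f"shared {kind}: {item}")
print("diverse" if is_diverse(primary, backup) else "NOT diverse")
```

In this sample the backup only looks diverse on paper: both circuits run through cable-A12, so a single cut by a construction crew would take out both.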
Secure telecommunications
Telecommunications carriers are generally eager to brag about how well organized
and prepared for disaster they are. Reliability, after all, is the
lifeblood of all carriers. Carriers in Japan are relatively open to questions
about their facilities; managers dealing with this issue should ask some
tough questions about a provider's service-level capability and backup readiness.
Most carriers include in their literature a description of facilities locations,
backup systems, marine cables and satellite earth stations, as well as the
interconnections among them. There are some things to watch out for, though.
If you use a Type II carrier in Japan, for example, you may encounter an
NTT subcontract for the use of NTT cables to link your facility to your
carrier. Since NTT owns most of the infrastructure in Japan, all of the
Type II carriers lease NTT circuits. This detail is generally hidden from
the end user. MIS and network managers should ask to see a schematic diagram
of the cable route from the carrier to their building. (Note especially
the location of sub-stations and central office facilities, and who owns
them. If the primary and backup sites are in the same city or, worse,
within a few kilometers of each other, think seriously about earthquake
integrity.)
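A rough first check on site separation is the straight-line distance between the primary and backup locations. The sketch below uses the standard haversine great-circle formula; the coordinates are approximate points for hypothetical sites, and the 50 km threshold is an arbitrary assumption for the example, not a seismic engineering standard. Distance alone says nothing about shared fault lines or shared cable routes, so treat a warning as a prompt for tougher questions rather than a verdict.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def separation_warning(primary: tuple[float, float], backup: tuple[float, float],
                       min_km: float = 50.0) -> str | None:
    """Warn if two sites are closer than min_km (an illustrative threshold)."""
    d = haversine_km(*primary, *backup)
    if d < min_km:
        return f"sites only {d:.1f} km apart: likely exposed to the same earthquake"
    return None


# Approximate coordinates of hypothetical sites.
tokyo_primary = (35.68, 139.77)
tokyo_backup = (35.63, 139.74)
osaka_backup = (34.69, 135.50)

print(separation_warning(tokyo_primary, tokyo_backup))
print(separation_warning(tokyo_primary, osaka_backup))
```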
For organizations with multiple international circuits running mission-critical
applications, circuit diversity should probably take the form of alternative
carriers. Using KDD and one of the Type II carriers in combination can be
an effective method for reducing the risk of impact to your business, both
from widespread disasters and those isolated to a single carrier. While
I do not endorse the use of KDD per se, it does have the most extensive
and sophisticated network facility in place in Japan. KDD once had a national
mandate to be the international service provider for Japan. After deregulation
in 1986, KDD became semi-privatized, and Type II carriers entered the market
in competition with KDD. While KDD is still the most expensive service,
it maintains numerous points of entry into Japan via both marine cable and
satellite. Most of the Type II players lease circuits from KDD directly,
but they do not enjoy the full range of circuit diversity and alternate-path
options that form the KDD infrastructure.
One value of working with KDD is its Plan-H and Plan-M services that allow
businesses to locate all or part of their telecommunications equipment within
a KDD facility as primary, backup, or both. Customers also have
the option of placing their own staff at the KDD site or contracting KDD
personnel for maintenance and network management tasks. Other carriers offer
these types of services to varying degrees, but generally not to the extent
that KDD does. KDD is also exploring the expansion of this service to include
EDP (electronic data processing) functions.
Aside from the obvious benefit of outsourcing a highly technical task to
a skilled operator, these types of services offer the additional value of
being able to locate your corporate communications equipment in purpose-built
facilities. Such structures must meet building codes that demand substantially
higher levels of structural integrity and resistance to disruption. Your
office building almost certainly cannot match these standards. (And can
you be certain that your own facility's backup equipment is properly maintained
and tested?) Ask your carrier what services of this type it can offer, and
then determine what level of support is appropriate for your business.
Coping with disaster
By looking at recent events in Kobe, we can learn some lessons about effective
planning, readiness levels, and what to expect if a similar event happens
to us. The Kobe earthquake presents a rare case study for evaluating the
integrity of various technology implementations, including commercial building
codes, power and telecommunications cabling infrastructure, and cellular
telephony infrastructure. Experts will be collecting and analyzing data
from the disaster for months to come, but some preliminary lessons can be
discerned from the experiences of Reuters and its customers.
Reuters, which is one of the world's largest providers of data and information
services, is one provider that takes disaster-recovery planning seriously.
Businesses that use Reuters data depend on its timely and reliable delivery.
According to Geoffrey Flynn, managing director of Reuters Japan, the company
implements numerous steps to ensure that its customers receive reliable
and secure service. Reuters takes a proactive approach and integrates recovery
planning into its basic business model to substantially reduce the likelihood
of a severe outage.
Nine of ten Reuters customers in Kobe experienced brief disruptions of service
connections. The one subscriber that did not was using Reuters' Small Dish Service
to receive its feed via satellite. This subscriber was operating from a small
island in the bay near Kobe, so its telecommunications were supported
by microwave line-of-sight (LOS) links and its power was backed up locally with
generators. These precautions minimized the impact on the enterprise's
business operations.
Reuters' data operations center (known as the MTC, or Main Technical
Center) in Tokyo was constructed to precise specifications: it was purpose-built
to meet stringent standards similar to those of a carrier's facility. While Reuters
does not own the building, it fully occupies it and has established a close
working relationship with the building management to ensure that contingency
systems are available. Another facility, across town, serves as the backup
site and subscriber data depository. These two sites are a reasonable distance
apart, though one can argue the case for having a site outside Tokyo.
For Reuters, this means out of the country, in Singapore (site of another
regional MTC). This may seem extreme and impractical for small- to mid-size
organizations, but if your business has at least one leased-line link to
another country, it is possible and practical to have your critical data
backed up in this way. A well-engineered link can utilize a combination
of powerful technologies, such as frame relay and ISDN, to supply an aggregate
emergency bandwidth many times the normal value. And since these types of
technologies are connection-oriented, the cost of maintaining contingency
capability is near zero.
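The arithmetic behind "many times the normal value" is easy to sketch. The example below assumes a hypothetical 64 kbit/s leased line in normal operation, backed up by a frame relay connection with an assumed committed information rate (CIR) of 64 kbit/s plus four ISDN basic-rate lines, each carrying two 64 kbit/s B channels. The line counts, rates, and the 500 MB backup volume are illustrative assumptions, not figures from Reuters or any carrier.

```python
B_CHANNEL_KBPS = 64  # one ISDN B (bearer) channel carries 64 kbit/s


def aggregate_kbps(frame_relay_cir_kbps: float, bri_lines: int = 0,
                   pri_lines: int = 0, b_channels_per_pri: int = 23) -> float:
    """Total emergency bandwidth from frame relay plus ISDN lines.

    A basic-rate (BRI) line has 2 B channels; a primary-rate (PRI) line has
    23 on T1-based networks or 30 on E1-based networks.
    """
    bri = bri_lines * 2 * B_CHANNEL_KBPS
    pri = pri_lines * b_channels_per_pri * B_CHANNEL_KBPS
    return frame_relay_cir_kbps + bri + pri


def transfer_hours(megabytes: float, kbps: float) -> float:
    """Hours to move a data set at a given rate, ignoring protocol overhead.

    Uses decimal units: 1 MB = 8,000 kbit.
    """
    return megabytes * 8_000 / kbps / 3_600


normal_kbps = 64  # hypothetical everyday leased line
emergency_kbps = aggregate_kbps(frame_relay_cir_kbps=64, bri_lines=4)

print(f"emergency bandwidth: {emergency_kbps} kbit/s "
      f"({emergency_kbps / normal_kbps:.0f}x normal)")
print(f"500 MB backup: {transfer_hours(500, emergency_kbps):.1f} h")
```

Under these assumptions the emergency aggregate is 576 kbit/s, nine times the normal 64 kbit/s, and a 500 MB data set moves in a little under two hours before protocol overhead.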
Aside from Reuters' internal use of technology to mitigate the risk of system
failure, the company offers several services to its customers to enhance
the reliability of service delivery. One of these, the Small Dish Service,
relies on a dedicated satellite channel to broadcast Reuters data to subscribers
with small roof-mounted parabolic dish antennas. Reuters has leased a channel
on JSAT-1. Currently, this is a low-bandwidth service and does not provide
for full subscriber support, but an enhanced version of the service is scheduled
for this summer that will provide for full subscriber support in addition
to Reuters Financial Television. One limitation of the technology is that
subscribers cannot interactively subscribe to additional data, as they can
over the traditional leased-line subscription service. Subscribers can continue
to trade in most respects, however, which limits the impact of an outage
confined to the domestic carrier or the immediate vicinity of the business. Some
promising technologies should soon eliminate such limitations
altogether, and you can expect a proactive organization like Reuters
to deploy them aggressively to maintain its competitive advantage in
service quality and value.
NTT takes it well
Among telecommunications carriers, NTT suffered the greatest damage to its
facilities from the Great Hanshin Earthquake. This was only to be expected
given the proportionally extensive amount of infrastructure that it operates.
KDD, ITJ, and Sprint all reported no damage to their facilities in the Kobe
and Osaka areas. Although these international carriers were able to provide
service, many of their customers suffered disruptions because of the presence
of an NTT cable in the local loop.
The degree of damage to cellular telephony differed by service provider.
Again, NTT maintains the most extensive coverage in the area, and it suffered
the most notable damage. The NTT network seemed robust, though, with many
users relying on cellular phones for basic communications during the initial
hours after the quake hit.
While NTT reported losing only six satellite communication dishes, facilities
damage to the main switching office effectively downed the remaining 163
uplink stations covering the area. (Reports indicated that the backup power
generators could not come online because their cooling systems had been damaged.)
However, the speedy recovery of basic communications services is a testament
to the readiness level of NTT to deal with the event. While the company
has taken a beating in the media for the initial loss of communications in the Kobe
area, most other carriers have praised NTT and the company's response to
the disaster.
Looking to the future
Personal communications services (PCS) terminals and the INMARSAT mobile
systems satellite service will enable businesses to enjoy full-duplex transmission
of voice and data traffic, just as they have over leased circuits in the
past. These technologies have been around for some time, but regulatory
issues in Japan (and other countries) have restricted their use. The continuing
deregulation of telecommunications services will usher in new and broader
applications of the technology.
Contingency systems will be one of the early applications. In Japan, the
regulatory policy of the Ministry of Posts and Telecommunications (MPT) currently
restricts the use of INMARSAT technology to businesses that are clearly
mobile in nature (such as the shipping and aviation industries). Subscribers
to the INMARSAT service must hold a license in order to operate the mobile
terminal systems. KDD and several of the other international carriers are
expected to offer PCS service sometime during 1997 or 1998.
There are two factors to bear in mind regarding telecommunications services.
First, public policy issues and the regulatory policy of the MPT ensure
that telecommunications facilities are built to substantially higher standards
than ordinary commercial buildings. National law requires all carriers to
obtain licensing that includes, among other things, compliance with stringent
building codes. This helps to maintain a quality assurance level not always
observed in other countries or industries.
The second point is that carriers have to maintain the image, especially
for basic telephony services, that their network is available 100% of the
time. Naturally, equipment breaks, circuits fail, and human errors occur,
but the carriers go to great lengths to ensure that these localized events
are transparent to their users. (It is likely that some of the damage suffered
by carriers during the Kobe quake, such as a damaged sub-station or two,
localized power loss, or downed microwave relay towers, will
never be fully reported.) The extent to which a carrier can isolate its
customers from outages and local disasters, and minimize service disruptions,
is a testament to the carrier's disaster planning and ability to respond
to events as they occur.
Disaster-recovery planning is a process needed
by all businesses, regardless of their technology level. Technology merely
complicates an already difficult task. If you manage technology or business
functions within your enterprise, you should be aware of all existing contingency
plans. When was the last time you dusted off those documents and had a good
look? (If you can't remember, it has been too long.)
If your employees were prevented from entering your office tomorrow, could
you adequately recover your business functions? What steps should you take?
If you expect to stay in business, you can't afford not to know all the
right answers.
The phases of business impact analysis
Analyze the business environment
* Clarify vital business processes and their supporting applications.
* Identify interim disaster impact-reduction measures.
* Raise corporate level of disaster planning awareness.
Assess the processes and applications
* Determine current recovery status.
* Specify IS environment in which each application functions.
* Identify application recovery challenges.
* Determine anticipated business impact if the process cannot function.
Prioritize application recovery
* Determine business recovery requirements and individual application recovery
priorities.
* Specify each application's data requirements (e.g., data currency, data
loss, and catch-up workload).
Analyze the probable impact
* Develop aggregate definition of enterprise impact.
* Identify feasible recovery options.
* Form a consensus among management and business process leaders on assigned
criticality level, acceptable level of residual risk, recommended recovery
model, and needed level of readiness.
Develop a workable business recovery plan
* Define a disaster-recovery strategy and its implementation steps.
* Develop a step-by-step business recovery plan.
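Parts of the prioritization and impact-analysis phases above reduce to simple bookkeeping once management and business process leaders have agreed on the inputs. The Python sketch below is a hypothetical illustration: the `Application` fields, the criticality scale, and the per-hour loss figures are invented for the example. It orders applications for recovery, totals the enterprise impact of an outage of a given length, and lists the applications whose tolerable outage would be exceeded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Application:
    """One application in the impact analysis (hypothetical fields)."""
    name: str
    criticality: int         # 1 = most critical, as agreed by management
    max_outage_hours: float  # longest outage the business process can tolerate
    loss_per_hour: float     # estimated impact per hour of outage, any currency


def recovery_order(apps: list[Application]) -> list[str]:
    """Recover the most critical applications first; break ties by tolerance."""
    ranked = sorted(apps, key=lambda a: (a.criticality, a.max_outage_hours))
    return [a.name for a in ranked]


def enterprise_impact(apps: list[Application], outage_hours: float) -> float:
    """Aggregate impact if every application is down for outage_hours."""
    return sum(a.loss_per_hour * outage_hours for a in apps)


def exceeded(apps: list[Application], outage_hours: float) -> list[str]:
    """Applications whose tolerable outage would be exceeded."""
    return [a.name for a in apps if outage_hours > a.max_outage_hours]


apps = [
    Application("trading-feed", 1, 1, 500_000),
    Application("settlement", 1, 8, 100_000),
    Application("email", 3, 48, 5_000),
    Application("payroll", 2, 72, 2_000),
]

print(recovery_order(apps))
print(enterprise_impact(apps, 24))
print(exceeded(apps, 24))
```

The constant loss-per-hour model is a deliberate simplification. As the securities example earlier suggests, where an hour is serious but a day could be catastrophic, real impact often grows faster than linearly with outage length.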