"Things to Consider Before Choosing a Primary Site Recovery Approach or Telecommunications Vendor"

by Randolph A. Fisher, CBCP


INTRODUCTION
More and more companies are revisiting the need for business continuity plans. The ability to protect against and recover from primary data center failures is no longer a luxury but an absolute necessity for today's businesses. Whether you are a large global multinational or a mid-sized company, data has grown, and continues to grow, in importance as a corporate asset. For many companies, data is their lifeblood.

FEMA statistics indicate that the 1990s had a high incidence of natural and man-made disasters, and the early 2000s seem to be following this trend. Within the last few years we have seen the World Trade Center disaster in New York City and continued concerns about terrorism. Besides the financial damage, estimated at approximately $400B in New York City alone, there are many intangible costs that have not yet been estimated.

A key lesson for all of us from the World Trade Center disaster is to include in the plan a clear delineation of management succession and emergency powers, so that timely decisions can be made to minimize losses, assess situations, and resume critical functions. Organizations that had this in their plans were able to resume near-normal operations quickly; those that did not were interrupted for days, if not weeks. Some still have not recovered, and others may never fully recover.

In today's global economy, isolating your internal clients and external customers from key corporate data sites could drive you out of business. This article is a primer on things to consider before choosing a particular recovery or protection approach or a specific vendor.

THINGS TO CONSIDER
Safety and security of personnel always come first. After that, there are three areas of consideration:

  1. Financial,
  2. Vendor capability, and
  3. Secondary site attributes

[Figure: The Consideration Triad]

Let's take a look at each member of this "Consideration Triad" in a little more detail.

Financial considerations are usually the first item on everyone's list after safety & security. Points to consider are:

  • What are you trying to guard against or recover from?
  • What are the most likely risks?
  • How much do you have to spend?
  • What is the approximate cost of downtime or isolation to your business?
  • What will you need to spend?

Knowing the cost of downtime or isolation lets you determine how much to spend on recovery features. The more risk-averse you are, the more you will need to spend. However, never spend more than the estimated cost of downtime. Practical rough estimates are fine; six months of assessment or an intensive Business Impact Analysis (BIA) isn't going to change the picture much.
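As a rough illustration of that spending ceiling, the sketch below filters candidate recovery options by comparing each one's cost against an estimated cost of downtime. The option names and dollar figures are hypothetical, and `RecoveryOption` and `affordable_options` are illustrative names, not part of any product or standard method.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    cost: float  # estimated cost of the feature or service (hypothetical figures)


def affordable_options(options, estimated_downtime_cost):
    """Keep only the options whose cost does not exceed the estimated cost of downtime."""
    return [o for o in options if o.cost <= estimated_downtime_cost]


# Hypothetical example: a practical rough estimate, not a full BIA.
estimated_downtime_cost = 500_000

options = [
    RecoveryOption("shared port at a DRV", 60_000),
    RecoveryOption("dedicated port at a DRV", 420_000),
    RecoveryOption("duplicate static circuits to a company-owned site", 900_000),
]

for option in affordable_options(options, estimated_downtime_cost):
    print(option.name)
```

Here the third option is dropped because it would cost more than the outage it guards against; a more risk-averse user might raise the downtime estimate, which raises the ceiling.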

With the average cost of downtime across all US industries estimated at $1,400 per minute (about $84,000 per hour), it is easy to see how larger, more data-dependent businesses could lose millions of dollars per hour, and why some businesses never recover from a severe outage. Besides the tangible cost of downtime, there are equally important intangible costs of isolation or downtime:

  • Lost business
  • Eroded market share & decreased investor confidence
  • Potential lawsuits
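The per-minute average cited above is easier to weigh against recovery options once it is scaled to realistic outage lengths. The short sketch below does that arithmetic; only the $1,400 figure comes from the article, and the outage durations are illustrative.

```python
MINUTES_PER_HOUR = 60

# Average US cost of downtime per minute, as cited in this article.
AVERAGE_US_COST_PER_MINUTE = 1_400


def downtime_cost(cost_per_minute: float, outage_hours: float) -> float:
    """Tangible cost of an outage of the given length (intangible costs not included)."""
    return cost_per_minute * MINUTES_PER_HOUR * outage_hours


for hours in (1, 4, 24):
    cost = downtime_cost(AVERAGE_US_COST_PER_MINUTE, hours)
    print(f"{hours:>2} h outage: ${cost:,.0f}")
```

At the average rate, a 24-hour recovery window (once commonplace) costs about $2 million before counting lost business, market share, or lawsuits.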

Remember that the financial and medical industries carry the lion's share of industry and governmental regulation. Recent business scandals will surely keep the heat on and keep company executive boards under the microscope. Rather than less regulation, I believe we will see more.

Vendor capability is the next item of the Consideration Triad to investigate. Available disaster recovery features vary widely in sophistication, flexibility, speed, and functionality. Recovery times of 24 hours were commonplace just a few years ago; today end users expect circuit recovery within minutes. Many carriers use similar feature names for very different capabilities, so be careful: know exactly what you are subscribing to and the limitations of each available feature. Business Continuity & Disaster Recovery (BCDR) is a culture, a philosophy, a mindset, a way of life. Some carriers have dedicated, disaster-recovery-trained and certified technicians, while others route you through their normal maintenance or provisioning trouble queue. As a result, redirections can take minutes or hours, depending on your vendor's approach.

Be sure your vendor has:

  • A robust set of features
  • Dedicated disaster recovery trained personnel
  • Nightly, unobtrusive, proactive circuit readiness tests
  • Features that are easy to use
  • Secondary path performance that matches the protected primary paths
  • Quick recovery times
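One way to keep vendor evaluations consistent across carriers is to record each carrier's answers against the checklist above and list whatever it could not confirm. This is a minimal sketch; the checklist wording mirrors the bullets, while `missing_capabilities` and the sample answers are hypothetical.

```python
VENDOR_CHECKLIST = (
    "robust set of features",
    "dedicated disaster recovery trained personnel",
    "nightly unobtrusive proactive circuit readiness tests",
    "features that are easy to use",
    "secondary path performance matching the primary paths",
    "quick recovery times",
)


def missing_capabilities(vendor_answers):
    """Return checklist items the vendor has not confirmed; unanswered items count as missing."""
    return [item for item in VENDOR_CHECKLIST if not vendor_answers.get(item, False)]


# Hypothetical answers collected from one carrier during evaluation.
carrier_answers = {item: True for item in VENDOR_CHECKLIST}
carrier_answers["nightly unobtrusive proactive circuit readiness tests"] = False

print(missing_capabilities(carrier_answers))
```

Treating unanswered items as missing is a deliberate choice: a capability the carrier cannot demonstrate should not be assumed.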

Bottom line: be sure your carrier has a robust set of features that supports all of these capabilities. No one size fits all; forcing square pegs into round holes rarely works.

What types of services does your vendor provide? Is its backbone network robust and regularly upgraded? Does your carrier have a single point of contact for all of your HSPS disaster-recovery needs? Does the carrier have trained and certified disaster-recovery professionals and regularly scheduled disaster-recovery drills?

Customers should demand that their network provider have demonstrable disaster-recovery expertise and a proven track record. Ask if you can attend one of their quarterly mock-disaster drills. Also ask whether their personnel are dedicated to disaster recovery and trained and certified by the Disaster Recovery Institute International (DRII).

Disaster recovery is all about relationships. The relationship between the customer, the telecommunications carrier, and the recovery site provider is crucial; think of it as a three-legged stool. If any one party does not fully understand its responsibilities or the end user's expectations, the ability to recover can suffer. Check whether your providers have Quality Improvement Teams (QITs) that work out operational issues, which ultimately leads to better service for you, the end user.

Finally, be sure your carrier has the capabilities and expertise to let you activate multiple simultaneous scenarios or run hybrid configurations across Private Line, High Speed Packet, and IP networks. Many networks now employ all three.

Secondary site attributes are the final item in the Consideration Triad to decide on:

  • Do you have your own backup site, or will you subscribe to a third-party vendor?
  • Do you want a dedicated port or a shared port at the backup location?
  • Are you guarding against local or more wide scale failures?
  • Is permanent or temporary connection to the backup site required?
  • Do you want to reuse bandwidth?
  • Do you want predefined recovery actions or 'on-the-fly' redirections?
  • Do you want to move all or portions of your network?

If you need the flexibility to direct your data traffic to one of many backup data centers to guard against large regional failures, consider subscribing to a third-party Disaster Recovery Vendor (DRV), since DRVs have many geographically diverse sites.

Using a DRV is more economical because you share the DRV's secondary site port and local access channel circuit. In addition, the backup locations are professionally staffed, maintained, and always ready.

One key decision is whether you require dedicated or shared components at the secondary site. Although this has been a standard industry concept for approximately 12 years, some users, and even some DRVs, are still confused about the difference between dedicated and shared ports.

Dedicated ports are just that: a port dedicated to your use. With a dedicated port, coordination is minimized because the access port and access channel are already in place for immediate use, so there is no contention for this hardware. Dedicated ports tend to cost about seven times as much as a shared port.

The vast majority of customers do not own a backup site because of the expense and the personnel expertise required to keep it ready. Most instead use a third party's backup site, commonly referred to as a Disaster Recovery Vendor (DRV). Customers rarely purchase dedicated ports or dedicated access at a DRV; most use the shared port and access owned by the DRV. A shared port serves multiple subscribers, but only one user at a time.

Using a shared port requires closer coordination between the end user and the DRV, and there is potential for contention for these resources during a wide-scale regional natural disaster. If you need guaranteed access to the port, subscribe to a dedicated port. Most users find shared ports more than adequate and much more economical, since most BCDR activity in the US is testing rather than recovery from a major catastrophic failure.
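The trade-off between shared and dedicated ports can be sketched as a toy model: a shared port accepts any subscriber but only one at a time, and a dedicated port is priced at roughly the seven-times multiple mentioned above. `SharedPort`, `port_cost`, the customer names, and the $1,000 shared-port price are all hypothetical; this is a conceptual illustration, not how any DRV actually allocates ports.

```python
class SharedPort:
    """A DRV-owned port available to several subscribers, but to only one at a time."""

    def __init__(self, subscribers):
        self.subscribers = set(subscribers)
        self.active_user = None

    def acquire(self, customer):
        """Return True if the customer now holds the port, False if another subscriber does."""
        if customer not in self.subscribers:
            raise ValueError(f"{customer} does not subscribe to this port")
        if self.active_user is None:
            self.active_user = customer
        return self.active_user == customer

    def release(self, customer):
        if self.active_user == customer:
            self.active_user = None


def port_cost(shared_port_cost, dedicated=False, dedicated_multiplier=7.0):
    """Rough estimate: a dedicated port runs about seven times the shared-port price."""
    return shared_port_cost * dedicated_multiplier if dedicated else shared_port_cost


port = SharedPort({"Customer A", "Customer B"})
print(port.acquire("Customer A"))  # True: the port was free
print(port.acquire("Customer B"))  # False: contention, Customer A is using it
port.release("Customer A")
print(port.acquire("Customer B"))  # True: the port is free again
print(port_cost(1_000, dedicated=True))  # 7000.0
```

The second `acquire` call is the contention case that a dedicated port avoids; whether avoiding it is worth roughly seven times the price depends on how likely a wide-scale event is for you.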

RECOVERY APPROACHES
The paradigm has indeed shifted. Rather than installing duplicate static circuits from remote locations to both the primary and secondary sites, many users now subscribe to circuit redirection capabilities. This approach is more economical and flexible, and it allows you to recover from both regional and local-area failures. The secondary permanent virtual circuits (PVCs) are activated only during a test or in response to a real disaster.
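Circuit redirection can be pictured as each remote site holding a secondary PVC to the backup site that stays dormant until it is activated. The sketch below models that idea, including moving only part of the network; `RemoteSite`, `redirect`, and the branch names are hypothetical, and the code stands in for whatever provisioning interface a carrier actually offers.

```python
from dataclasses import dataclass


@dataclass
class RemoteSite:
    name: str
    primary_pvc_active: bool = True
    # Secondary PVC to the backup site: provisioned ahead of time but dormant
    # until a test or a real disaster.
    secondary_pvc_active: bool = False

    def redirect_to_secondary(self):
        self.primary_pvc_active = False
        self.secondary_pvc_active = True

    def restore_primary(self):
        self.primary_pvc_active = True
        self.secondary_pvc_active = False


def redirect(sites, names_to_move):
    """Redirect only the named remote sites, so all or part of the network can be moved."""
    for site in sites:
        if site.name in names_to_move:
            site.redirect_to_secondary()


sites = [RemoteSite("Branch 1"), RemoteSite("Branch 2"), RemoteSite("Branch 3")]
redirect(sites, {"Branch 1", "Branch 3"})
for site in sites:
    print(site.name, "secondary" if site.secondary_pvc_active else "primary")
```

Compared with duplicate static circuits, nothing is paid for or carried on the secondary path until it is activated, which is where the economy of redirection comes from.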

There are two schools of thought: some prefer recovery on the fly, while others prefer a predefined recovery action plan. My years of experience indicate that most people find developing and maintaining a predefined recovery action plan well worth the effort. None of us can predict when or where a disaster will strike or which personnel will be on duty. A predefined plan reduces, or even eliminates, impromptu decisions made under stress, which can lead to serious errors whether the person deciding is inexperienced or a seasoned veteran.
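A predefined recovery action plan amounts to a lookup from scenario to agreed steps, so whoever is on duty follows the same sequence instead of improvising. The scenario names and steps below are hypothetical placeholders, not a recommended plan.

```python
# Hypothetical plan: scenario names and steps are placeholders to be replaced
# with your own tested procedures.
RECOVERY_PLAN = {
    "primary_data_center_down": [
        "declare disaster and notify the DRV",
        "activate secondary PVCs for all remote sites",
        "verify traffic reaches the backup data center",
    ],
    "single_region_outage": [
        "notify the DRV",
        "activate secondary PVCs for the affected remote sites only",
        "verify traffic reaches the backup data center",
    ],
}


def actions_for(scenario):
    """Look up the predefined steps for a scenario instead of improvising under stress."""
    if scenario not in RECOVERY_PLAN:
        raise KeyError(f"no predefined plan for {scenario!r}; escalate to the designated decision-maker")
    return RECOVERY_PLAN[scenario]


for step_number, step in enumerate(actions_for("primary_data_center_down"), start=1):
    print(step_number, step)
```

An unknown scenario raises an error rather than returning an empty list, so a gap in the plan surfaces immediately as an escalation instead of silently doing nothing.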

Be sure you understand what you are subscribing to and the redirection intervals quoted by the telecommunications provider. It is a good idea to benchmark one interexchange carrier (IXC) against another to see whether actual performance matches the promises being made to you. Ask for a demonstration on a few of your own circuits.
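Demonstration results are easiest to compare when each carrier's measured redirection times are checked against its own quoted interval. This sketch uses the worst measured time per carrier, on the assumption that one slow circuit in a real disaster matters more than a good average; the carrier names and minute values are hypothetical.

```python
def benchmark(results):
    """For each carrier, report the worst measured redirection time and whether it met the quote.

    results maps carrier -> (quoted_minutes, [measured minutes for each test circuit]).
    """
    report = {}
    for carrier, (quoted_minutes, measured) in results.items():
        worst = max(measured)
        report[carrier] = (worst, worst <= quoted_minutes)
    return report


# Hypothetical results from demonstrations on a few of your own circuits.
results = {
    "Carrier A": (15, [4, 6, 9]),
    "Carrier B": (15, [3, 22, 5]),
}
for carrier, (worst, met) in benchmark(results).items():
    print(f"{carrier}: worst {worst} min, meets quote: {met}")
```

Both carriers quote the same interval, but only one meets it on every circuit tested, which is exactly the kind of gap a side-by-side benchmark is meant to expose.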

RECOMMENDATIONS
Smart people learn from their own mistakes, smarter people learn from the mistakes of others, and the smartest people learn from the collective wisdom of the industry. Many mistakes can be avoided with a little foresight and good judgement.

There are no hard and fast rules that dictate which technology or approach is better. No single answer can cover all possibilities. Know what you are protecting, its value, its criticality, and the recovery time frame required.

Finally, remember the Consideration Triad: financial factors, vendor capabilities, and site requirements. Assess the risks for each of your key network components. The more risk-averse you are, the more you will need to spend on additional reliability. However, the cost of the recovery tools should not exceed the cost of downtime.

The solution you select should be based on a number of criteria:

  • Application-technology fit,
  • Recovery interval,
  • Cost,
  • Ease of use,
  • Complexity of set-up and
  • Flexibility.
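One simple way to apply these criteria is a weighted score: rate each candidate solution on every criterion and weight the criteria by how much they matter to you. The weighted-average method, the 1-5 scale, the weights, and the ratings below are all assumptions for illustration, not a scoring method from this article.

```python
CRITERIA = (
    "application_technology_fit",
    "recovery_interval",
    "cost",
    "ease_of_use",
    "setup_simplicity",  # "complexity of set-up", rated so that simpler scores higher
    "flexibility",
)


def weighted_score(ratings, weights):
    """Weighted average of 1-5 ratings (higher is better) across all criteria."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


# Hypothetical weights: this user cares most about recovery interval, then cost.
weights = {c: 1 for c in CRITERIA}
weights["recovery_interval"] = 3
weights["cost"] = 2

candidates = {
    "Option A": dict(application_technology_fit=4, recovery_interval=5, cost=2,
                     ease_of_use=4, setup_simplicity=3, flexibility=4),
    "Option B": dict(application_technology_fit=4, recovery_interval=2, cost=5,
                     ease_of_use=4, setup_simplicity=4, flexibility=3),
}

scores = {name: weighted_score(r, weights) for name, r in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Changing the weights changes the winner, which reflects the point above: there is no single right answer, only the best fit for what you are protecting and how quickly you must recover.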

About the Author
Randy Fisher is Product Manager of AT&T Bandwidth Management (ABM) and AT&T Frame Relay Disaster Recovery Option (DRO) Services. For more information on business recovery for data network management, contact Randy at (908) 234-4655.

For additional information see:
Meet the Pros, Randolph A. Fisher