Now, better DR through virtualization has moved front and center. Interest continues to grow in VMware’s Site Recovery Manager, as well as in products from other vendors that have added better virtualization support, including recent updates from XOsoft and NSI Double-Take.
What wave, you say? A review of our 50 most recent virtualization projects found that nearly 90% of clients that have virtualized their main systems have rolled some level of the technology into their DR plans.
One interesting trend: folks who virtualize their disaster recovery environments but not their primary networks. Virtualization enthusiasts may scoff, but this makes sense on a few levels. It’s much less expensive than building a standard DR site. It’s a great way to introduce virtualization and build up internal skills without affecting the production network. And it skirts the pesky issue of vendors that don’t officially support virtualized versions of their applications.
A midsize private school in Rhode Island took this route. After a disaster befell the institution, IT opted to fill a hole in its DR plan using a virtualization appliance rather than virtualizing the production network. The creator of an accounting application critical to the school doesn’t (yet) support running the app in a virtualized environment. We proved that the software would run in a VM and set it up at the DR site. When the vendor eventually adds VM support, the school is a step ahead.
SIZE MATTERS
For organizations that have already virtualized their servers, the challenge is deciding on the size of the DR site configuration and setting a failover level. If you’ve just implemented virtualization, you’ve likely got a bit of a budget windfall (which will disappear quickly once the CFO figures out what’s going on) and some unused but perfectly functional gear. So if your production environment is 275 virtual servers running on 30 physical hosts with 10 TB of data on a SAN, do you need the same capacity for your DR site? Or can you whittle down to fewer servers to save money?
How do you decide? You don’t. This is one place IT must get business leaders involved. Push the DR plan back to the CEO and COO for clarification on anticipated usage. That will be a major factor in decisions on bandwidth, host servers, and storage configuration.
Any disaster-recovery plan needs to define mission-critical access parameters. A manufacturing or construction company that relies on IT systems for back-office support may opt to build the DR site to support half the normal usage and workload, giving priority to department heads or critical apps. If you run customer-facing, mission-critical data on your systems, you’re going to want a matching configuration and the bandwidth to go with it.
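To put rough numbers on that sizing question, here is a minimal Python sketch using the 275-server, 30-host example above. The function name is ours, and it assumes DR hosts are the same class of machine as production hosts and carry the same average number of VMs, which won’t hold exactly in practice.

```python
def dr_hosts_needed(prod_vms, prod_hosts, dr_percent):
    """Estimate (VMs, hosts) for a DR site covering dr_percent of production.

    Assumption: DR hosts match production hosts and run the same average
    number of VMs per host. Integer math is used to round up cleanly.
    """
    # Round the VM count up to a whole server
    dr_vms = -(-prod_vms * dr_percent // 100)
    # Round the host count up so no VM is left without a home
    dr_hosts = -(-dr_vms * prod_hosts // prod_vms)
    return dr_vms, dr_hosts


# Example from the text: 275 virtual servers on 30 physical hosts
for percent in (100, 50):
    vms, hosts = dr_hosts_needed(275, 30, percent)
    print(f"{percent}% of workload: {vms} VMs on about {hosts} hosts")
```

Under these assumptions, supporting half the workload takes roughly half the hosts, but the actual figure depends on which applications the business decides must come up first.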
What happens if you undersize your DR site? A major benefit of virtualization and SAN usage is the ability to quickly expand the supporting hardware infrastructure. Let’s say you have a 10-node host server cluster at the home office. Your DR site holds five servers that would support the main office in the event of a disaster. Your plan can include a provision that if a failover lasts for more than 48 hours, you would add host servers to improve performance and spread the load.
MAKING THE IMPOSSIBLE POSSIBLE
It doesn’t matter what stage of virtualization your organization is currently in—if you have server virtualization, you’ve got the basic elements of a cost-effective disaster-recovery plan. Separating physical hardware from logical servers and applications means those ultimate objects of CIO desire, namely tight recovery point objectives (RPOs) and recovery time objectives (RTOs), are finally attainable, and without gutting the budget. Digging in and defining RPOs and RTOs for every major piece of data within your organization will set the stage for your overall DR plan as well as daily backup, replication, and archiving operations. Get consensus on what objectives make sense based on estimated costs of lost time and data. Not only will this exercise sharpen the focus on DR planning, it helps justify the expenditure.
Next, you need a data replication/backup plan that integrates with your virtualized environment. Virtualization is about servers and applications, not specific data. If you’ve done server virtualization but don’t have a SAN, you’re missing out on some of the built-in replication and snapshot features that will help flesh out your strategy. Finally, make sure your current backup or continuous data protection software supports virtualization backup, data replication, and remote restores. Major vendors, including CA, Symantec, and Tivoli, have expanded or retooled their product lines to provide better functionality for virtual data protection.
JUST DO IT
Typically, virtualized disaster-recovery environments fall into four categories: cold standby, remote site, hot site, and nonvirtualized office.
Cold standby is the most popular. Saving raw data to tapes and shipping them off-site is simple and is an easy check box on the “in case the building blows up” checklist. Don’t stop there though—if there were a disaster, you’d have to rebuild your infrastructure, restore data, and get everything back online. So rather than saving just data, backup tapes should now include relevant virtual machines. Typically, we see clients take copies of their base virtualized servers, set them up at the DR site, then power down. Changes to databases and data volumes are either replicated or transferred to the site. Failover would involve booting the virtual servers and restoring relevant data from tape or disk.
A midsize distributor in the Northeast made good use of the spare equipment that came as a result of its virtualization project by building cold standby servers. Rather than simply changing tapes, the company set up a smaller version of its entire network, using servers freed up by virtualizing. The gear was configured, loaded with the company’s core VMs, then shut down and shipped off-site. Data tapes are still rotated on a regular basis, but the company has added a “refresh” plan for updating server images on a biannual basis. It already had the base servers, so costs were limited to additional memory, configuration effort, and licensing.
Many companies are looking at remote offices as “found” DR sites that make the most of existing infrastructure and telco costs. For example, a New York investment firm retooled its DR plan based on its virtualization and storage project, which employed CA’s XOsoft replication app in the VMware environment for servers and replicated core data using an EqualLogic SAN array. The result?
“We were able to create a complete failover configuration without having to move toward a hosted site,” says the firm’s IT director. “This included replicating all core data and building redundancy into our Domino mail system.”
One concern if you’re pursuing this strategy is bandwidth. The financial firm has a 100-Mbps link between offices that are open only during normal business hours. This gives IT plenty of time for replication and updates. If you’ve got a small pipe between sites, don’t expect to replicate gigabytes of data overnight. In theory, a clean T-1 can push 550 MB to 650 MB per hour, but that’s best case and works out to only about 5 GB to 8 GB over a 10- to 12-hour overnight window, nowhere near enough for moving differential data of 40 GB or more a night.
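For a back-of-the-envelope check on your own links, a short Python sketch along these lines can help. The function names are ours, it uses decimal units (1 GB = 1,000 MB), and it ignores protocol overhead, compression, and deduplication, so treat the results as ballpark figures only.

```python
def mbps_to_mb_per_hour(mbps):
    """Convert a line rate in megabits per second to megabytes per hour."""
    return mbps / 8 * 3600  # 8 bits per byte, 3,600 seconds per hour


def transfer_hours(data_gb, mb_per_hour):
    """Hours needed to move data_gb at a sustained rate of mb_per_hour."""
    return data_gb * 1000 / mb_per_hour


nightly_gb = 40  # size of the nightly differential, in GB
scenarios = [
    ("T-1, low estimate from the text", 550),
    ("T-1, high estimate from the text", 650),
    ("T-1, 1.544-Mbps line rate", mbps_to_mb_per_hour(1.544)),
    ("100-Mbps link, line rate", mbps_to_mb_per_hour(100)),
]
for label, rate in scenarios:
    hours = transfer_hours(nightly_gb, rate)
    print(f"{label}: {hours:.1f} hours for {nightly_gb} GB")
```

Swap in your own nightly change rate and link speed, then compare the result with the hours your offices are actually closed.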
Hot sites generally are maintained only by publicly traded companies, especially financial firms, and those who are truly paranoid and flush with cash. Determining capacity of a hot DR site ties back to RPO and RTO mandates. If your recovery point is zero data loss and recovery time is less than 10 minutes, you’re looking at a major investment at all levels, including bandwidth (fibre is a must) and replication (specialized failover, plus data replication, plus database and mail-specific specialty apps). This is where we’re seeing the strongest interest in VMware’s Site Recovery Manager. Introduced late last year, SRM essentially scripts the entire process you’d go through when failing over a site. It’s not a replication system—you’d still need to use your SAN’s replication utility or a third-party product like CA’s XOsoft or Double-Take. But SRM takes over whenever there’s a major outage, essentially managing the cutover of your servers between locations. The system even has a neat feature to print out a DR run book outlining your plan—auditors love that stuff.
Factor a few limitations into your design: SRM handles virtualized servers only, of course, and there’s currently no support for replication of raw data LUNs on a SAN.
STARTING SMALL
For organizations that aren’t virtualized but want to use virtualization technology for DR, several vendors offer appliances that leverage virtualization and inexpensive storage to build out a cost-effective DR system. One example is PlateSpin’s Forge product, which has grown in popularity as an appliance alternative to building a mirrored (or smaller) version of your production environment. The system is built on VMware’s virtualization engine and leverages PlateSpin’s conversion and recovery software. Replication is configured based on bandwidth and desired recovery windows. The boxes come in two sizes with the ability to support 10 or 25 servers. The $50,000 price for the 25-server edition seems steep, but not when you consider that it comes with 2.5 TB of storage and 16 GB of RAM, includes all virtualization licensing, and supports virtualized and nonvirtualized boxes. Nonvirtualized servers must be configured with PlateSpin’s converter and will require some tweaking of images and settings.
A few caveats: the devices currently support Windows servers only. However, given PlateSpin’s recent acquisition by Novell, we expect Linux support in short order, maybe even a physical-to-virtual utility for NetWare servers. In addition, your investment doesn’t buy bandwidth enhancements or speed improvements versus replication between servers. Finally, PlateSpin doesn’t address all SAN storage or database replication.
MAKE YOUR CASE
Money talks, so we ran some numbers comparing the cost to set up virtualized and standard DR sites for a network with 30 servers and 4 TB of storage capacity, assuming the main production network isn’t virtualized. The lion’s share of savings comes from server hardware. In a standard setup, a mirrored configuration of the main facility, requiring like-for-like hardware, would run $150,000. In a virtual scenario, we’d use larger servers with additional memory and processors. Target consolidation is 10-to-1, or three hosts for the 30 servers, at a cost of $30,000, for a net savings of $120,000. We tallied software licensing costs at $100,000 for the standard site and $120,000 for the virtualized site, to account for core licenses plus virtualization and associated backup software. Storage capacity is a wash. We assumed standard site engineering at $40,000, then doubled that for the virtualized site to account for the additional work required to perform physical-to-virtual conversions and create an update plan. Even without the long-term utility and other savings represented by maintaining fewer physical servers, in our example scenario, we find the virtual setup costs $60,000 less than the standard site ($230,000 versus $290,000).
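If you want to rerun the comparison with your own figures, the tally is simple enough to script. Here is a minimal Python sketch using the numbers from our example; the variable names are ours, and storage is left out because we treated it as a wash on both sides.

```python
# Cost line items from the example: (standard site, virtualized site)
costs = {
    "server hardware": (150_000, 30_000),
    "software licensing": (100_000, 120_000),
    "site engineering": (40_000, 80_000),
}

standard_total = sum(standard for standard, _ in costs.values())
virtual_total = sum(virtual for _, virtual in costs.values())
savings = standard_total - virtual_total

# Print each line item, then the totals and the difference
for item, (standard, virtual) in costs.items():
    print(f"{item:20s} ${standard:>9,} ${virtual:>9,}")
print(f"{'total':20s} ${standard_total:>9,} ${virtual_total:>9,}")
print(f"Virtualized site saves ${savings:,}")
```

Replace the three line items with quotes for your own environment; the hardware line is usually the one that moves the result most.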
Copyright © 2008 GreenPages Technology Solutions. All rights reserved.