Key Benefits:
- Reduced backup storage capacity requirements — Realize up to 98 percent reductions in cumulative back end storage and up to 90 percent faster backups or restores.
- Flexible deployment options — Leverage data deduplication at the source or target to best meet your specific requirements.
- Lower shared server resource impact — Achieve up to 50 percent increase in server consolidation through more efficient VMware backup.
- Reduced bandwidth utilization — Reduce infrastructure impact by moving up to 98 percent less data than traditional methods.
- Policy-based data deduplication — Optimize your backup performance by configuring policies to deduplicate data immediately at backup or scheduled for a later time.
- Long-term data retention at the cost of tape.
Virtualization of servers is rapidly and profoundly changing server deployment, power and HVAC planning, and data protection strategies. The whole nature of application and data recovery changes when your physical server is no longer a consideration, and your application server is now a file or set of files located on a SAN-connected disk array.
Replication, snapshots and images now become ways of protecting all of your servers without relying on the old methods of packaging and copying that data, and the targets for these copies are changing as well. The old ways of backing up data from each individual application are being pushed aside as technologists look at backup requirements from a whole new viewpoint.
Why All the Buzz About Data Deduplication?
The amount of data created and under management is growing rapidly. What is frequently overlooked, however, is that the backup footprint (dailies, weeklies, monthlies, quarterlies, annuals, fulls, incrementals and differentials, both local and remote) often represents as much as 30 times the capacity of your primary storage footprint. Compounding that, regulatory compliance and IT governance initiatives are requiring companies to keep more information for longer periods of time.
IT is increasingly enjoying the benefits of server virtualization. Workloads move flexibly within your infrastructure, and physical hardware utilization rates have improved dramatically as virtual-to-physical consolidation ratios improve. This creates new and interesting challenges:
- When new servers can now be created in an instant with a point and a click, how can I ensure they are associated with the correct backup policy?
- How do I manage backup schedules across constantly changing infrastructure workloads?
- How do I make complete physical backups using scarce shared resources?
- How and when should I back up?
Where Is Deduplication Being Used Today?
Backups are the sweet spot for deduplication technology today. Why? Backup is, by its nature, an inefficient process that involves repetitively moving mostly the same data again and again. There are a variety of approaches to deduplication for backup, including deduplication at the source and at the target, each of which has its usage “sweet spots.”
Archiving is another use, in the sense of removing static or stale data from primary storage to a secondary tier of storage. The more effectively an archive platform can reduce or deduplicate data, the more cost-effective it becomes for long-term retention.
Deduplication of primary storage is the goal of many storage technologists now. This would allow very efficient use of storage, enabling data to be kept for longer periods of time on the same primary storage footprint. When this can be done unobtrusively, without significantly impacting application performance, it means that you can reduce your capacity needs, lower storage acquisition costs, and wait longer between capacity expansions. The big issue here is “without significantly impacting application performance.” This is difficult to achieve even with the best of planning and configuration, and is not possible with very demanding applications (those with high I/O or transaction rates).
Deduplication Defined
“The process of detecting and identifying the unique data segments within a given set of information, enabling the elimination of redundancy when stored or moved.”
The deduplication process uses well-understood concepts such as cryptographic hashes and content-addressed storage. A deduplication algorithm detects the internal similarities within a data set and identifies them as common segments. Each unique segment is stored only once, along with the metadata needed to reconstitute the original data set. Together, these segments represent the essential set of unique information, reduced to its minimal size. Different methods can be used for this deduplication, depending on the technology and vendor.
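To make the definition concrete, here is a minimal sketch of hash-based deduplication with a content-addressed store. It is an illustration only, not how any particular vendor implements it: the 4 KB fixed segment size, the choice of SHA-256, and the names `deduplicate` and `reconstitute` are all assumptions made for this example.

```python
import hashlib

# Hypothetical fixed segment size, chosen only for illustration.
SEGMENT_SIZE = 4096


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into segments, keep each unique segment once, return the recipe.

    `store` maps a segment's SHA-256 digest to the segment itself
    (content-addressed storage). The returned recipe is the metadata
    needed to rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        digest = hashlib.sha256(segment).hexdigest()
        store.setdefault(digest, segment)  # stored only if not already present
        recipe.append(digest)
    return recipe


def reconstitute(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by looking up each digest in order."""
    return b"".join(store[digest] for digest in recipe)
```

Feeding this sketch four segments of which two are identical to earlier ones leaves only two segments in the store, while the four-entry recipe is still enough to rebuild the original data exactly.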
What Factors Impact Deduplication Ratios?
- Type of Data: Office files have higher deduplication ratios.
- Encrypted and compressed data are not ideal candidates for deduplication.
- Data Change Rate: Less change equals higher deduplication ratios. Small data change rates will result in large amounts of duplicate data in subsequent backups.
- Retention Policy: Longer retention policies equal higher deduplication ratios.
- Full to Incremental Backup Ratios: More full backups equal higher deduplication ratios.
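The effect of change rate and retention on the ratio can be sketched with a deliberately simplified model. The assumptions are ours, not a vendor sizing formula: every retained backup is a full of the same size, a fixed fraction of the data changes between consecutive backups, only changed data adds new segments, and compression is ignored. The function name `dedup_ratio` is hypothetical.

```python
def dedup_ratio(retained_fulls: int, change_rate: float) -> float:
    """Estimate logical-to-stored ratio under a simplified model.

    Sizes are expressed in units of one full backup:
    - logical data = number of retained fulls
    - stored data  = the first full, plus the changed fraction of each later one
    """
    if retained_fulls < 1:
        raise ValueError("retained_fulls must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    logical = retained_fulls
    stored = 1 + (retained_fulls - 1) * change_rate
    return logical / stored
```

Under these assumptions, 30 retained fulls with a 2 percent change rate work out to roughly 19:1, the same 30 fulls at a 10 percent change rate drop to about 7.7:1, and doubling retention to 60 fulls at 2 percent raises the ratio to about 27.5:1. Real-world ratios depend heavily on the data type, as noted above.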
What Do You Mean by Source-Based vs. Target-Based Deduplication?
Source-based:
This refers to using technology that deduplicates the data at the source, where the data is being generated or stored. The technology we are reviewing here that uses this approach is Avamar. Its host agent deduplicates data on the server being protected, then sends only the deduplicated data over the network to a central backup server and data repository.
Target-based:
This refers to using technology that deduplicates data at the target, the location the copy or backup application is writing to. Backup targets can be tape or disk. Deduplication technology exists that can write deduplicated data to either, but the term primarily refers to disk. Data Domain is one of the best examples of this type of deduplication technology today.
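The practical difference between the two approaches is where the duplicate check happens, and therefore how much data crosses the network. The sketch below illustrates that difference under our own assumptions; it is not the Avamar or Data Domain protocol. The `Repository` class, its `missing` and `put` methods, and the 4 KB segment size are hypothetical, and the small cost of exchanging hashes is ignored.

```python
import hashlib

SEGMENT_SIZE = 4096  # hypothetical segment size for illustration


def segments(data: bytes) -> list:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def digest_of(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class Repository:
    """Hypothetical deduplicating store at the backup target."""

    def __init__(self):
        self.store = {}

    def missing(self, digests) -> set:
        """Return the digests the repository does not hold yet."""
        return {d for d in digests if d not in self.store}

    def put(self, segment: bytes) -> None:
        """Store a segment unless an identical one is already present."""
        self.store.setdefault(digest_of(segment), segment)


def source_based_backup(data: bytes, repo: Repository) -> int:
    """Deduplicate before sending; return payload bytes sent over the network."""
    by_digest = {digest_of(s): s for s in segments(data)}
    sent = 0
    for digest in repo.missing(by_digest.keys()):
        repo.put(by_digest[digest])  # only unknown segments are transmitted
        sent += len(by_digest[digest])
    return sent


def target_based_backup(data: bytes, repo: Repository) -> int:
    """Send everything; the target deduplicates on arrival."""
    sent = 0
    for segment in segments(data):
        sent += len(segment)  # every segment crosses the network
        repo.put(segment)     # duplicates are discarded at the target
    return sent
```

In both cases the repository ends up holding the same unique segments. What differs is the traffic: on a second backup where only one segment is new, the source-based path sends that one segment, while the target-based path sends the full data set again and relies on the target to discard what it already has.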
Two Technologies, One Company: EMC Data Domain and Avamar
Our experience with deduplication and backups goes back to before Avamar and Data Domain were purchased by EMC. Both have long been viewed as leaders in their respective areas of technology, and both are now being managed and integrated into storage and virtualization technology at EMC.
With these two technologies, you can, in EMC’s words:
- Retain more: 10-30x data reduction eliminates the use of tape for operational recovery
- Replicate smarter: only move deduplicated data offsite, for 99% bandwidth efficiency and cheaper DR
- Recover reliably: leverage end-to-end data verification to ensure data recoverability
Data Domain: Inline Deduplication Storage Systems
Inline deduplication performs the data reduction step prior to writing the data to disk. This is the preferred method in our opinion, as the total amount of disk required is greatly reduced. Data Domain sizes the appliance line so that the processing power matches the rate at which data can be taken into the system, allowing for reliable rates of performance.
Data Domain can be put into an existing backup architecture simply as a new target, with nothing else in your systems needing to change.
- Supports backup and archive software
- Backup Software: NetWorker, Symantec, CommVault, IBM TSM, …
- Application utilities: Oracle RMAN, SQL Server, …
- F5 ARX file virtualization
- Archive: SourceOne, Symantec Enterprise Vault, Mimosa, …
- Data Domain Retention Lock software option
- Supports any protocol
- SAN: VTL software option
- NAS: NFS, CIFS
- Custom: NetBackup OpenStorage software option
- Scalable for Local and Distributed Recovery
- Up to 5.4 TB/hour
- Up to 71 TB addressable capacity per system
- Data Domain Replicator software option
- Advanced dedupe architecture for high speed & resilience
- Stream Informed Segment Layout (SISL) scaling architecture
- Data Invulnerability Architecture
Avamar: Deduplication Backup Software
EMC’s Avamar is a complete backup and recovery software and hardware solution. Avamar’s unique global, source data deduplication technology eliminates redundant backup data that is sent over the network and stored. Avamar deduplicates at the source and across sites and servers. Avamar has a complete management console to manage all scheduling, policies, retention and recoveries, and has several deployment options for flexibility – Avamar Data Store with Avamar software, Avamar software and Avamar Virtual Edition for VMware environments.
- Integrated software & hardware solution with global source-based deduplication
- Deduplicates across sites and servers globally
- Effective full backup every time
- Single step recovery
- Backup process reduces data sent over the network and stored
- Variable-length subfile segments for optimal deduplication
- Integrated high availability and reliability
- RAIN for high availability and fault tolerance
- Avamar server and data recoverability verified daily
- Replication between servers
- Flexible deployment options
- Avamar software
- Avamar Data Store
- Avamar Virtual Edition for VMware environments
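Why variable-length segments matter can be shown with a small sketch. The source does not say which algorithm Avamar uses, so what follows swaps in a simple gear-style rolling hash purely for illustration; the `GEAR` table, the 10-bit cut condition (roughly 1 KB average segments), and all function names are our assumptions. Segment boundaries are chosen from the content itself, so they realign after data is inserted, whereas fixed-size boundaries all shift.

```python
import hashlib

# Hypothetical parameter: cut when the top 10 bits of the 32-bit hash are zero,
# which gives segments of roughly 1 KB on average for random-looking data.
CUT_SHIFT = 22

# One pseudo-random 32-bit value per possible byte, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def variable_chunks(data: bytes) -> list:
    """Cut data wherever a rolling hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h depends on at most
        # the last 32 bytes of the current chunk.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h >> CUT_SHIFT) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 1024) -> list:
    """Cut data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared(a: list, b: list) -> int:
    """Count chunks of b whose content also appears in a."""
    known = {hashlib.sha256(c).digest() for c in a}
    return sum(hashlib.sha256(c).digest() in known for c in b)
```

Inserting a single byte at the front of a file shifts every fixed-size segment, so none of them match the previous backup. With content-defined boundaries, the segments resynchronize shortly after the inserted byte and most of them deduplicate against the earlier copy. Production implementations typically also enforce minimum and maximum segment sizes, which this sketch omits.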
Avamar Virtual Edition for VMware
[Figure: A virtualized infrastructure using a consolidated backup strategy (VCB)]
[Figure: Avamar using the VCB strategy]
[Figure: Avamar agents and guest machine backups]
[Figure: Avamar Virtual Edition: Avamar software deployed as a virtual appliance]
INDUSTRY’S FIRST DEDUPLICATION VIRTUAL APPLIANCE FOR BACKUP, RECOVERY, AND DISASTER RECOVERY
- Leverages existing servers and storage
- Can utilize existing iSCSI, SAN or DAS disk storage
- Replication (of applications and storage) eliminates shipping tapes
- Replicate between virtual Avamar servers and physical Avamar servers
- Facilitates rapid, cost-effective deployment and return on investment
- Supports VMotion for deployment flexibility
- Up to two Avamar Virtual Edition virtual appliances per ESX server for scalability
- File level backup with Scaling!
The Latest and Greatest
The 4th quarter updates of both Avamar and Data Domain deserve mention here.
Avamar
- Avamar 5.0
- One solution protects data center, remote offices, and desktops/laptops – simply and affordably
- Extends Avamar’s VMware advantages with vSphere 4 integration
- Avamar Data Store Gen3
- More capacity in same footprint – 65% increase in node density
- Avamar Data Transport
- Move deduplicated data to tape for up to 50x reduction in long-term storage
- Integration with vSphere
- Universal support for all vSphere 4 backup options
- vStorage API integration
- Greater backup and restore flexibility
- VMware vCenter Server integration
- Single point of management within Avamar for all VMware backup options
Data Domain
New appliances: DD630, DD610 & DD140 (Remote Office)
- Up to 420* TB Logical Capacity
- Internal Expansion
- 7 HDD, 12 HDD + Field Upgrade
- 20 to 1 Replication Contexts
- 2U, Redundant Power
- DD610 – 6TB
- Up to 1.6x more raw capacity
- Up to 1.46x more usable capacity
- Up to 1.5x higher throughput @ 675 GB/hr Peak
- DD630 – 12TB
- Up to 1.6x more raw capacity
- Up to 1.48x more usable capacity
- Up to 2x higher throughput @ 1.1 TB/hr Peak
- DD630 – 12TB: Local disk-based backup, networked DR, centralized tape consolidation
- Up to 2x more raw capacity
- Up to 2.3x more usable capacity
- Up to 1.5x higher throughput @ 450 GB/hr Peak
Next Steps? GreenPages’ File System Storage Assessment
Create a Smarter Storage Strategy
- Discover key attributes about your heterogeneous file storage environment, such as volumes, shares, exports, security settings, file system settings, and more
- Get a detailed look into the file data environment, including which file types are being created, who’s creating them, how quickly they age, and which resources they consume
- Rich data profiling capabilities and powerful reporting tools enable you to identify trends in your file data and create effective file management policies
- File System Inventory
- Enhances visibility into file storage environments
- Monitors usage of file storage resources
- Provides detailed file system statistics and trend reporting
Call your GreenPages Account Manager for more details on our File Storage Assessments. And for a deep dive into the subject, listen to GreenPages’ most recent webinar: Virtualization Backup Challenges—Deduplication: How to Decide on Source-based or Target-based.