One of the most frustrating issues for IT staff, and one that continues to drive sales of storage (disk or tape), is how long data has to be kept around. Anyone who has ever moved has faced closets, attics and basements full of stuff that they paid to move or store but have not looked at or used in years! No one is making us keep this stuff, and yet we do. We do not have to keep most of those backups and emails, either. We keep them for the same reason we keep the stuff around the house: no one wants to throw it out. Unless a specific regulation or rule governs your business, you really can throw it out, because it costs you money to keep it. So, let’s talk about this briefly.
Even when no specific business or regulatory compliance requirement applies, retention periods for email and file system backups and archives vary considerably. The main factors in deciding on a policy are the cost of longer-term retention and how useful or relevant the backup (or archive) remains to the business as it ages.
In my experience as the lead engineer for a managed backup service that handled backups for all AT&T hosted data centers in the United States (over 3 PB/year in volume and several hundred corporate customers), 99% of the backup retention requirements were 4 weeks. That's it, 4 weeks. A commonly cited figure in the backup industry is that over 90% of restore requests are made within one month of the creation of the data (file or email). Many customers add a few layers of backups, known as the Grandfather/Father/Son scheme, as follows:
- Daily: retained 2 to 4 weeks
- Weekly: retained 1 to 3 months
- Monthly: retained 6 months to 1 year
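As a rough illustration, the three tiers above can be sketched in a few lines of Python. This is only a sketch under my own assumptions, not how any particular backup product does it: the backup taken on the 1st of the month is treated as the monthly, Sunday backups as the weeklies, everything else as a daily, and the retention values are the upper ends of the ranges above (3 months approximated as 90 days, 1 year as 365 days). The names `RETENTION`, `tier_for`, `is_expired` and `backups_to_keep` are hypothetical.

```python
from datetime import date, timedelta

# Assumed retention per tier: the upper end of each range in the list above.
RETENTION = {
    "daily": timedelta(weeks=4),
    "weekly": timedelta(days=90),    # roughly 3 months
    "monthly": timedelta(days=365),  # roughly 1 year
}


def tier_for(backup_date: date) -> str:
    """Classify a backup by the day it was taken (an assumed convention)."""
    if backup_date.day == 1:
        return "monthly"
    if backup_date.weekday() == 6:  # Monday is 0, so 6 is Sunday
        return "weekly"
    return "daily"


def is_expired(backup_date: date, today: date) -> bool:
    """True when the backup is older than the retention for its tier."""
    return today - backup_date > RETENTION[tier_for(backup_date)]


def backups_to_keep(backup_dates, today):
    """Return the backups that are still inside their retention window."""
    return [d for d in backup_dates if not is_expired(d, today)]
```

Running `backups_to_keep` against a catalog of backup dates gives the set that should still be on disk or tape; everything else is a candidate for expiry.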
The main purpose of the Grandfather backup, the long-term retention of a monthly backup, is archiving. Archiving is long-term storage of older data which (strictly speaking) has been removed from primary storage. This is necessary to keep the growth of primary storage at manageable and affordable levels. It often comes into play with user data (files, spreadsheets, PDFs) and with email, which can quickly grow out of control when users treat their inbox as a de facto file system.
Email archiving has additional useful characteristics such as Single Instance Storage (removing all duplicate attachments from group emails, keeping one copy), mailbox storage management (archiving/removing old emails from the mailbox), journaling (keeping a copy of every email sent or received throughout the day) and compliance (searching those journaled copies for keywords or addresses for legal discovery). Retention of the archived messages and attachments is often no more than 2 to 3 months but can be as long as 6 months. If journaling and legal discovery are an important consideration for the business, the retention of the journaled emails could be up to a year. If there are no compliance or regulatory requirements, email does not need to be retained for any longer than 6 months to a year. For longer retention periods, the cost of the storage can become financially unsustainable.
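To make Single Instance Storage concrete, here is a minimal sketch of the idea: each attachment is identified by a hash of its content, and a given attachment is stored only once no matter how many messages carry it. This is an in-memory toy under my own assumptions (SHA-256 as the content key, a hypothetical `SingleInstanceStore` class), not the mechanism of any specific archiving product.

```python
import hashlib


class SingleInstanceStore:
    """Toy archive that keeps one copy of each distinct attachment."""

    def __init__(self):
        self._blobs = {}  # content digest -> attachment bytes (stored once)
        self._refs = {}   # message id -> list of digests it references

    def add_message(self, message_id, attachments):
        """Archive a message; duplicate attachments are stored only once."""
        digests = []
        for data in attachments:
            digest = hashlib.sha256(data).hexdigest()
            # setdefault keeps the existing copy if this content was seen before
            self._blobs.setdefault(digest, data)
            digests.append(digest)
        self._refs[message_id] = digests

    def attachments_for(self, message_id):
        """Return the attachments of an archived message."""
        return [self._blobs[d] for d in self._refs[message_id]]

    def stored_bytes(self):
        """Total bytes actually held, after removing duplicates."""
        return sum(len(blob) for blob in self._blobs.values())
```

If the same 1,000-byte report is mailed to 50 people, `stored_bytes()` reports 1,000 rather than 50,000, while `attachments_for` still returns the report for each of the 50 messages.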
In conclusion: backups do not need to be retained for longer than 4 to 8 weeks, in most cases. Archived backups or emails can be kept for longer periods, especially if deduplicated, but most companies do not keep such data for longer than one year.
Retention of electronic data should be consistent with written record-keeping policies and should be documented for use in the event of legal discovery. Any request for data outside the documented retention period can then be safely denied, reducing the costs and risk incurred in legal discovery events.
These opinions and best practices are based on over 10 years of storage and backup administration experience, several of which were at a managed storage service provider handling thousands of backups daily for major internet data center managed hosting and colocation facilities. These standards are broadly applied and observed in many commercial sector organizations' retention policies for electronic data.