The Risk of Keeping Too Much Data



Many marketers and product strategists get chills imagining the potential value of “big data,” and often believe that the more data the organization retains, the better the analytic outcome. Some organizations think that saving all data will protect them from e-discovery penalties in the event of litigation.

Unfortunately, both of these motivations for a “save everything” strategy are flawed because a critical part of the data value equation is missing. The only way to determine the real value of a data retention strategy is to calculate the financial and strategic benefit gained from using the data, minus the cost of the information infrastructure—including its management—and the legal and compliance risks associated with keeping all that data. When you do the math, it is clear that the “save everything” approach undermines the very purposes for which it is intended.
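
The value equation described above can be written compactly. The symbols are labels introduced here for illustration only; they are not notation used by the author or by CGOC.

```latex
V_{\text{net}} = B_{\text{use}} - C_{\text{infra}} - R_{\text{legal}}
```

Here $B_{\text{use}}$ stands for the financial and strategic benefit gained from using the data, $C_{\text{infra}}$ for the cost of the information infrastructure including its management, and $R_{\text{legal}}$ for the legal and compliance risk of keeping the data, expressed as an expected cost. A "save everything" strategy tends to raise the last two terms without necessarily raising the first.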

The Compliance Need to Delete
Companies that operate in the European Union have watched as an array of privacy directives emerged that contain “purpose of use” limitations dictating that organizations keep private or confidential information only as long as it is being used for the purpose for which it was originally collected. In the United States, an increasing number of laws, including the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act, have begun to impose similar privacy limitations. Meanwhile, in February 2012, the White House issued “Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy.” The framework states, “Companies should securely dispose of or de-identify personal data once they no longer need it, unless they are under a legal obligation to do otherwise.”

A strategy that involves saving all data will eventually put most companies on a collision course with evolving privacy regulations. Therefore, a company must assess the financial impact of not deleting data in accordance with all the regulations in all the jurisdictions within which it operates.

The Legal Need to Delete
While there may be little companies can do to reduce the number of lawsuits faced, they may be able to significantly reduce the cost of e-discovery. The median cost of e-discovery collection, processing and review is about $17,000 per gigabyte, according to the RAND Institute for Civil Justice. When organizations unnecessarily store thousands of gigabytes of data that get caught up in e-discovery requests, costs can soar needlessly.
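
To see how quickly the per-gigabyte figure adds up, here is a minimal sketch in Python. The function name and the sample volumes are hypothetical, chosen only for illustration; the only number taken from the article is the median of about $17,000 per gigabyte, and real matters will vary widely around a median.

```python
# Median per-gigabyte figure quoted in the article (attributed to RAND).
MEDIAN_COST_PER_GB_USD = 17_000


def ediscovery_cost(gigabytes, cost_per_gb=MEDIAN_COST_PER_GB_USD):
    """Rough cost of collection, processing and review for a data volume."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return gigabytes * cost_per_gb


# Hypothetical volumes swept into a discovery request (illustrative only).
for gb in (10, 100, 1_000):
    print(f"{gb:>5} GB -> ${ediscovery_cost(gb):,.0f}")
```

Because the estimate is linear in volume, every gigabyte of unneeded data that is disposed of before a request arrives removes roughly the same amount from the bill.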

The Business Need to Delete
If you work in a large organization, you have almost certainly experienced the frustration of wasting time trying to locate information on sprawling data servers and in massive email repositories. Perhaps it doesn’t happen too often or seem too onerous, but when this happens to thousands of employees across an organization, the result can be significantly reduced productivity, along with all the strategic, competitive and financial consequences that entails.

While this should be sufficiently compelling to spur desire for a data deletion strategy, companies looking to leverage big data initiatives should also take note. Some in the big data space believe that more data equates to better results, but experts say that is often not the case. “Once you delete data that’s stale, the algorithms actually function much better from an analytics standpoint,” said Jake Frazier, director of lifecycle governance at IBM and program director of legal and e-discovery for the Compliance Governance & Oversight Council (CGOC). “Leaving stale data can actually skew the algorithms towards older facts.”

The IT Desire to Delete
Most IT departments don’t need to be convinced of the need to delete valueless data, or “data debris.” They see the rising cost. As Frazier wrote on the Information Systems Audit and Control Association blog, “According to the Gartner IT Key Metrics Data 2012 Report, the total cost of storing and managing one petabyte of information is nearly $5 million per year. This means that a large enterprise saddled with 10 petabytes of data is spending about $50 million per year.”

CGOC research suggests that typically only 1% of corporate information is on litigation hold, 5% is in records retention and 25% has current business value. This means that as much as 69% of all the data organizations collect has no business, legal or regulatory value at all.

The math is staggering. Of the $50 million per year spent by an enterprise with 10 petabytes of data, as much as $34.5 million is wasted on data that should be deleted.
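
The arithmetic behind these figures can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the function and variable names are hypothetical, the three value categories are treated as non-overlapping (as the subtraction above implicitly does), and "nearly $5 million" is rounded to $5 million per petabyte.

```python
# Figures quoted in the article; treat them as rough inputs, not measurements.
COST_PER_PB_USD = 5_000_000  # "nearly $5 million" per petabyte per year
VALUE_SHARES_PCT = {         # share of data with some identified value
    "litigation_hold": 1,
    "records_retention": 5,
    "current_business_value": 25,
}


def storage_waste(petabytes, cost_per_pb=COST_PER_PB_USD,
                  shares_pct=VALUE_SHARES_PCT):
    """Return (total_spend, debris_pct, wasted_spend) for a data volume."""
    total_spend = petabytes * cost_per_pb
    # Whatever is not on hold, under retention, or in business use is debris.
    debris_pct = 100 - sum(shares_pct.values())
    wasted_spend = total_spend * debris_pct / 100
    return total_spend, debris_pct, wasted_spend


total, debris, wasted = storage_waste(10)
print(f"Total: ${total:,.0f}  Debris: {debris}%  Wasted: ${wasted:,.0f}")
```

For 10 petabytes this yields a total of $50 million, a debris share of 69% and $34.5 million in wasted spend, matching the figures above. Changing the shares shows how sensitive the waste estimate is to how much data is actually under hold or retention.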

IT teams do not necessarily have any insight into data value, however. Technology staff cannot always determine what information is no longer of value to business users, what information can be deleted without risking e-discovery penalties, and what information needs to be deleted to satisfy privacy regulations.

While some IT departments engage in wholesale elimination of the oldest data in order to create new capacity, this just increases risk to the organization. IT can only begin to safely delete data by coordinating with the other information stakeholders. That is where information lifecycle governance comes into play.

Information Lifecycle Governance
Information lifecycle governance (ILG) is the practice of applying overarching storage and information policies that drive management processes. An ILG program should unify all information stakeholders from legal, compliance, business, IT, privacy and security around key elements such as a shared process model and vocabulary.

A successful data strategy—especially one designed to support big data analytics—requires an effective defensible disposal strategy that supports the automatic elimination of any data debris. While defensible disposal is a core element, an ILG program also requires technology solutions that can improve, automate and sustain shared processes.
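
One way to picture automated defensible disposal is as a rule that refuses to delete anything with a legal, regulatory or business reason to exist. The Python sketch below is purely illustrative: the `Record` fields and the `is_disposable` function are hypothetical, not part of any ILG product, and it assumes the flags are supplied by legal, records and business stakeholders rather than inferred by IT.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    on_legal_hold: bool            # set by legal
    retain_until: Optional[date]   # set by records/compliance; None = no schedule
    has_business_value: bool       # set by the business owner


def is_disposable(record: Record, today: date) -> bool:
    """True only if no stakeholder has a reason to keep the record."""
    if record.on_legal_hold:
        return False
    if record.retain_until is not None and today <= record.retain_until:
        return False
    return not record.has_business_value


# Hypothetical records, for illustration only.
records = [
    Record("contract-001", True, None, False),
    Record("invoice-2015", False, date(2022, 12, 31), False),
    Record("old-draft", False, None, False),
]
today = date(2024, 1, 1)
print([r.record_id for r in records if is_disposable(r, today)])
```

The order of the checks reflects the point made above: a legal hold or an unexpired retention period overrides any wish to free up capacity, so age alone never qualifies a record for deletion.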

When fully implemented, defensible disposal can reduce legal, compliance and IT costs and risks while increasing business user productivity and access to data. By deleting more often and more wisely, businesses decrease potential liability and reap strategic benefits from the valuable information retained.


About the Author

Derek Gascon is executive director of the Compliance Governance & Oversight Council (CGOC).



  • Shahid Hussain

    "CGOC research suggests that typically only 1% of corporate information is on litigation hold, 5% is in records retention and 25% has current business value. This means that as much as 69% of all the data organizations collect has no business, legal or regulatory value at all."

    Can you share which type of data was reviewed? Structured, Unstructured or both?
    Also which industry, Financial, Non-Financial, Medical, etc. etc?

