Ten years ago, there was no such thing as too much data. Notions about data being the “new oil” prompted organizations to hoard every byte they could, hoping they might be able to harness it down the road. Combined with the notion that “storage is cheap,” this belief has led many companies to exponentially increase their risk rather than their opportunity.
New data privacy regulations in Europe and the United States impose a significant burden of care on organizations regarding their data collection processes. In fact, data minimization is a fundamental principle within the European Union’s General Data Protection Regulation (GDPR). Whether governed by the GDPR or state privacy regulations like the California Consumer Privacy Act (CCPA), businesses must now limit the personal data they collect and dispose of it once it is no longer needed for a legitimate business purpose.
New obligations require organizations not only to rein in data collection practices, but also to reduce the data already held. Furthering this imperative, over-retention of records or other information can lead to increased fines in the case of a data breach. As a result, organizations are moving away from the practice of collecting all the data they can toward a model of “if you can’t protect it, don’t collect it.” The focus now is on mitigating risk by adopting a strategy of data minimization.
Because regulations do not specify precisely what data should be erased, determining where to start with data minimization can be difficult. For most organizations, mapping their data estate is the most pragmatic way to begin.
Data governance experts frequently use the acronym ROT (redundant, obsolete, trivial) to describe data that provides no business value to an organization. Some experts expand that acronym to ROTT, adding “transitory” as another area of data vulnerability or duplication.
Using the ROTT acronym as a guide, the search for data that can be easily disposed of should be organized along the following lines:
Redundant: People tend to grossly underestimate how much of their data is redundant. Redundant information is duplicated in multiple places, either within a single system or across several. At any given time, up to 30% of an organization’s storage might be duplicate data. Removing it frees an enormous amount of space and also makes searching for information easier.
Obsolete: The value of information decreases precipitously over time, while its risk-to-value ratio increases. Information can become obsolete if it is incomplete, outdated or incorrect. Using obsolete information can lead to poor decision-making, further posing risk to the business. An easy way to quickly assess obsolescence is by checking the creation or last-accessed date.
Trivial: A surprisingly large amount of the information circulating around an organization’s systems does not have a legitimate business purpose. Documents detailing who is bringing what to the office potluck and back-and-forth conversations about meeting schedules do not provide value and should be deleted as soon as practical.
Transitory: Data in motion—information moving around a private or corporate network—should have a classification of its own for data minimization. Transitory data is not exactly duplicate data, but often includes data that is secured elsewhere. This type of data has sometimes fallen into the wrong hands, revealing sensitive information that was subsequently misused. It needs to be culled to minimize the risk it poses by remaining accessible and ungoverned.
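Two of these screens, redundancy and obsolescence, lend themselves to simple automation. The sketch below, a minimal first pass assuming files on a local filesystem, groups files by content hash to surface exact duplicates and flags files untouched for a configurable period; the three-year default cutoff is an illustrative assumption, not a retention rule.

```python
import hashlib
import os
import time
from collections import defaultdict

def find_duplicates(root):
    """Group files under `root` by SHA-256 content hash; any group
    with more than one path is redundant storage."""
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large files do not load into memory.
                for chunk in iter(lambda: f.read(8192), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
    return {k: v for k, v in by_hash.items() if len(v) > 1}

def stale_files(root, max_age_days=3 * 365):
    """Flag files whose last-modified time is older than the cutoff,
    a rough first pass at spotting potentially obsolete content."""
    cutoff = time.time() - max_age_days * 86400
    return [os.path.join(dirpath, name)
            for dirpath, _, filenames in os.walk(root)
            for name in filenames
            if os.path.getmtime(os.path.join(dirpath, name)) < cutoff]
```

Real inventories span shared drives and cloud repositories rather than one directory tree, but the same two signals, identical content and long inactivity, are what commercial data-mapping tools check at scale.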
Using Data Mapping to Improve Data Hygiene
Few organizations understand the totality of their data, and even fewer have a single, focused data protection or governance officer. Instead, there is a broad constituency in the C-suite overseeing data collection and usage for their separate domains. The CISO obsesses about leaks, hacks and phishing attacks. The CIO focuses on ensuring the business can monetize its data. The CTO looks at data through the lens of storage. And the legal team—some of the most vocal champions of data minimization—sees data as a legal and regulatory threat.
Even though only one is usually the designated custodian for information governance, each executive also has a different perspective on data hygiene. While each officer will naturally champion their own area’s needs, the one thing that will bring these widely differing interests together is understanding all the data the organization owns and the risks it poses. Data mapping is therefore key to minimization.
To start, it is essential to locate all data that an organization retains and expose the hidden areas of “dark data.” Gartner defines this as “assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes,” such as analytics, business relationships and direct monetization. This is the only way to measure the depth of an organization’s exposure, assess the location and magnitude of the data, and identify what needs remediation and what is overdue for erasure. To do so, a company can use a range of technology to:
- Gain a cross-repository inventory of all content to identify the real data of value in your business
- Add pre-defined rules for handling a wide range of redundant, obsolete, trivial and transitory (ROTT) content
- Adapt rules to make them actionable in your environment
- Test minimization policies across large volumes of files to understand their impact
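The last two steps, adapting rules and testing their impact before anything is deleted, can be sketched as a simple dry run over a file inventory. Everything here is hypothetical scaffolding, not a real tool's API: a rule is just a named predicate over file metadata, and the dry run reports how many files each rule would flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FileMeta:
    """Minimal metadata for one inventoried file (illustrative fields)."""
    path: str
    size_bytes: int
    age_days: float

# A minimization "rule" is a name plus a predicate over file metadata.
Rule = Callable[[FileMeta], bool]

def dry_run(files: List[FileMeta], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Apply each rule to the inventory and count what it would flag,
    measuring a policy's impact before any deletion happens."""
    return {name: sum(1 for f in files if rule(f))
            for name, rule in rules.items()}
```

A team might start with rules like “obsolete: untouched for three years” or “trivial: temp-file extensions,” run them against a large sample, and tighten or loosen the thresholds until the flagged set matches what reviewers agree is disposable.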
First and foremost, shed light into all potential data repositories. For example, even if one department has decided to use a specific data storage solution like Dropbox, there is a good chance that different departments might be using others, like SharePoint or Box. All must be examined.
Once the data mapping is complete, organizations must digest and process the extent of their data chaos. Internally, this includes discussions on budget, responsibilities, authority, timing, people and processes. Organizations need to understand the various business drivers, stakeholders and risk areas in data privacy, such as audits, litigation, GDPR and CCPA regulations, and data migrations to the cloud. With this step complete, organizations can then decide what to fix and what type of tools or support they will need to remediate the issues.
Whatever the company needs to do, bear in mind that it cannot be done overnight. No one wants to get bogged down in a multi-year, multimillion-dollar initiative. After the initial sense of urgency, efforts can sometimes stall as a result of thinking too big. Instead, break down a project into modular, bite-sized chunks.
Throughout the journey, start to build in minimization concepts during the earliest phases of a data accumulation program. Acquire data progressively, and only when genuinely needed. The less information collected, the less to store and manage. This is the way to achieve good data hygiene, and a path to ongoing data minimization.