Data Ethics: Risk Management for the Algorithmic Age

Dennis Hirsch


November 1, 2019

Many companies today use big data analytics—a term that encompasses machine learning and many forms of artificial intelligence—to provide value throughout the organization, including applications for everything from operations to marketing to human resources. Much of the discussion around big data analytics focuses on the promising benefits, but ignores the equally significant risks.

Big data analytics can harm individuals in at least three important ways: privacy invasion, manipulation and bias. These threats to individual consumers are, in turn, threats to the reputation and brand of the companies deploying such technology. And most of these companies do not fully appreciate these very real business risks.

As the Facebook/Cambridge Analytica scandal made clear last year, irresponsible use of big data analytics can harm people and inflict major damage on corporate reputation and shareholder value. As a result, some companies are starting to proactively manage these risks, calling this new focus area “data ethics.”

Two years ago, Ohio State University launched an interdisciplinary study of corporate data ethics management, conducting interviews with chief privacy officers, legal counsel and others who advise companies on these matters. The researchers sought to learn how companies perceived the risks that using big data analytics raised, and why they chose to actively manage these risks. The interviewees told a largely consistent tale with important applications for the risk management profession.

Examining the Risks

The companies in the study explained that the use of big data analytics poses several key risks to individuals and to the companies themselves:

  • Privacy risk. Big data analytics can take surface data about individuals and infer sensitive, latent information from it. In a frequently cited example, Target analyzed the purchasing histories of its female customers to determine which of these women were likely pregnant, and then flooded them with baby-related coupons and advertisements. This inadvertently tipped off a father that his 15-year-old daughter who received the coupons was pregnant, a clear privacy violation. The ability to take relatively innocuous personal data and infer pregnancy status, mental health conditions, sexual orientation, political affiliations and many other types of sensitive information creates an ongoing risk of privacy invasion.

  • Manipulation risk. Companies can use data analytics to infer people’s psychological vulnerabilities and take advantage of them. Manipulation of this type was at the heart of the Facebook/Cambridge Analytica scandal. Cambridge Analytica analyzed individual users’ Facebook “likes” to infer their personality types, then targeted these unsuspecting individuals with political ads designed to appeal to them in ways they would find difficult to resist. This not only manipulated, and therefore harmed, the people in question; it may even have swayed the results of the 2016 U.S. presidential election. It also had disastrous consequences for Cambridge Analytica and Facebook.

  • Bias risk. Data analytics takes massive sets of past data (also known as training data), finds patterns and uses these correlations to make predictions. Where human bias against protected classes shaped the training data, it will influence the final predictions as well. For example, Amazon developed an artificial intelligence tool to sort through the tens of thousands of resumes that it receives each year. It trained the tool on the resumes of its current employees, who represented “successful” candidates. Unfortunately, due to well-documented biases in the tech industry, most of the existing employees whose resumes were used were men. Amazon asked the AI tool to identify incoming resumes that most resembled those of current workers, so the tool accordingly learned to select men’s resumes and to discard those that had attributes that were characteristic of women, such as attendance at an all-women’s college, or playing on a women’s sports team. Bias had shaped the training data, and the algorithm incorporated and perpetuated this bias. Thankfully, Amazon caught this problem before making the tool operational and abandoned the project.
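
The mechanism behind bias risk can be illustrated with a toy example. The following is a minimal sketch, not a description of Amazon's actual system: the resumes, the scoring rule and the function names (train_token_weights, score) are all hypothetical. It assumes a scorer that simply rewards words common in past "successful" resumes, which is enough to show how a skewed history penalizes an otherwise identical candidate.

```python
from collections import Counter


def train_token_weights(successful_resumes):
    """Weight each word by the share of past 'successful' resumes containing it."""
    counts = Counter()
    for resume in successful_resumes:
        counts.update(set(resume.lower().split()))
    total = len(successful_resumes)
    return {token: n / total for token, n in counts.items()}


def score(resume, weights):
    """Average weight of the resume's distinct words; unseen words count as 0."""
    tokens = set(resume.lower().split())
    if not tokens:
        return 0.0
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)


# Hypothetical training data: past hires, none of whom mention "women's".
past_hires = [
    "python engineer chess club",
    "java engineer rugby team",
    "python engineer rugby team",
    "java engineer chess club",
]
weights = train_token_weights(past_hires)

# Two candidates with identical qualifications; only one word differs.
score_a = score("python engineer rugby team", weights)
score_b = score("python engineer women's rugby team", weights)

print(score_a, score_b)  # the second score is lower
```

Nothing in this sketch refers to gender explicitly. The lower score for the second candidate comes entirely from the makeup of the historical data, which is why this kind of bias can go unnoticed unless someone tests for it.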

The Target pregnancy incident appeared in the New York Times and continues to be featured in articles (like this one) about the risks of data analytics. The Facebook/Cambridge Analytica scandal caused Cambridge Analytica to go out of business and contributed to Facebook’s loss of $100 billion in stock value and a recent $5 billion fine from the U.S. Federal Trade Commission for violating an earlier consent decree concerning user data privacy. Amazon’s AI tool risked perpetuating gender discrimination at the company. The fallout from such incidents demonstrates the magnitude of the risks of big data analytics and how long their effects can last.

The Importance of Managing Data Risks

Many companies do not yet perceive the risks stemming from their use of big data analytics, but those that do confront the question of whether to invest resources in managing them and, if so, how to go about it.

In the Ohio State study, representatives of companies in technology, retail, pharmaceutical, health and other industries that decided to undertake this new form of data ethics management explained why their companies made this investment. Some focused on the reputational damage the company would suffer if it were found to have violated people’s privacy, exploited them, discriminated against them or otherwise used data analytics in ways that cause harm. Some spoke about this in terms of trust, a particularly important quality for a digital era in which many companies depend on individuals sharing their personal information.

“There are a lot of different people like me at other companies who are trying to ensure that the trust in their brand is maintained and extended because trust is a fundamental part of all human relationships,” one manager said. “That is why, if you act ethically and ensure the data use is ethical, and you are fully accountable for that, then your brand is trustworthy. I think that is the most important. That is what we’re all trying to achieve. There are many, many companies [that] get it and are trying to start or extend programs that really get at this fundamental level of trust and ethical operation.”

Another interviewee explained that consumers generally are not in a position to evaluate a company’s analytics practices, but that business customers are. The interviewee’s company, which provides technology-related services to other companies, cares greatly about whether its business customers trust it with the personal data they possess. These business customers have the resources to conduct due diligence on the company’s data practices and analytics operations and, if they do not like what they see, will take their business elsewhere, the interviewee explained. This provides the company with a strong incentive to improve its data ethics management.

Other companies focused on employee recruitment and retention. As one interviewee explained, in the big data economy, a company’s success depends on recruiting the best engineering talent, and these high-demand technologists can leave companies whose values or actions they find offensive. This gives companies a further reason to avoid such conduct. Google’s well-publicized responsiveness to employee objections to certain U.S. Department of Defense contracts is but one example.

Other interviewees focused on regulation. They saw data ethics as a means to show that companies could act responsibly and, in doing so, preempt future regulation of big data analytics. Others assumed that such regulation is forthcoming and saw the ethical use of data as a way to position their company for this regulatory future. In their view, companies that anticipated such laws and designed their systems to account for them would be better positioned to handle future regulations than competitors who did not have such foresight.

Others emphasized corporate values and the desire to live by them. For these interviewees, the risk of discriminating against protected classes loomed especially large. This was not something that they or their companies wanted to do and they saw data ethics management as a way to prevent such incidents.

Data Ethics and Risk Management

Most companies understand the importance of managing big data risks. Until recently, companies addressed these risks mainly by complying with privacy laws. They assumed that, as long as they adhered to these laws, they had sufficiently protected consumers and shielded themselves. Big data analytics has changed this, however, because privacy laws generally seek to give individuals control over the collection, use and sharing of their personal data. They do this by providing individuals with notice of such data processing, and a degree of choice as to whether to allow it. This “notice and consent” approach to privacy protection has been under strain for some time as privacy notices have become longer and more numerous.

However, because of the capabilities of big data analytics, individuals who agree to share their surface information cannot know what latent, inferred information they are also inadvertently sharing and thus cannot meaningfully choose to make the disclosure in the first place. In the era of big data analytics, people can no longer use “notice and consent” to protect themselves, and so companies cannot protect them by simply complying with privacy laws.

Companies must do more to protect their customers and, in doing so, their own reputations. They need to go beyond what the law requires to ensure that their data analytics activities do not invade privacy, institutionalize bias or manipulate people. These “data ethics” activities also include implementing responsible data practices and assessing and mitigating the serious risks to their customers and themselves. In other words, companies are doing not only what the law requires but also what is ethical in a broader sense, acknowledging that a practice may be perfectly legal and still not be a good business decision.

The term “data ethics” suggests that companies must educate themselves about human rights frameworks and other ethical philosophies, then make an effort to conform their business practices to them. Many business leaders who discuss data ethics speak of it in these terms. But that is not necessarily what companies are doing. Certainly, they are going beyond what the law requires, and calling this “data ethics,” but their goal is to reduce the risk that their data analytics activities will injure people and harm the company. For them, data ethics is about more than compliance risk mitigation. Data ethics is risk management for the algorithmic economy.

Dennis Hirsch is professor of law and director of the Program on Data and Governance at Ohio State University’s Moritz College of Law. He also holds the title of professor of law at Capital University Law School and is a research fellow at The Risk Institute at Ohio State’s Fisher College of Business.