Just data?… White House confronts subtle algorithmic discrimination

Updated: On rereading this post from the weekend, I realized it was possibly the most boring post ever written on a topic that isn’t boring at all. So I’ve re-edited it, for fun-ness.


I’m loving yesterday’s White House report on big data. It takes on the question of how to hold commercial uses of big data to the same ethical standards to which we hold traditional business practices (theoretically, anyway). There’s also good discussion of important security and privacy issues. But the really welcome stuff in Big Data: Seizing Opportunities, Preserving Values is its discussion of how commercial data analysis is already shaping our daily lives in less obvious but potentially profound ways.

Let’s start with digital red-lining. As I alluded to in my previous post on high frequency trading, whether it’s bots or bank managers doing the “red-lining” of no-loan districts, it should be equally illegal. Or, as the New York Times put it in its coverage of the WH report:

“the same technology that is often so useful in predicting places that would be struck by floods or diagnosing hard-to-find illnesses in infants also has ‘the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education and the marketplace.’”

But beyond direct threats to civil rights protections, the report surfaces the small “d” discrimination inherent in new, far more sophisticated mechanisms for characterizing and segmenting consumers. The structure of these algorithms is largely invisible, often proprietary and intentionally “seamless,” but their effect is to shape differently, for each of us, the world (the information, products, prices and opportunities) we can reach through the internet and, increasingly, offline as well.

There’s nothing necessarily wrong with this; indeed, many of these algorithms are essential and powerful tools without which we’d struggle to make sense of the sea of data in which we already swim. But if we’re blithely oblivious, or treat algorithmic decision-making as somehow inscrutable (see my post on HFT), it doesn’t take a lot of imagination to come up with a whole host of ways things could go south in a hurry. As the report rather drily puts it in the recommendations section, for example:

“Consumers have a legitimate expectation of knowing whether the prices they are offered for goods and services are systematically different than the prices offered to others.”

So it’s great to see the Administration launch a civic discussion of how we enhance transparency and accountability in big data analytics. What’s more, the discipline of carrying this out could very well have knock-on benefits for the rest of the economy.

As the report also flags, big data and the tools to analyze it hold immense potential to improve the performance, transparency, and accountability of complex systems like healthcare, which (I’d add) has long been rife with waste and inequity masked by old-fashioned complexity and technological obfuscation.

It’s time we got used to asking what’s happening behind the curtain.


Below, some highlights on these topics from the report itself:

On “Algorithms, Alternative Scoring and the Specter of Discrimination”:

“The business models and big data strategies now being built around the collection and use of consumer data, particularly among the “third-party” data services companies, raise important questions about how to ensure transparency and accountability in these practices. Powerful algorithms can unlock value in the vast troves of information available to businesses, and can help empower consumers, but also raise the potential of encoding discrimination in automated decisions. Fueled by greater access to data and powerful analytics, there are now a host of products that “score” individuals beyond the scope of traditional credit scores, which are regulated by law.[110] These products attempt to statistically characterize everything from a consumer’s ability to pay to whether, on the basis of their social media posts, they are a “social influencer” or “socially influenced.”

While these scores may be generated for marketing purposes, they can also in practice be used similarly to regulated credit scores in ways that influence an individual’s opportunities to find housing, forecast their job security, or estimate their health, outside of the protections of the Fair Credit Reporting Act or Equal Credit Opportunity Act.[111] Details on what types of data are included in these scores and the algorithms used for assigning attributes to an individual are held closely by companies and largely invisible to consumers. That means there is often no meaningful avenue for either identifying harms or holding any entity in the decision-making chain accountable.”

On consumer and search-related issues:

“The fusion of many different kinds of data, processed in real time, has the power to deliver exactly the right message, product, or service to consumers before they even ask. Small bits of data can be brought together to create a clear picture of a person to predict preferences or behaviors. These detailed personal profiles and personalized experiences are effective in the consumer marketplace and can deliver products and offers to precise segments of the population, like a professional accountant with a passion for knitting, or a home chef with a penchant for horror films.

Unfortunately, “perfect personalization” also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities. For example, one study found web searches involving black-identifying names (e.g., “Jermaine”) were more likely to display ads with the word “arrest” in them than searches with white-identifying names (e.g., “Geoffrey”). This research was not able to determine exactly why a racially biased result occurred, recognizing that ad display is algorithmically generated based on a number of variables and decision processes.[17] But it’s clear that outcomes like these, by serving up different kinds of information to different groups, have the potential to cause real harm to individuals, whether they are pursuing a job, purchasing a home, or simply searching for information.

Another concern is that big data technology could assign people to ideologically or culturally segregated enclaves known as “filter bubbles” that effectively prevent them from encountering information that challenges their biases or assumptions.[18] Extensive profiles about individuals and their preferences are being painstakingly developed by companies that acquire and process increasing amounts of data. Public awareness of the scope and scale of these activities is limited, however, and consumers have few opportunities to control the collection, use, and re-use of these data profiles.”

Relatedly, in the privacy sphere:

“As techniques like data fusion make big data analytics more powerful, the challenges to current expectations of privacy grow more serious. When data is initially linked to an individual or device, some privacy-protective technology seeks to remove this linkage, or “de-identify” personally identifiable information, but equally effective techniques exist to pull the pieces back together through “re-identification.” Similarly, integrating diverse data can lead to what some analysts call the “mosaic effect,” whereby personally identifiable information can be derived or inferred from datasets that do not even include personal identifiers, bringing into focus a picture of who an individual is and what he or she likes.

Many technologists are of the view that de-identification of data as a means of protecting individual privacy is, at best, a limited proposition.[19]”
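To make the “re-identification” point concrete, here’s a toy sketch of a classic linkage attack. It’s my own illustration, not something from the report: every record, name and field below is made up. The idea is simply that a table with the names stripped out can still be joined to a public, named dataset on shared “quasi-identifiers” (here ZIP code, birth date and sex), and any unique match puts the name right back.

```python
from collections import defaultdict

# Made-up "de-identified" records: names removed, quasi-identifiers kept.
deidentified_health = [
    {"zip": "02138", "birth_date": "1961-04-12", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "F", "diagnosis": "diabetes"},
]

# Made-up public dataset (think voter roll) that does include names.
public_roll = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1961-04-12", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_date": "1990-03-02", "sex": "F"},
]

# Fields assumed (for this illustration) to appear in both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Project a record onto its quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified, public):
    """Return {name: diagnosis} for records matching exactly one named person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])

    matches = {}
    for record in deidentified:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[candidates[0]] = record["diagnosis"]
    return matches


if __name__ == "__main__":
    print(reidentify(deidentified_health, public_roll))
    # prints {'Alice Example': 'hypertension', 'Bob Example': 'asthma'}
```

Nothing in the “de-identified” table is a name, yet two of its three rows come back attached to one. The third escapes only because nobody in the public list happens to share its combination of fields, which is more or less the report’s point about de-identification being a limited proposition.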

And finally, in general:

“Recognizing that big data technologies are used far beyond the intelligence community, this report has taken a broad view of the issues implicated by big data. These new technologies do not only test individual privacy, whether defined as the right to be let alone, the right to control one’s identity, or some other variation. Some of the most profound challenges revealed during this review concern how big data analytics may lead to disparate inequitable treatment, particularly of disadvantaged groups, or create such an opaque decision-making environment that individual autonomy is lost in an impenetrable set of algorithms.

These are not unsolvable problems, but they merit deep and serious consideration. The historian Melvin Kranzberg’s First Law of Technology is important to keep in mind: “Technology is neither good nor bad; nor is it neutral.”[22] Technology can be used for the public good, but so too can it be used for individual harm. Regardless of technological advances, the American public retains the power to structure the policies and laws that govern the use of new technologies in a way that protects foundational values.

Big data is changing the world. But it is not changing Americans’ belief in the value of protecting personal privacy, of ensuring fairness, or of preventing discrimination. This report aims to encourage the use of data to advance social good, particularly where markets and existing institutions do not otherwise support such progress, while at the same time supporting frameworks, structures, and research that help protect our core values.”

The whole report and its recommendations are here
