Part of “Complexity Theory,” a column on the tangled questions of our technological age.
In October I had the opportunity to interview some of the speakers at the fall conference on AI Ethics, Policy, and Governance held by the Stanford Institute for Human-Centered Artificial Intelligence. I stood in a long line to meet Joy Buolamwini, founder of the Algorithmic Justice League and a keynote speaker at the conference, whose work on algorithmic bias I admire. Waiting in line, I brainstormed potential interview questions and tried to anticipate her responses, excited to hear her thoughts on AI. I would never have guessed, though, that she would compare the need for continuous oversight of algorithms to dental hygiene.
“Addressing algorithmic bias is like hygiene,” she told me. “You don’t brush once, you don’t floss once, you do it over and over again.”
In 2015, Google Photos accidentally tagged pictures of two African American individuals as gorillas. Three years after this shocking incident, Wired reported that Google had censored “gorilla,” “chimp,” “chimpanzee,” and “monkey” from searches and image tags in an apparent attempt at a quick fix or PR move. The difficulty in addressing the original error demonstrated the limitations of nascent image-labeling technology, which largely lacks the ability to incorporate an understanding of context or abstract concepts into its decision-making. Significantly, the situation raised concerns that Google needed to be more transparent about its techniques to address algorithmic bias, as well as about the technical immaturity of current object-recognition systems.
The Google Photos debacle shows how fitting Buolamwini’s hygiene analogy is. Oversight of algorithms requires the persistence and thoroughness associated with hygiene rituals like brushing and washing, not a quick-fix strategy. In the face of algorithmic bias and discrimination, such oversight helps ensure that technology is representative of and beneficial to society. This article presents two further examples to illustrate the need for continuous review of algorithms: first, the way Facebook’s ad-targeting tools allowed for housing discrimination, and second, Buolamwini’s own work investigating bias in facial analysis technology. These examples show algorithms perpetuating biases and enabling discrimination, and they also demonstrate some of the challenges involved in monitoring algorithmic flaws.
Facebook and Ad Targeting
The revelation that Facebook’s ad-targeting tools facilitated housing discrimination set off a series of journalistic exposés and legal challenges spanning more than three years. In 2016, ProPublica reported that the tools enabled advertisers to exclude specific groups that Facebook calls “Ethnic Affinities” (such as anyone with an “affinity” for African American or Asian American people). Its journalists were able to purchase an ad in the housing category that excluded minority groups and have it approved 15 minutes later, raising concerns that Facebook was potentially violating the Fair Housing Act. In February 2017, Facebook published updates to its ad policies and tools to strengthen its prohibition against discrimination in ads for housing, employment or credit. However, when ProPublica retested in November of that year, it successfully bought dozens of rental housing ads on Facebook that excluded groups such as African Americans, people interested in wheelchair ramps and Jews, with most of the ads receiving approval within minutes.
In 2018, fair housing groups and ProPublica found that Facebook had blocked the use of race as an exclusion category, but they were still able to buy an ad excluding people interested in Telemundo, the Spanish-language television network, “suggest[ing] that advertisers could still discriminate by using proxies for race or ethnicity.” Facing several lawsuits, Facebook ultimately announced in March 2019, as part of a settlement, that it would implement changes to prevent advertisers from targeting users by age, gender and ZIP code for housing, employment and credit offers. Even after this settlement, researchers at Northeastern University and Upturn found that Facebook’s ad-delivery algorithm could still perpetuate biases based on proxy characteristics and ad content. (For example, an ad’s sample audience of software engineers may act as a proxy for male profiles, one of the many unfortunate consequences of the lack of diversity in the tech industry. If the ad itself features a man, the algorithm can take that content into account as well and may end up delivering the ad to a predominantly male audience.) Protection against discrimination in ads on Facebook is still a work in progress.
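To make the idea of a proxy concrete, here is a minimal Python sketch. It does not represent Facebook’s actual system or data: the audience, the group labels "A" and "B", the `likes_channel` interest flag and all of the counts are invented for illustration. It only shows how excluding people by an interest can sharply reduce one group’s share of an audience even though the group itself is never named.

```python
# Synthetic audience. "group" stands for a protected attribute the advertiser
# never selects on directly; "likes_channel" is an interest the advertiser can
# filter on. All counts below are made up for illustration.
USERS = (
    [{"group": "A", "likes_channel": True}] * 45
    + [{"group": "A", "likes_channel": False}] * 5
    + [{"group": "B", "likes_channel": True}] * 5
    + [{"group": "B", "likes_channel": False}] * 45
)


def share_of_group(users, group):
    """Fraction of the audience that belongs to the given group."""
    return sum(u["group"] == group for u in users) / len(users)


def exclude_interest(users):
    """Drop everyone who has the interest, as an exclusion filter would."""
    return [u for u in users if not u["likes_channel"]]


before = share_of_group(USERS, "A")
after = share_of_group(exclude_interest(USERS), "A")
print(f"Group A share before exclusion: {before:.0%}")
print(f"Group A share after exclusion:  {after:.0%}")
```

In this toy audience, group A makes up half of the users before the filter and one in ten afterward, which is why blocking explicit categories alone does not rule out discrimination.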
Buolamwini and Bias in Facial Analysis
Joy Buolamwini’s work analyzing bias in facial analysis technology exemplifies the importance of continuous oversight of algorithmic decision-making. Her 2018 paper, coauthored with computer scientist Timnit Gebru, evaluated the gender classification algorithms of three companies: IBM, Microsoft and Face++. These commercial systems boasted high accuracy overall, but Buolamwini and Gebru argued that this aggregate statistic is deceptive: when error rates were broken down by group, the differences revealed gender and skin-type bias. All three systems performed better on male faces than on female faces, and better on lighter-skinned faces than on darker-skinned faces. Conducting intersectional error analysis, the researchers found that the maximum error rate on darker-skinned female faces was over one in three, on a binary task where even random guessing would be right half the time. In contrast, the maximum error rate for lighter-skinned males was less than one in 100.
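The gap between an aggregate score and subgroup results can be sketched in a few lines of Python. This is not the researchers’ code or data: the records, the counts and the helper names `error_rate` and `error_rates_by_subgroup` are hypothetical, chosen only to show how a system can look accurate overall while failing one intersectional group far more often.

```python
from collections import defaultdict

# Hypothetical records: (skin_type, true_gender, predicted_gender).
# The counts are invented for illustration, not taken from any study.
RECORDS = (
    [("lighter", "male", "male")] * 40
    + [("lighter", "female", "female")] * 28
    + [("lighter", "female", "male")] * 2
    + [("darker", "male", "male")] * 18
    + [("darker", "male", "female")] * 2
    + [("darker", "female", "female")] * 6
    + [("darker", "female", "male")] * 4
)


def error_rate(records):
    """Fraction of records where the prediction differs from the truth."""
    wrong = sum(1 for _, truth, pred in records if truth != pred)
    return wrong / len(records)


def error_rates_by_subgroup(records):
    """Error rate for each (skin_type, true_gender) intersection."""
    groups = defaultdict(list)
    for skin, truth, pred in records:
        groups[(skin, truth)].append((skin, truth, pred))
    return {group: error_rate(rows) for group, rows in groups.items()}


print(f"Overall error rate: {error_rate(RECORDS):.1%}")
for group, rate in sorted(error_rates_by_subgroup(RECORDS).items()):
    print(f"{group[0]:>7} {group[1]:<6} error rate: {rate:.1%}")
```

With these made-up numbers the overall error rate is 8 percent, yet the rate for the darker-skinned female subgroup is 40 percent, a gap that the aggregate figure hides.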
Buolamwini didn’t stop there. She sent her findings to IBM, Microsoft and Face++, and then in 2019 retested their systems in a new study with Deborah Raji. The new study found improvements in accuracy for these three commercial systems but also uncovered misclassifications by the algorithms of two other companies, Amazon and Kairos. Amazon’s error rate has particularly raised concerns as the company has marketed its facial recognition technology to police departments and federal agencies for the last two years, and AI researchers have joined Buolamwini in calling on Amazon to stop selling its facial recognition system Rekognition to law enforcement. Amazon has challenged Raji and Buolamwini’s study, but Buolamwini has countered the company’s claims and has called on Amazon to sign the Safe Face Pledge — a project that encourages companies to commit to the ethical development and use of facial analysis technology. Buolamwini’s work illustrates the need for thorough oversight: conducting intersectional error analysis rather than only looking at aggregate accuracy, continuously monitoring algorithms to reveal flaws and ensure they are addressed, and highlighting the real-world applications of algorithms and the implications of their mistakes.
While considerable work remains to be done in addressing algorithmic bias, noteworthy progress is being made. Companies have launched oversight efforts, such as Microsoft’s Fairness, Accountability, Transparency, and Ethics in AI (FATE) research group and IBM Research’s AI Fairness 360 toolkit. Academics like Buolamwini are working on exposing and mitigating algorithmic bias; organizations like Data & Society and the AI Now Institute are studying the social implications of AI; and media outlets like ProPublica and Wired aim to raise awareness about algorithmic failings.
As the emphasis on persistent and thorough oversight grows, a commitment to algorithmic hygiene is essential to ensuring a more inclusive digital future.
Contact Ananya Karthik at ananya23 ‘at’ stanford.edu