It’s no secret that the tech industry has historically lacked racial and gender diversity. And when Stanford’s computer science (CS) department isn’t creating founders of the next Google, Snapchat or Netflix, it’s busy churning out employees for these big tech companies. Given the industry’s limited progress in increasing diversity, this pipeline between Stanford’s CS department and the tech sector makes diversity among the department’s students especially important. In this article, we will examine the race and gender demographics of CS majors, and how this breakdown has changed over the past five years.
In 2015, Jorge Cueto published a Medium article examining race and gender among Stanford’s computer science students. He pulled the names of 707 Stanford CS majors in the classes of 2015-2018 from a list maintained by the CS department. Through online searches, the social media profiles of students and the perceived origin of their last name, Cueto classified these students into men and women for gender and white, Asian, Black or Hispanic for race. This manual data collection and categorization, though imperfect, remains necessary due to the administration’s refusal to provide a demographic breakdown by major.
Of the 707 students he analyzed, Cueto classified 69.7% as men and 30.3% as women. For race breakdown, 46.4% of students were Asian, 38% were white, 9.5% were Latinx, and 6.1% were Black.
Five years later, we wanted to see how much has, or has not, changed within Stanford’s CS department, so we performed an analysis of today’s 944 declared CS majors from the classes of 2020-2023. Using a methodology similar to Cueto’s, we primarily used Facebook and LinkedIn profiles of the students to determine race and gender. (See the Methodology section for more information.)
Stanford’s total undergraduate student population is 50% male and 50% female. The following charts demonstrate a slight change in gender demographics in computer science over the past five years: the share of CS majors who are women has increased by 4.1%. In 2020, students with non-binary gender identities make up less than 1% of CS majors.
The Daily reached out to Mehran Sahami ’92 M.S. ’93 Ph.D. ’99, the associate chair for education in the CS department, for comment on the data. Sahami said that increasing the number of women in CS has long been a priority for the department, citing that in 2007, only 10% of declared majors were women. Efforts by the department to broaden the appeal of the major through curriculum revision, equity programs such as CS Pathfinders, and outreach by student groups such as Women in Computer Science (WiCS) have all contributed to the improvement of these numbers.
“Of course, we still have work to do,” Sahami explained. “We’d really like to achieve gender parity in CS and continue to work toward this goal. For example, CS106A is already often close to gender parity in many quarters.”
To compare how racial demographics have changed over time, the first chart below uses the same aggregated race categories as Cueto’s 2015 article. Middle Eastern/North African students are aggregated into “White” and Southeast Asian students are aggregated into “East Asian.”
While the number of East Asian and white students continues to grow, the percentages of Black and Latinx students, two identities underrepresented in computer science, have barely changed. Additionally, these metrics show smaller proportions of Black and Latinx students in the CS department than in the student population as a whole.
The following chart shows a disaggregated race breakdown of the 2020 data, including Middle Eastern/North African students and Southeast Asian students as distinct categories. Native American students, a category that does not have data from 2015, were found to make up less than 1% of CS majors in 2020.
Sahami explained that increasing the percentage of underrepresented minority (URM) students in CS is an area of focus. The department has recently made efforts to connect with URM student groups such as Black in CS and SOLE to learn how to become more supportive, particularly with regard to encouraging students to get involved as section leaders and teaching assistants and to increasing URM student participation in CS research.
“We know we still have a long way to go and we have redoubled our efforts recently,” Sahami said.
The graph below shows the gender ratio for each race across categories with at least a 1% share of the data.
Within every race, the fraction of men was larger than the fraction of women. The ratio of men to women is closest to parity among East Asians, at approximately 54% men and 45% women. In 2015, Asian majors were likewise closest to parity, with a ratio of about 64:36 for both East Asians and South Asians. Since then, the ratio has improved for East Asians and stayed about the same for South Asians, while decreasing for Southeast Asians. The largest gender disparity exists among Latinx majors, where the number of women is less than a third of the number of men. In the aggregated analysis done five years ago, Latinx students also had the least gender parity.
In 2020, the majority of men are white or East Asian, with white students making up a slightly larger percentage (30.86% vs. 26.01%). When using the aggregated model, approximately 36% of male students are white, which is a drop from 41% in 2015, and 32% are East Asian, which is slightly down from 33.5% five years ago.
East Asian women make up a plurality of women CS majors, and the proportion has stayed approximately the same since 2015. In 2015, 42.5% of women CS majors were East Asian. In 2020, that percentage dropped slightly to 41.05%. Even when the categories are aggregated, the number of East Asian women clearly exceeds that of any other group.
The graph below shows the overall demographics of the CS department.
White men, East Asian men and East Asian women make up more than half of today’s computer science students, with white men and East Asian men being the two most prevalent categories.
Stanford’s CS department is certainly taking steps in the right direction. Sahami noted that the department works to make the introductory classes, like CS106A, welcoming and accessible for all. This analysis of CS majors also does not tell the complete story of the department. Stanford has a variety of CS-related majors, such as Symbolic Systems, Mathematical and Computational Science, and Science, Technology, and Society, for which the department aims to prepare students. However, our analysis reveals that the diversity of CS majors has a long way to go before it reflects the demographics of the entire student body.
To determine gender, we examined pictures of students on their social media platforms and relied on conventional appearances of cisgender men and women. Combining this information with pronouns evident on social media profiles, we did our best to determine the gender of each individual.
To determine race, we sorted students into categories of white, Middle Eastern/North African, Black, Latinx, East Asian (such as Chinese, Korean, or Japanese), South Asian (such as Indian or Pakistani), Southeast Asian (such as Vietnamese, Thai or Filipinx), and Native American. Since Cueto’s analysis did not disaggregate to this extent, we sometimes reaggregated the numbers in order to perform a direct comparison.
We then analyzed physical appearance and the origin of last name. Since this information alone could be misleading, we also looked at languages spoken, which could often be found on LinkedIn profiles. In some social media profiles, students included hometowns, flags, or pictures with relatives, which were helpful. If we identified a student as mixed race, we recorded both races. But for classification purposes, we assigned mixed students a single race as evenly as possible. For example, if we believed there were 10 white and East Asian students, we categorized 5 students as white, and the other 5 as East Asian. Using this methodology, we tried to accurately assign a race from one of the eight categories for each of the 944 students.
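As a rough sketch of the even split for mixed-race students, the rule could be written as below. The function name `split_evenly` is hypothetical, and the handling of an odd number of students (the extra student goes to the first-listed race) is an assumption; the article only says the split was made "as evenly as possible."

```python
def split_evenly(pair, n_students):
    """Divide n_students recorded with two races between those races.

    Assumption: when n_students is odd, the extra student is counted
    under the first race in the pair. The article does not specify this.
    """
    first, second = pair
    half = n_students // 2
    return {first: n_students - half, second: half}


# The article's own example: 10 white and East Asian students.
print(split_evenly(("White", "East Asian"), 10))
```

With the article's example of 10 students, each category receives 5.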
We felt that these methods provided enough information for an informal analysis of Stanford’s CS department. Interpret this data with the understanding that these are not official statistics on race and gender. Traditionally, data on race and gender are self-reported, reflecting that such characteristics are tied to personal identity and may not conform to the social stereotypes that we relied on for our categorization. Still, in the absence of such data, we believe it is helpful to look at the broader trends in race and gender, even if doing so relies on imperfect, socially observed data.
Contact Sophie Andrews at sophie1 ‘at’ stanford.edu and Lucia Morris at luciam ‘at’ stanford.edu.