Overview – DH101: UC Admission Trends

Sources

Our primary datasets came from the California Department of Education (CDE) and UC Admissions, which together contain a wealth of data about public high schools in California. These datasets gave us insight into student demographics and performance at the state, district, and school levels. We chose them in part for the wide range of fields available for analysis, such as race, gender, and socioeconomic status.

In addition to our primary sources, our group reviewed an extensive set of secondary sources, mainly scholarly journal articles and news articles. Although most of these sources focus on education in the US as a whole rather than California specifically, they help contextualize common notions of success in public and postsecondary schools. For instance, we found that graduation rates and college-going rates were the predominant measures of success in the academic literature. However, many sources also investigated how confounding variables such as neighborhood (Wodtke et al., 2011), race (Santos et al., 2010), and ESL status (Kanno and Sara, 2014) can affect graduation rates. In general, these sources illuminated the issues surrounding notions of success in schools and helped guide our research questions.

Processing

For the most part, the data provided by the California Department of Education was relatively clean and required minimal processing: removing null values and filtering rows to include only high schools. Much of our work therefore focused on merging the different CDE datasets so that we could analyze fields together (e.g., test scores with income level). We used Python and Pandas to merge the data frames by school code, a field common to all CDE datasets. However, because some school codes were not unique, we opted to perform analysis at the county level for many of our visualizations, which we believed was sufficient to reveal trends in our data.
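As an illustration of the county-level fallback described above, the following is a minimal sketch rather than our actual processing script. The tables, column names (`school_code`, `county`, `avg_test_score`, `pct_low_income`), and values are hypothetical stand-ins for the CDE files, and taking an unweighted mean of school-level values is an assumption made only to keep the example short.

```python
import pandas as pd

# Hypothetical stand-ins for two CDE tables; names and values are invented.
scores = pd.DataFrame({
    "school_code": [101, 102, 102, 201],  # 102 appears twice (non-unique key)
    "county": ["Alameda", "Alameda", "Alameda", "Kern"],
    "avg_test_score": [72.0, 65.0, 80.0, 58.0],
})
income = pd.DataFrame({
    "school_code": [101, 102, 201],
    "county": ["Alameda", "Alameda", "Kern"],
    "pct_low_income": [40.0, 55.0, 70.0],
})

# Find school codes that occur more than once, which would make a
# school-level merge ambiguous.
is_dup = scores["school_code"].duplicated(keep=False)
dup_codes = scores.loc[is_dup, "school_code"].unique()

# Fall back to the county level: aggregate each table, then join on county.
county_scores = scores.groupby("county", as_index=False)["avg_test_score"].mean()
county_income = income.groupby("county", as_index=False)["pct_low_income"].mean()
county = county_scores.merge(
    county_income, on="county", how="inner", validate="one_to_one"
)

print("Duplicated school codes:", list(dup_codes))
print(county)
```

The `validate="one_to_one"` argument makes Pandas raise an error if either side still has repeated keys after aggregation, which guards against silently duplicated rows in the merged result.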

The University of California data was easy to understand, with only three attributes. However, processing it took some work: when we downloaded the data as a CSV file, each record was split across multiple rows, so every school ended up with one row per attribute. We had to merge these rows back together with Python and Pandas. This was tedious because, given how the CSV file turned out, no Pandas function could merge the rows directly, so we wrote the merged dataset to a new CSV file.
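For readers facing a similar export, here is a rough sketch of collapsing one-row-per-attribute data into one row per school. It assumes a regular layout with a school column, an attribute column, and a value column, which may be tidier than the file we actually received. The school names, attribute names (`applicants`, `admits`, `enrollees`), and counts are hypothetical.

```python
import io
import pandas as pd

# Hypothetical export: one row per (school, attribute) pair.
raw = io.StringIO(
    "school,attribute,value\n"
    "Example High,applicants,120\n"
    "Example High,admits,80\n"
    "Example High,enrollees,35\n"
    "Sample Academy,applicants,60\n"
    "Sample Academy,admits,30\n"
    "Sample Academy,enrollees,12\n"
)
long_df = pd.read_csv(raw)

# Reshape so each school becomes a single row with one column per attribute.
wide_df = long_df.pivot(
    index="school", columns="attribute", values="value"
).reset_index()
wide_df.columns.name = None  # drop the leftover "attribute" axis label

# Serialize the merged table as CSV text (pass a file path to write a file).
csv_text = wide_df.to_csv(index=False)
print(csv_text)
```

If the rows in a real export are not labeled this cleanly, the reshape would need a custom pass over the file first, which is closer to what we ended up doing.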

One of our visualizations also required comparing data from the USNews and CDE datasets. The USNews dataset contained only the school name as an identifier, which made it difficult to merge with the CDE datasets. Because the USNews dataset contained only 50 rows, we ultimately decided to join the two datasets manually.
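We joined these rows by hand, but for a larger dataset a name-based join could narrow down the manual work. The sketch below is not what we ran: it normalizes school names with a hypothetical helper, `normalize_name`, merges on the normalized key, and lists the rows that still need manual review. All names, ranks, and codes are invented for illustration.

```python
import pandas as pd

def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())

# Hypothetical rows standing in for the USNews and CDE tables.
usnews = pd.DataFrame({
    "school_name": ["Example High School", "Sample Academy", "Demo Prep"],
    "rank": [1, 2, 3],
})
cde = pd.DataFrame({
    "school_name": ["EXAMPLE HIGH SCHOOL", "Sample  Academy", "Demo Preparatory"],
    "school_code": [101, 102, 103],
})

usnews["key"] = usnews["school_name"].map(normalize_name)
cde["key"] = cde["school_name"].map(normalize_name)

# Left join keeps every USNews row; the indicator column marks misses.
joined = usnews.merge(
    cde[["key", "school_code"]], on="key", how="left", indicator=True
)
needs_review = joined.loc[joined["_merge"] == "left_only", "school_name"].tolist()

print("Rows to match by hand:", needs_review)
```

Exact matching on normalized names still misses abbreviations and renamed schools, so a manual pass remains necessary for whatever lands in `needs_review`.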

Presentation

We chose to build our website using WordPress because of its ease of use and wide selection of themes. When designing the website, we kept in mind Ben Shneiderman’s Eight Golden Rules of Interface Design, specifically his notes on consistency and usability. We therefore prioritized making the website clean, accessible, and visually appealing by adding alt text to images, ensuring consistent font sizes and layouts, and adding a simple but useful navigation bar at the top of the site. We initially tried to model the website after the UC Admissions website, but ultimately decided to stick with a theme that was easy to read and navigate. Wherever possible, we also embedded interactive visualizations rather than static images in order to promote interactivity and responsiveness.

One challenge of designing the website was presenting all of our content, including the datasets, our findings, and outside research, in a way that was organized and intuitive to navigate. We therefore had extensive discussions about which pages to include and where to place them in the navigation bar. We ultimately decided on an order that made chronological sense: first introducing our topic and providing background information, then diving deeper into our analysis of the data, and finally providing supplemental information on the authors and sources. We also paid attention to organization at a more granular level within each page, especially in the analysis. It was important that we examined the various factors affecting school success in a logical order, and that the visuals accompanying the findings were clear and supported the primary arguments. By ending our analysis with a visualization that shows an alternative way to analyze school success, we reinforce the importance of fairness in the education system and facilitate further discussion of how success should be measured.