Gun Violence in Chicago

Key Libraries Employed

Data Processing & Cleaning

Pandas, NumPy, DuckDB

Exploratory & Evaluative Data Visualizations

Matplotlib, Seaborn, Folium (including folium.plugins)

Regressions and OLS Tables

Scikit-Learn, Statsmodels

65,000+ Data Points, 300+ Lines of Code, 58 Pages

All in One

Data Cleaning

  • We employ Pandas, NumPy, and DuckDB to clean and process more than 65,000 data points.
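A minimal sketch of typical cleaning steps, assuming a raw incident extract with hypothetical `date`, `zip_code`, and `age` columns (the project's actual schema and cleaning rules are not described here, and the rows below are invented). It uses pandas only; the same filtering could also be written as a DuckDB SQL query over the DataFrame.

```python
import pandas as pd

# Hypothetical raw extract: column names and values are invented for illustration.
raw = pd.DataFrame({
    "date": ["2021-07-04 23:10", "2021-07-04 23:10", "2021-01-15 02:30",
             "bad date", "2022-03-01 18:00"],
    "zip_code": ["60621", "60621", "60624-1234", "60614", None],
    "age": ["17", "17", "34", "22", "unknown"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, coerce types, and drop rows missing a date or zip code."""
    out = df.drop_duplicates().copy()
    # Unparseable timestamps become NaT instead of raising an error.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Keep only the 5-digit ZIP prefix (this strips ZIP+4 suffixes).
    out["zip_code"] = out["zip_code"].str.slice(0, 5)
    # Non-numeric ages become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    out = out.dropna(subset=["date", "zip_code"])
    # Month column for later seasonal analysis.
    out["month"] = out["date"].dt.month
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Here the duplicate row, the unparseable date, and the row without a zip code are removed, leaving two incidents.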

Exploratory Data Analysis

  • We employ Folium, a library built on Leaflet.js, to explore whether certain zip codes in Chicago experience higher rates of gunshot violence.
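Comparing zip codes by *rate* requires normalizing incident counts by population. A minimal sketch, assuming a cleaned incident table with a `zip_code` column and a hypothetical population figure per zip code (all counts and populations below are invented): it computes incidents per 100,000 residents and flags zip codes above the citywide rate. A table like this could then be joined to a zip-code boundary GeoJSON and drawn with `folium.Choropleth`; that mapping step is not shown.

```python
import pandas as pd

# Invented example data: one row per incident, plus assumed populations.
incidents = pd.DataFrame({"zip_code": ["60621"] * 5 + ["60624"] * 3 + ["60614"]})
population = pd.Series({"60621": 25_000, "60624": 30_000, "60614": 70_000},
                       name="population")

def rates_by_zip(incidents: pd.DataFrame, population: pd.Series) -> pd.DataFrame:
    """Incidents per 100,000 residents for each zip code, highest first."""
    counts = incidents.groupby("zip_code").size().rename("incidents")
    table = pd.concat([counts, population], axis=1).fillna({"incidents": 0})
    table["rate_per_100k"] = table["incidents"] * 100_000 / table["population"]
    # Citywide rate pools all zip codes together.
    citywide = table["incidents"].sum() * 100_000 / table["population"].sum()
    table["above_citywide"] = table["rate_per_100k"] > citywide
    return table.sort_values("rate_per_100k", ascending=False)

table = rates_by_zip(incidents, population)
print(table)
```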

Subquestion I – Is there a significant interplay between race, age, zip code, educational attainment, and one’s likelihood of experiencing gun violence?

  • We fit a negative binomial generalized linear model (GLM).
  • The model is statistically significant overall, indicating a significant interplay among these factors.
  • Younger age, certain zip codes, and certain races are associated with significantly higher chances of experiencing gun violence.

Subquestion II – Do occurrences of gunshot violence vary by month?

  • We use one-hot encoding to create binary indicators for month.
  • Most month coefficients are statistically significant in explaining gunshot violence; the model explains 78% of the variation in the data (R² = 0.78) with an MSE of 0.001.
  • July has the highest occurrence of gunshot violence.

Subquestion III – Do average test scores by zip code correlate with gunshot violence in that zip code?

  • We create a relative ranking of Chicago zip codes based on average standardized test scores.
  • The R² is fairly low, but the overall model test statistic of 10.60 is significant.
  • On average, moving one rank higher (i.e., to lower average test scores relative to other zip codes) is associated with roughly 1.72 additional gunshot injuries.