Data

For this project, I visualized the Police Department Incident Reports dataset from DataSF. This is a public dataset under the following license:

"This data is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: www.opendatacommons.org/licenses/pddl/1.0/"

The original dataset has 26 columns. I am interested in the following columns for this project:

  1. Incident Datetime: The date and time when the incident occurred.
  2. Incident Category: A category mapped on to the Incident Code used in statistics and reporting.
  3. Incident Subcategory: A subcategory mapped on to the Incident Code used in statistics and reporting.
  4. Analysis Neighborhood: The neighborhood where the incident occurred.
  5. Latitude: The latitude where the incident occurred.
  6. Longitude: The longitude where the incident occurred.

Wrangling

Prototypes

The original dataset contained 337K rows. I used the data filtering tool on the DataSF website to narrow the dataset down to only the rows I am interested in. First, I filtered the dataset to only show rows where the incident category contains the word "theft". I then narrowed it further to rows where the incident subcategory contains the word "vehicle". This left only the rows that describe an incident of motor vehicle theft or larceny from a vehicle, which are the types of incidents I am interested in.
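The DataSF tool applies these filters interactively, but the same logic can be sketched in Python. This is only an illustration, not the code used for the prototypes: the column names come from the list in the Data section, while the function names, the case-insensitive matching, and the sample rows are assumptions made up for the example.

```python
def is_vehicle_theft(row):
    """Return True if the row passes both keyword filters.

    Matching is case-insensitive, which is an assumption about how
    the "contains" filter on the DataSF website behaves.
    """
    category = (row.get("Incident Category") or "").lower()
    subcategory = (row.get("Incident Subcategory") or "").lower()
    return "theft" in category and "vehicle" in subcategory


def filter_vehicle_theft(rows):
    """Keep only the rows that describe a vehicle-related theft."""
    return [row for row in rows if is_vehicle_theft(row)]


# Made-up sample rows, for illustration only.
sample_rows = [
    {"Incident Category": "Motor Vehicle Theft",
     "Incident Subcategory": "Motor Vehicle Theft"},
    {"Incident Category": "Larceny Theft",
     "Incident Subcategory": "Larceny - From Vehicle"},
    {"Incident Category": "Larceny Theft",
     "Incident Subcategory": "Larceny Theft - Shoplifting"},
    {"Incident Category": "Assault",
     "Incident Subcategory": "Simple Assault"},
]

vehicle_theft_rows = filter_vehicle_theft(sample_rows)
print(len(vehicle_theft_rows), "of", len(sample_rows), "sample rows kept")
```

Both conditions must hold, so a theft that does not involve a vehicle (the shoplifting row) is dropped along with incidents that are not thefts at all.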

Next, I used the DataSF filtering tool to only show incidents that happened in the last 30 days, since these are the incidents most relevant to this project. I did this by filtering to only include rows in which the incident datetime is between 2020/03/29 and 2020/04/27. In the future, I will use the DataSF API and dynamically filter rows to show the last 30 days instead of hardcoding the dates.
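As a rough sketch of how the hardcoded dates could be computed instead, the window can be derived from an end date. The function names here are hypothetical, and the sketch assumes the window is inclusive on both ends, which is what makes 2020/03/29 through 2020/04/27 exactly 30 days.

```python
from datetime import date, timedelta


def date_window(end, days=30):
    """Return (start, end) for a window of `days` days ending on `end`.

    Both ends are inclusive, so a 30-day window starts 29 days
    before the end date.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return end - timedelta(days=days - 1), end


def in_window(incident_date, start, end):
    """Return True if the incident date falls inside the window."""
    return start <= incident_date <= end


start, end = date_window(date(2020, 4, 27), days=30)
print(start.isoformat(), "to", end.isoformat())  # 2020-03-29 to 2020-04-27
```

Passing `date.today()` as the end date would give a window that moves forward each day without any edits to the code.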

Finally, I used the DataSF filtering tool to remove rows with missing data. I noticed I was getting some errors because certain rows did not have a value for latitude or longitude. To fix this, I filtered out any rows in which latitude or longitude is empty.
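The same cleanup can be expressed as a small check in Python. This is a sketch under assumptions: the column names follow the list in the Data section, a missing value is taken to be either an absent key, `None`, or a blank string, and the sample rows are invented for the example.

```python
def has_coordinates(row):
    """Return True only if both Latitude and Longitude hold a value."""
    for key in ("Latitude", "Longitude"):
        value = row.get(key)
        # Treat an absent key, None, and a blank string as "empty".
        if value is None or str(value).strip() == "":
            return False
    return True


# Made-up sample rows, for illustration only.
rows_to_check = [
    {"Latitude": "37.7749", "Longitude": "-122.4194"},
    {"Latitude": "", "Longitude": "-122.4194"},
    {"Latitude": "37.7749"},
]

rows_with_coordinates = [row for row in rows_to_check if has_coordinates(row)]
print(len(rows_with_coordinates), "row(s) can be placed on a map")
```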

Final Visualizations

For the final visualizations, I wrangled the data using the same filters as the prototypes. However, for the date, I dynamically filter the last 28 days of data to get a snapshot of what is currently happening. I also use the API for the data, as opposed to the filtering tools on the DataSF website. Filtering is done by editing the URL for the API request.
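To illustrate what filtering through the URL can look like, here is a minimal Python sketch that builds such a request URL. Everything specific in it is an assumption rather than the project's actual request: it assumes the API accepts SoQL-style `$where` and `$limit` query parameters, the field names (`incident_datetime`, `incident_category`, `incident_subcategory`, `latitude`, `longitude`) are guessed from the column names above, `DATASET_ID` is a placeholder, and the row limit is arbitrary.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Placeholder endpoint: DATASET_ID must be replaced with the real dataset ID.
BASE_URL = "https://data.sfgov.org/resource/DATASET_ID.json"


def build_where(now, days=28):
    """Combine the date, keyword, and missing-value filters into one clause."""
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")
    conditions = [
        f"incident_datetime >= '{cutoff}'",
        "lower(incident_category) like '%theft%'",
        "lower(incident_subcategory) like '%vehicle%'",
        "latitude IS NOT NULL",
        "longitude IS NOT NULL",
    ]
    return " AND ".join(conditions)


def build_query_url(base_url, now, days=28, limit=5000):
    """Return the request URL with the filters encoded as query parameters."""
    params = {"$where": build_where(now, days), "$limit": limit}
    # safe="$" keeps the parameter names readable in the encoded URL.
    return base_url + "?" + urlencode(params, safe="$")


url = build_query_url(BASE_URL, now=datetime(2020, 4, 27))
print(url)
```

Calling `build_query_url(BASE_URL, now=datetime.now())` on each page load would keep the 28-day window current, since the cutoff is recomputed every time.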

About

Ahmed Kaddoura
CS Major, University of San Francisco
Expected Graduation: Fall 2020
amkaddoura@dons.usfca.edu

I am a computer scientist from San Francisco with an interest in front-end development and game design. I enjoy writing songs, playing basketball, and baking.

Skills
Python C Java JavaScript HTML CSS SVG