The original dataset has 26 columns. I am interested in the following columns for this project:
Incident Datetime: The date and time when the incident occurred.
Incident Category: A category mapped on to the Incident Code used in statistics and reporting.
Incident Subcategory: A subcategory mapped on to the Incident Code used in statistics and reporting.
Analysis Neighborhood: The neighborhood where the incident occurred.
Latitude: The latitude where the incident occurred.
Longitude: The longitude where the incident occurred.
Wrangling
Prototypes
The original dataset contained 337K rows. I used the data filtering tool on the DataSF website to narrow the dataset down to only the rows I am interested in.
First, I filtered the dataset to only show rows where the incident category contains the word "theft". I further narrowed it down to rows where the incident
subcategory contains the word "vehicle". This left me with only the rows that record a motor vehicle theft or a larceny from a vehicle, which are the types
of incidents I am interested in.
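
A minimal Python sketch of the two "contains" filters described above. The filtering itself was done with the DataSF tool, so this only illustrates the logic. The helper name `is_vehicle_theft` and the sample rows are hypothetical, and the match is assumed to be case-insensitive.

```python
def is_vehicle_theft(row: dict) -> bool:
    """Keep rows whose category contains "theft" and whose subcategory contains "vehicle"."""
    # Treat missing values as empty strings so the substring test never fails on None.
    category = (row.get("Incident Category") or "").lower()
    subcategory = (row.get("Incident Subcategory") or "").lower()
    return "theft" in category and "vehicle" in subcategory


# Illustrative rows only; these are not taken from the real dataset.
sample_rows = [
    {"Incident Category": "Motor Vehicle Theft", "Incident Subcategory": "Motor Vehicle Theft"},
    {"Incident Category": "Larceny Theft", "Incident Subcategory": "Larceny - From Vehicle"},
    {"Incident Category": "Larceny Theft", "Incident Subcategory": "Larceny Theft - Shoplifting"},
    {"Incident Category": "Assault", "Incident Subcategory": "Simple Assault"},
]

vehicle_thefts = [row for row in sample_rows if is_vehicle_theft(row)]
print(len(vehicle_thefts))
```

The shoplifting row passes the category test but fails the subcategory test, which is why both filters are needed.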
Next, I used the DataSF filtering tool to show only incidents that happened in the last 30 days, since these are the incidents
most relevant to this project. I did this by including only rows whose incident datetime falls between 2020/03/29 and 2020/04/27. In the future,
I will use the DataSF API and dynamically filter rows to show the last 30 days instead of hardcoding the dates.
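
One way the hardcoded dates could be replaced is to compute the window from the current date. The sketch below assumes the window is inclusive on both ends and that incident datetimes are ISO 8601 strings. The function names `last_n_days_window` and `in_window` are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def last_n_days_window(today: date, days: int = 30) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates of a window of `days` days ending on `today`."""
    return today - timedelta(days=days - 1), today


def in_window(incident_datetime: str, start: date, end: date) -> bool:
    """Check whether an ISO 8601 datetime string falls on a day inside the window."""
    day = datetime.fromisoformat(incident_datetime).date()
    return start <= day <= end


# With 2020/04/27 as "today", the window starts on 2020/03/29, matching the dates above.
start, end = last_n_days_window(date(2020, 4, 27))
print(start, end)
```

In practice `date.today()` would be passed in place of the fixed date.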
Finally, I used the DataSF filtering tool to remove rows with missing data. I noticed I was getting errors because certain rows
did not have a value for latitude or longitude. To fix this, I filtered out any row in which latitude or longitude is empty.
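
The same check can be sketched in Python. This assumes an empty value shows up as either a missing key, `None`, or a blank string. The helper name `has_coordinates` and the sample rows are hypothetical.

```python
def has_coordinates(row: dict) -> bool:
    """Return True only if both Latitude and Longitude are present and non-blank."""
    for key in ("Latitude", "Longitude"):
        value = row.get(key)
        if value is None or str(value).strip() == "":
            return False
    return True


# Illustrative rows only; the coordinates are made up.
sample_rows = [
    {"Latitude": "37.77", "Longitude": "-122.42"},
    {"Latitude": "", "Longitude": "-122.42"},
    {"Latitude": "37.77"},
]

mappable_rows = [row for row in sample_rows if has_coordinates(row)]
print(len(mappable_rows))
```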
Final Visualizations
For the final visualizations, I wrangled the data using the same filters as the prototypes. However, instead of hardcoding the dates, I dynamically filter to the
last 28 days of data to get a snapshot of what is currently happening. I also fetch the data through the API rather than the filtering tools on the DataSF
website. The filters are applied by editing the query parameters in the URL of the API request.
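
As a rough illustration of filtering through the URL, the sketch below builds a request URL with SoQL query parameters (`$select`, `$where`, `$limit`), which is the query language used by Socrata-hosted portals such as DataSF. Several details are assumptions rather than the project's actual request: `DATASET_ID` is a placeholder for the real dataset identifier, the field names (`incident_datetime`, `incident_category`, and so on) are assumed API names for the columns listed earlier, the row limit is arbitrary, and "last 28 days" is taken to mean on or after the date 28 days before today. The function name `build_request_url` is hypothetical. The code only builds the URL and does not send a request.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder: DATASET_ID stands in for the real dataset identifier.
BASE_URL = "https://data.sfgov.org/resource/DATASET_ID.json"


def build_request_url(today: date, days: int = 28) -> str:
    """Build an API URL that applies the wrangling filters as SoQL parameters."""
    start = today - timedelta(days=days)
    where = " AND ".join([
        "upper(incident_category) like '%THEFT%'",       # category contains "theft"
        "upper(incident_subcategory) like '%VEHICLE%'",  # subcategory contains "vehicle"
        f"incident_datetime >= '{start.isoformat()}T00:00:00'",  # recent incidents only
        "latitude IS NOT NULL",                          # drop rows without coordinates
        "longitude IS NOT NULL",
    ])
    params = {
        "$select": "incident_datetime,incident_category,incident_subcategory,"
                   "analysis_neighborhood,latitude,longitude",
        "$where": where,
        "$limit": 50000,  # assumed cap, not a value from the project
    }
    # safe="$" keeps the "$" prefix of the parameter names readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe="$")


url = build_request_url(date(2020, 4, 27))
# Decode the query string again to inspect the filter that would be sent.
decoded = parse_qs(urlsplit(url).query)
print(decoded["$where"][0])
```

Pushing the filters into the request means only the matching rows are downloaded, instead of fetching the full dataset and filtering it afterwards.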
I am a computer scientist from San Francisco with an interest in front-end development and game design. I enjoy writing songs, playing basketball, and baking.