Examining crime reports in San Francisco with SQL

22 Aug 2016

When I first moved to San Francisco, my mother warned me “not to go looking for trouble”, but this can be difficult when you do not know the city well. Luckily, SF OpenData provides a great public safety dataset, and our good friend Redshift has a few helpful tricks to help make sure trouble doesn’t find us.

To ease my mother’s concerns, we’ll examine all of the crime reports in San Francisco from the start of 2003 to the end of 2015.

We can start by plotting the total monthly crime and overlay trendlines for each year to get a full picture of our data.

Let’s break down the data into three areas: crime by day of the week, hour of the day, and the most criminally active days of the year.

Analyzing Crime by Day of Week and Hour of Day

Redshift’s extract() function allows us to easily retrieve the day of the week and the hour of the day from a timestamp. Applying it to our dataset we can build a query that extracts the dayofweek and hour from the date:

selectextract(dayofweekfromdate),extract(hourfromdate),count(1)fromsf_crime_2003_2015groupby1,2orderby3

Redshift’s dayofweek datepart returns a value from 0 to 6, with 0 representing Sunday and 6 representing Saturday. We can build a table with these mappings, and we can join it to our query above to make the results more readable.

select0,'Sunday'unionselect1,'Monday'unionselect2,'Tuesday'unionselect3,'Wednesday'unionselect4,'Thursday'unionselect5,'Friday'unionselect6,'Saturday'

After converting to readable day of week names, we compile our final result

It looks like 5 and 6 pm are the most crime-ridden parts of the day, with Friday being the most criminally active day. It could also be an effect of how this data is collected - officers may be more likely to mark the time they are filing their report at the end of the day (and end of the week) than the time of a crime’s occurrence.

Most Criminal Day of the Year

Continuing with our use of the extract() function, we can now count the number of reports and group by the extracted day of year. This allows us to determine which days see the most criminal reports. There are two issues that arise when we use this method:

Grouping by dayofyear does not account for leap years
The query will return the groupings based on the numeric day of the year, rather than human readable MM-DD values

We can account for the leap year issue by matching which years are leap years in a case when statement and subtracting a day after we cross Feb. 29th, the 60th day of the year

withcrime_by_doyas(select-- Check to see if it is a leap yearcasewhen(extract(yearfromdate)%4=0andextract(yearfromdate)%100<>0)orextract(yearfromdate)%400=0thencasewhenextract(dayofyearfromdate)=60then'Feb. 29'whenextract(dayofyearfromdate)>60then(extract(dayofyearfromdate)-1)::textelse(extract(dayofyearfromdate))::textendelse-- If it is not a leap year, no conversions are necessary(extract(dayofyearfromdate))::textendasdoy,count(1)asnum_incidentsfromsf_crime_2003_2015groupby1)

Our outermost case when identifies whether or not we are looking at a timestamp from a leap year. When we are, we shift the dayofyear extraction to move all values on February 29th to be their own text string. The remaining extractions ensure we match up the dayofyear values with those of non-leap years.

Now that we have addressed the first issue, we can move on to translating the dayofyear back into a readable format. To do this, we will use Redshift’s to_char() function. We will first need a list of dates from a non-leap year. To help generate that list of dates, we turn to our post on generating series in Redshift.

,generated_seriesas(select(getdate()::date-row_number()over(orderbytrue))::dateasnfromsf_crime_2003_2015limit730-- value for 2 years),doy_mappingas(select(extract(doyFROM(n)::timestamp))::textas"doy",to_char(n,'Month DD')as"mm_dd"fromgenerated_serieswhereextract(yearfromn)=2015)selectcrime_by_doy.doy,"mm_dd"asdate,crime_by_doy.num_incidentsfromcrime_by_doyjoindoy_mappingoncrime_by_doy.doy=doy_mapping.doyorderby3desc

The doy_mapping CTE uses the to_char() function to convert our dates from YYYY-MM-DD into Month DD. In the doy_mapping where clause, we can use any year that we know is not a leap year to ensure we are returning 365 results. Finally, our main query joins our CTEs together to give us an easily readable result set!

Interestingly, it looks like the first day of a month is the prime time for crime! In particular, New Year’s Day makes it seem that some of the New Year’s Resolutions may be less than lawful. To know for sure, we would have to investigate whether this is an artifact of how this data is reported and collected, or if it is a real phenomenon.

On the opposite end of the spectrum, we can order by the count ascending to identify the least criminal days of the year. Here, we see crime reporting drops off sharply with the Holiday season.

We can plot this to see how the number of crimes reported fluctuates over the course of a year.

Crime is fairly consistent over the course of the year. There is a small but notable increase in September-October (when the weather is nicest in San Francisco) before quickly dropping off in the Holiday season.

Looking at different time and date series is a great way to build familiarity with a dataset. We hope extract() helps with your time and date analytics!

Thank you

Examining crime reports in San Francisco with SQL

Analyzing Crime by Day of Week and Hour of Day

Most Criminal Day of the Year

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112

Want to make data analysis fast for everyone?

Analyzing Crime by Day of Week and Hour of Day

Most Criminal Day of the Year

Trending Articles