Crime Statistics in South Africa ;An SQL Data Storytelling
Introduction ; the backstory
A few months back I did a Tableau visualization on Crime statistics in South Africa please see this link (https://medium.com/@kiiomgps/crime-statistics-in-south-africa-b8eec219bb01) to read the analysis and look at at the visualizations. I was still getting my feet wet in the data field having been laid off from my Federal Contracting job in the middle of the Coronavirus epidemic.
So stuck at home with no job and a lot of time in my dingy and dimly lite basement I stumbled upon Data Analysis by mistake and got the bug yes that one the data bug. I embarked on an amazing journey to learn as much as I could as I knew that the job that I had treasured so much for the past two years of my life had lead me to a dead end career path . I knew then and there that something had to give for me to move forward with my career; and mind you I ain’t a spring chicken.
So armed with $500 that was part of my severance package I enrolled in a Data Analysis Boot Camp, hoping that this career change will get me out of the career rot that I was in. It was not easy and trust me this medulla oblongata that had not seen the inside of a classroom or studied in a minute really struggled to get a gasp of what the instructor was saying half the time.
We had two sessions one an SQL Boot Camp and another on Power BI ,both of them were equally challenging then, now not so much. So I decided that since I was getting instructions on these two analytical tools I might as well get myself educated and learn Tableau and Python. Well ,I went to Udemy.com and registered for Tableau. I then went through the free classes for Python to see whether I will be able to learn a thing… Python was really a snake that I could not handle and forfeited that venture.
So armed with the little knowledge that I had about Tableau I decided that I have had enough training to do some visualizations. Looking at it today am wondering what the heck was I doing what business did I have putting that stuff out there for you all yonder out there to see. Here is the link to my current Tableau page (https://public.tableau.com/app/profile/kiio.mutua) .Well I have contemplated several times whether I should go and delete some of them but decided that it’s good to have them there so that I can at least see how far I have come.
It’s like looking at your baby pictures and wondering who the heck is that ? Is that really me nah must have been someone’s else kid not me … maybe the neighbor came with photos and they all got mixed up with ours … anyway that's a story for another day…went completely out of topic there…
So back to the business at hand so as I was saying I did those visualizations and now am back to do another one this time not creating fancy charts but running sql queries yeah you heard me right I have FINALLY graduated to coding …Congratulations to me lol…
Data Analysis Tool & Cleaning
The data that I will be using has been sourced from kaggle if you feel brave enough do analyze this dataset here is the link https://www.kaggle.com/slwessels/crime-statistics-for-south-africa and you are very welcome .I will be using my beloved online sql tool yes you know it there you go https://sqliteonline.com/ ( if you haven't tried this one my friends you are missing something the only thing I dread more than death is FOMU (fear of missing out). The dataset had two excel files CVS files one is the SouthAfricaCrimeStats_v2 and the other one is named ProvincePopulation. The dataset came pretty cleaned up so there wasn't much cleaning to do and I got down to immediately analyzing the data…
The Analysis : Using SQL
The purpose of this analysis is to answer several pertinent questions based on the data provided. I will not try to offer solutions as I am not in Law Enforcement.
- Is there a correlation between population density and crime?
- What Provinces have the highest crime rates?
- What types of crimes are most prevalent and has crime increased over the 10 year period from 2005 and 2016 ?
I ran the above command to see what columns the table had; there were three text type columns; Province, Station and Category while the others date type for the years 2005 to 2016. So basically this data set is about crime in South Africa specifically in different Provinces in South Africa in a 10 year span (2005 to 2016).
Using the above code I was able to find out the most poulated Provinces in South Africa and Gauteng Province topped the list while Northern Cape was the least populated Province in South Africa.
In the same manner using the same code I was able to determine that Gauteng was the most densely populated Province while Northern Cape was the least populated.
The code above listed the Most densely populated Province below.
The code above listed the Least populated Province below.
To find out the total population of South Africa the Select (SUM) population was used to arrive at the Total Population of South Africa. Of course this is the South African population 5 years ago in 2016 and will be different from the current South African population in 2023.