Coronavirus (COVID-19) Deaths
An Analysis Using SAS (Data Step and Proc SQL) & Visualization Using Tableau
COVID-19 is a serious global infectious disease outbreak. It is part of a family of viruses called coronaviruses that infect both animals and people. This one originated in China at the end of 2019, in the city of Wuhan, which has 11 million residents. In the past two decades coronavirus outbreaks have caused global concern, including one in 2003 with the severe acute respiratory syndrome (SARS) and more recently in 2012 with the Middle East Respiratory Syndrome (MERS). (gavi.org)
At the beginning of 2020 Covid 19 infections spread across the global causing severe respiratory disease and death. Schools, universities, sporting events were either postponed, cancelled and thousands of employees worked remotely. Hospitals become ground zeroes admitting only the severely sick where thousands died alone away from families and friends.
If you have not been infected you must have been affected. During this time massive amounts of data about Covid 19 infections ,deaths and ultimately vaccinations was collected.
The purpose of this project is to showcase my coding skills and help turn insights into actions .I will analyze the data using SQL and develop Visualization Charts using Tableau to showcase the devastating effect of Covid 19. The data is very valuable has not only deaths, vaccinations, hospitalization but also data on those who were smokers or had underlying medical conditions (preexisting conditions) that made them more susceptible to Covid 19.
To look at the data please see this link https://ourworldindata.org/covid-deaths
ANALYZING THE DATASET
To analyze this data, I first pulled open the Data File and created three new Excel files both containing the population column. The purpose of adding the population column in all the datasets was to avoid doing a join for most of the queries. I saved one file as Death_due_Covid, Covid _Vaccinations and US_Covid. From there I imported all the data sets into SAS using the import wizard so that I can use SAS and Proc SQL to carry out the analysis.
Most of the queries in this analysis will be done using PROC SQL and a few will be carried out using the Data Step. The software that I will use is SAS on Demand for Academics since I do not have access to SAS Enterprise Guide at my disposal but most of the commands are basically the same as they would be in a commercial SAS platform.
/* Importing the Excel Data files into SAS on Demand* the first file Death_due _Covid /
/* Generated Code (IMPORT) */
/* Source File: Death_due_Covid.csv */
/* Source Path: /home/u61022716 */
/* Code generated on: 7/20/22, 3:57 PM */
FILENAME REFFILE ‘/home/u61022716/Death_due_Covid.csv’;
PROC IMPORT DATAFILE=REFFILE
PROC CONTENTS DATA=WORK.IMPORT; RUN;
/* the proc contents command was used to see how the data looks to ensure everything was included*/
proc print data = work. Import (obs= 15);run;
/* To avoid working on the dataset I created TEMP dataset called Death_due_Covid using the data step */
set work. Import;
/* Using the proc sql commands I was able to look at the deaths, new cases ,total cases and order the output by date and location */
Proc sql; Select
from Death_due_covid (obs = 2000);
order by date and location;
/* Calculating the likelihood of dying from Covid in the different countries if you contract Covid, used the Total_Cases and Totals_deaths to get a Percentage death rate*/
Proc sql; Select
location,date,total_cases,total_deaths, (total_deaths/total_cases)*100 as percentagedeathrate
order by location and date;
/* to look at the max infection rate in the United States ,I created a new dataset named US_Covid */
select location,date ,population Max(total_cases)as HighestinfectionCount, (total_cases/population)*100
order by location,date;
/* Using the code below I was able to calculate the infection percentage rate by dividing the new_cases by total_cases */
Proc sql; Select
location,date,new_cases,total_cases, (new_cases/total_cases)*100 as PercentageofNewUS_infection_rate
order by location ,date;
The above Chart shows Covid 19 infections per million people in the various continents with Oceania having the lowest infection rate and Europe having the highest rate.
Using Proc SQL code below I was able to calculate the Total_deaths_Per_ Million in the US with Total_deaths_per_Million being at its highest on July 18th 2020.
select location ,date ,total_deaths_per_million,population
order by location, total_deaths_per_million desc ;
The Total_cases_per_million was also coded and the highest cases were also recorded on July 18th 2020.
select location ,date ,total_cases_per_million,population
order by location, total_cases_per_million desc ;