A Health Care Data Analytics Project
This analysis is based on data submitted on the LinkedIn learning platform hosted by Thais Cooke on SQL for Health Care Professionals. With the use of electronic records and the integration required to treat an individual as a whole, a strong understanding of computer languages, combined with an awareness of the challenges in healthcare, can help create better outcomes for patients, as well as solutions for problems within the industry (Thais Cooke).
Several data sets were utilized for this training all excel files (csv). The tables include Appointment Analysis, Hospital Records, Lab Results, Outpatient Visits and Patient Tables.
I will analyze the data using SQL Online IDE (sqliteonline.com). The first step is to upload the data into the platform. The analysis will involve the use of case statements, common table expressions (CTEs) and subqueries in addition to simple select statements plus aggregate functions. To pull in data from different tables we will make use of inner joins. The analysis made use of beginner, intermediate and advance SQL queries.
SELECT * FROM Patients_Table
LIMIT 10;
/* used the query to take a quick glance of the table, same query was
excuted on the rest of the tables*/
If you don't execute the limit clause in the select statement you will get all the records in the table. We want to ensure that make optimal use of resources and for query optimization.
SELECT patient_id,
patient_name,
department_name
FROM
Copy_of_Hospital_Records
WHERE department_name = 'Cardiology'
/* this query returns all patients who are in the Cardiology department*/
SELECT Count (patient_id),
department_name
FROM
Hospital_Records
WHERE department_name = 'Cardiology'
Department Cardiology ; Number of Patients 18
/*this query counts the of number patients that are in the Cardiology
department*/
SELECT COUNT (patient_id) as Number_Patients,
department_name
FROM
Hospital_Records
GROUP by department_name
Cardiology 18,Dermatology 15,Neurology 14,Oncology 22,Othopedics 12
and Pediatrics 9
/*Counts the number of patients in each department*/
SELECT
Round
(avg(days_in_the_hospital),1)
AS Number_of_days_Hospitalised
FROM
Hospital_Records
Number_of_days_Hospitalised 2.2 days
/*average number of days patients hospitalised*/
SELECT department_name,
Round
(avg(days_in_the_hospital),1)
AS Number_of_days_Hospitalised
FROM
Hospital_Records
GROUP BY department_name
Cardiology 3.7,Dermatology 0.9,Neurology 1.9,Oncology 2.6,Othopedics 1.7
and Pediatrics 1.6
/*average number of days each patient hospitalised in each department*/
SELECT
patient_id,
department_name,
days_in_the_hospital,
CASE
WHEN days_in_the_hospital <= 3 THEN 'Short'
WHEN days_in_the_hospital <= 5 THEN 'Medium'
ELSE 'Long'
END AS Hospital_Stay
FROM
Hospital_Records;
/*Categorizes patients based on lenght of Hospital stay*/
SELECT
CASE
WHEN days_in_the_hospital <= 3 THEN 'Short'
WHEN days_in_the_hospital <= 5 THEN 'Medium'
ELSE 'Long'
END AS Hospital_Stay,
COUNT (*) AS stay_category
FROM Hospital_Records
GROUP BY
CASE
WHEN days_in_the_hospital <= 3 THEN 'Short'
WHEN days_in_the_hospital <= 5 THEN 'Medium'
ELSE 'Long'
END
Long 11 , Medium 14 , Long 65
/*query counts the number of patients in each stay Category*/
SELECT COUNT (patient_id),
days_in_the_hospital
FROM Hospital_Records
WHERE days_in_the_hospital is 0
40 is the number of Patients never been hospitalised
/*query counts the number of patients never hospitalised*/
SELECT COUNT (patient_id), department_name
days_in_the_hospital
FROM Hospital_Records
WHERE days_in_the_hospital is 0
GROUP BY department_name
Cardiology 3,Dermatology 9,Neurology 9,Oncology 11,Othopedics 5
Pediatrics 3
/*query counts the number of patients never hospitalised*/
SELECT
P.patient_id ,
P.patient_name,
gender
FROM
Patients_Table AS P
INNER JOIN
Hospital_Records AS HR
ON
P.patient_id = HR.patient_id;
/*query lists all those patients who are in Patients table but have
not been hospitalised,are not in the Hospital records*/
SELECT
COUNT (patient_id),
diagnosis
FROM Outpatient_Visits
WHERE diagnosis = 'Diabetes'
Count of patienst with a diagnosis of diabetics is 74
SELECT Count (patient_id),
reason_for_visit
FROM Outpatient_Visits
WHERE reason_for_visit = 'Annual physical' and
diagnosis = 'Diabetes'
Count of patients who are diabetic and have had an annual visit
is 10
SELECT COUNT (patient_id),
reason_for_visit
FROM Outpatient_Visits
WHERE reason_for_visit = 'Checkup' and
diagnosis = 'Diabetes'
Count of patients who are diabetic and went for a checkup
is 4
SELECT O.patient_id,
reason_for_visit,days_in_the_hospital
FROM Outpatient_Visits as O
INNER JOIN
Hospital_Records AS H
ON O.patient_id= H.patient_id
WHERE reason_for_visit = 'Checkup'
AND diagnosis = 'Diabetes'
Count of 4
/* inner join query to find patients who were diabetic and after checkup
ended hospitalised*/
Smoking cigarettes while taking diabetes medication results in ‘insulin resistance’ making it harder for patients to control their diabetes. In the end it results in other complications like heart disease, kidney failure, vision loss and poor circulation.
SELECT
patient_id,
bmi,
CASE
WHEN bmi < 18.5 then 'Underweight'
WHEN bmi BETWEEN 19 and 24 then 'Normalweight'
WHEN bmi BETWEEN 25 and 30 then 'Overweight'
ELSE 'Obese'
END AS Weight_Category
FROM Hospital_Records
/* query classifies patients into different weight cateogories*/
Using the following formula =INT((TODAY()-C2)/365), I added a new age column to the patient's table.
SELECT
hr.patient_id,
hr.bmi,
p.age,
CASE
WHEN hr.bmi < 18.5 THEN 'Underweight'
WHEN hr.bmi BETWEEN 18.5 AND 24.9 THEN 'Normalweight'
WHEN hr.bmi BETWEEN 25 AND 29.9 THEN 'Overweight'
ELSE 'Obese'
END AS Weight_Category,
CASE
WHEN p.age > 50 THEN 'High Risk'
WHEN p.age BETWEEN 30 AND 49 THEN 'Medium Risk'
ELSE 'Low Risk'
END AS Risk_Category
FROM
Hospital_Records AS hr
INNER JOIN
Patients_Table AS p
ON
hr.patient_id = p.patient_id
WHERE
p.age > 18;
/*query catergorizes patients on weight and risk and filters all under 18
years of age*/
SELECT
CASE
WHEN hr.bmi < 18.5 THEN 'Underweight'
WHEN hr.bmi BETWEEN 18.5 AND 24.9 THEN 'Normalweight'
WHEN hr.bmi BETWEEN 25 AND 29.9 THEN 'Overweight'
ELSE 'Obese'
END AS Weight_Category,
CASE
WHEN p.age > 50 THEN 'High Risk'
WHEN p.age BETWEEN 30 AND 49 THEN 'Medium Risk'
ELSE 'Low Risk'
END AS Risk_Category,
COUNT(*) AS Patient_Count
FROM
Hospital_Records AS hr
INNER JOIN
Patients_Table AS p
ON
hr.patient_id = p.patient_id
WHERE
(hr.bmi >= 25)
AND (p.age > 50)
GROUP BY
Weight_Category,
Risk_Category;
Obese & High Risk is 11
Overweight & High Risk 10
/*Counts the number of patients who are Overweight,Obese and
Over 50*/
SELECT
a.patient_id,
h.patient_name,
COUNT(a.appointment_date) AS days_of_admission
FROM
Appointment_Analysis a
INNER JOIN
Hospital_Records h
ON
a.patient_id = h.patient_id
GROUP BY
a.patient_id,
h.patient_name;
/*query identifies patients who went for a visit and got admitted into
hospital*/
SELECT
patient_id,
patient_name,
CASE
WHEN bmi < 18.5 THEN ' Underweight'
WHEN bmi BETWEEN 19 AND 24.5 THEN 'Normal Weight'
WHEN bmi BETWEEN 25 AND 30 THEN 'Overweight'
ELSE 'Obese'
END AS BMI_Weight,
CASE
WHEN family_history_of_hypertension = 'Yes' THEN 1
ELSE 0
END AS Cardiac_Attack
FROM
Hospital_Records
/*this query classifies patients on weight and familiy history of cardiac
arrest*/
SELECT
patient_id,
patient_name,
family_history_of_hypertension,
bmi
FROM
Hospital_Records
WHERE BMI > 30 AND family_history_of_hypertension = 'Yes';
/* query looks at patients who are obese and have a familiy
history of hypertension*/
SELECT
H.patient_id,
H.patient_name,
H.family_history_of_hypertension,
H.bmi,
P.age,
P.gender
FROM
Hospital_Records H
INNER JOIN
Patients_Table P
ON
H.patient_id = P.patient_id
WHERE
H.bmi > 30
AND H.family_history_of_hypertension = 'Yes'
GROUP BY
H.patient_id,
H.patient_name,
H.family_history_of_hypertension,
H.bmi,
p.age,
P.gender;
/*query brought in patient's name age and gender who are obese and have
family history of hypertension*/
SELECT
patient_id,
patient_name,
department_name
FROM
Appointment_Analysis
WHERE department_name ='Cardiology'
UNION
SELECT
patient_id,
patient_name,
department_name
FROM
Hospital_Records
WHERE bmi > 30 and department_name = 'Cardiology'
ORDER BY patient_name
/*using UNION instead of an INNER JOIN */
SELECT
patient_id,
smoker_status,
COUNT(DISTINCT diagnosis) AS chronic_condition_count
FROM
Outpatient_Visits
WHERE
diagnosis IN ('Diabetes', 'Hypertension')
GROUP BY
patient_id
HAVING
COUNT(DISTINCT diagnosis) >= 2
/* Query identifies patients with multiple chronic conditions who might
be at higher risk of hospitalization*/
SELECT
department_name,
Round
(avg(days_in_the_hospital),2)
AS average_recovery_time
FROM
Hospital_Records
GROUP BY
department_name
ORDER BY
average_recovery_time
/*Compare patient outcomes (e.g., recovery times) between
different departments*/
SELECT
visit_date,
visit_id,
diagnosis,
COUNT(*) AS incidence_count
FROM
Outpatient_Visits
WHERE
diagnosis IN ('Diabetes' ,'Hypertension')
GROUP BY
visit_date
ORDER BY
visit_id
/* Query to analyze how the incidence of a specific disease
changes over time*/
SELECT department_name,
AVG((julianday(appointment_time) - julianday(arrival_time)) * 24 * 60)
AS avg_wait_time,
MAX ((julianday(appointment_time) - julianday(arrival_time)) * 24 * 60)
AS max_wait_time,
MIN ((julianday(appointment_time) - julianday(arrival_time)) * 24 * 60)
AS min_wait_time,
COUNT (*) AS Total_Appointments
FROM appointment_analysis
GROUP BY department_name;
/*query to find the wait time for patients at each department*/
SELECT
patient_id,
patient_name,
department_name,
ROW_NUMBER() OVER (ORDER BY department_name)
AS row_num,
RANK() OVER (ORDER BY department_name)
AS rank_num,
DENSE_RANK() OVER (ORDER BY department_name)
AS dense_rank_num
FROM
hospital_records;
*/query creates a window function based on department_name*/
SELECT
patient_id,
department_name,
appointment_date,
arrival_time,
LEAD (arrival_time)
OVER (PARTITION BY department_name ORDER BY department_name)
AS lead_arrival_time
FROM Appointment_Analysis;
SELECT
patient_id,
department_name,
appointment_date,
arrival_time,
LAG(arrival_time)
OVER (PARTITION BY department_name ORDER BY department_name)
AS prev_arrival_time
FROM Appointment_Analysis;
/* Query looks at lead and lag arrivals times partitioned by department*/
SELECT
COUNT (patient_id),
patient_id,
department_name,
appointment_date
FROM
Appointment_Analysis
WHERE
strftime('%m', appointment_date) = '06'
GROUP by department_name;
/*query counts number of patient appointments in June by departments*/
Since SQLite doesn't recognize Extract (month, appointment_date) syntax I utilized the python syntax (strftime) to get the month from the table.