IBM HR Analytics Employee Attrition & Performance
Introduction
In most of my analysis I have used SQL and either Power BI or Tableau to analyze the data and draw conclusions but in this project I will only be using SQL to find insights and draw conclusions on Employee Attrition. This is a fictitious dataset that I found on Kaggle and felt that it might be a great dataset to practice on. The dataset came as an Excel CSV File and you can find the dataset on this link (https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset)
Data Cleaning and Preparation
After downloading the data set from Kaggle I first made sure there were no duplicates. To ensure there are no duplicates I highlighted the entire Excel sheet went to the data tab then clicked on the remove duplicates option. I then recoded the legend in the dataset where there was a number I filled in the required legend. For example on the Education where it was
1 ‘Below College’,2 ‘College’ I renamed them as Below College , Collège etc., This applied also to all the other fields that had similar numbered coding.
As mentioned earlier the only tool of analysis that I will be using is SQL and the platform that I will use is https://sqliteonline.com/
Analyzing Employee Attrition & Performance
The first thing I did was take a look at the table to see the various Columns and I just limited it to the first 5 rows of the table. This gave me a simple overview of the data set. This Data set has 23 Columns and 1470 rows (records) as deduced by the Sql Count Function query below.
SELECT *FROM hr_employee limit 5
The next query was to Count the number of records and also find the number of Female and Male employees in the data set.
SELECT COUNT (*) FROM hr_employee
1470
/*total number of employees is 1470*/
SELECT
COUNT (CASE WHEN gender='Male' then 1 end) as male_cnt,
COUNT (CASE WHEN gender='Female' then 1 end) as female_cnt,
COUNT (*) as total_cnt
FROM hr_employee
Female Employees are 588
Male Employees are 882
Total Number of Employess 1470
/* Employee Gender Count*/
I also wanted to find out the Average Age of Employees and also the Average Age of Female and Male Employees.
SELECT ROUND (AVG (AGE) ,2) as AVERAGE_Employee_AGE
FROM hr_employee
AVERAGE EMPLOYEE AGE is 36.92 Years OLD
SELECT
Round (AVG (age),2)AS FEMALE_AVG_AGE
FROM hr_employee
WHERE gender IS 'Female'
AVERAGE FEMALE EMPLOYEE AGE IS 37.33
SELECT
Round (AVG (age),2) AS MALE_AVG_AGE
FROM hr_employee
WHERE gender IS 'Male'
AVERAGE MALE EMPLOYEE AGE IS 36.65
/* Calculated the Average Employee age and Average Employee Age by Gender*/
From the above analysis we found out that the Average Age of Female Employees was slightly higher than that of the Male employees. I also wanted to explore and see how many employees are Single, Married or Divorced and using the following codes below I was able to pull up the numbers.
Select COUNT(*) FROM hr_employee
where maritalstatus is 'Single'
SELECT COUNT(*) FROM hr_employee
where maritalstatus is 'Married'
SELECT COUNT(*) FROM hr_employee
where maritalstatus is 'Divorced'
/* Total number of Employees who are either Single, Married or Divorced*/
OR
SELECT
COUNT (CASE WHEN maritalstatus='Single' then 1 end) as Single_cnt,
COUNT (CASE WHEN maritalstatus='Married' then 1 end) as Married_cnt,
COUNT (CASE WHEN maritalstatus ='Divorced' then 1 end) as Divorced_cnt,
COUNT (*) as total_cnt
FROM hr_employee
Single Employees 470
Married Employees 673
Divorced Employees 327
Employees Total Count 1470
/* Total number of Employees who are either Single, Married or Divorced*/
SELECT JobRole,
ROUND(AVG(Monthlyincome), 2) AS avg_monthly_income
FROM hr_employee
GROUP BY JobRole
ORDER BY avg_monthly_income DESC;
/*Average Monthly Income by Job Role*/
Using the above Sql Code I was able to find out the Average Monthly income among the various job roles with Managers earning the highest amount.
Also I wanted to find out the Average Monthly Income by Department and Job Role. With this analysis Managers at different departments were still the highest paid employees.
SELECT Department,
JobRole,
ROUND(AVG(Monthlyincome), 2) AS avg_monthly_income
FROM hr_employee
GROUP BY Department,
JobRole
ORDER BY Department,
avg_monthly_income DESC;
/*Average Monthly Income by Department and Job Role*/
Using the Sql code below I was able to find out the educational level of the employees. Most of the employees had a Bachelor’s degree and a few were PhD holders.
SELECT
COUNT (CASE WHEN education ='Below College' then 1 end) as Not_Eduacted,
COUNT (CASE WHEN education ='College' then 1 end) as Slightly_Educated,
COUNT ( Case WHEN education = 'Bachelor' THEN 1 END) as Educated,
COUNT (CASE WHEN education ='Master' then 1 end) as Very_Eduacted,
COUNT (Case when education = 'Doctor' THEN 1 END) as Highly_Educated,
COUNT (*) as total_cnt
FROM hr_employee
Not Educated 170
Slightly Educated 282
Educated 572
Very Educated 398
Highly Educated 48
/* Education levels of the various employees*/
In this query I wanted to find out the attrition by Employee age and it seems like the age between 25– 34 had a high attrition rate among the different age groups.
SELECT CASE
WHEN Age BETWEEN 18 AND 24 THEN '18-24'
WHEN Age BETWEEN 25 AND 34 THEN '25-34'
WHEN Age BETWEEN 35 AND 44 THEN '35-44'
WHEN Age BETWEEN 45 AND 54 THEN '45-54'
WHEN Age BETWEEN 55 AND 60 THEN '55-60'
ELSE '60+'
END AS age_range,
Attrition,
COUNT(*) AS count_of_employees
FROM hr_employee
GROUP BY age_range,
Attrition
ORDER BY count_of_employees DESC
/*the attrition rate among the different age groups*/
SELECT JobRole,
ROUND(AVG(age), 2) AS avg_age
FROM hr_employee
GROUP BY JobRole
ORDER BY avg_age DESC
/*Average Age of Employees by Jobrole*/
Using the Sql query below Sales Representatives have the lowest job satisfaction while Manager and Sales Executive were highly satisfied in their roles.
SELECT jobrole, jobsatisfaction FROM hr_employee
GROUP BY jobrole
ORDER BY department
/* Job satisfaction by Job Role*/
In this query I was interested in finding out how many employees are satisfied with their work environment ,seems that most employees are either Satisfied or Highly Satisfied with their work environment.
SELECT EnvironmentSatisfaction,
COUNT(*) AS num_of_employees
FROM hr_employee
GROUP BY EnvironmentSatisfaction
ORDER BY EnvironmentSatisfaction IN
('Medium','Satisfied', 'Highly Satisfied');
/*work environment satisfaction*/
In this query I wanted to find out how involved the employees were with their jobs. A majority of the employees were Involved with their work.
SELECT jobinvolvement,
COUNT (*) as Num_of_Employees
FROM hr_employee
GROUP by jobinvolvement
ORDER By jobinvolvement IN
('Involved','Highly Involved','Medium','low')asc
/*how involved the employees were with their jobs*/
In this query I was interested in finding out what employees thought about Work Life Balance. A majority of the employees thought that their Work Life Balance was pretty good.
SELECT WorkLifeBalance,
COUNT(*)AS num_of_employees
FROM hr_employee
GROUP BY WorkLifeBalance
ORDER BY WorkLifeBalance
IN ('Low','Mid','Good','Very-Good')
/*Employees' view on Work Life Balance */
In this query I wanted to find out the distribution of employee performance ratings across different departments.
SELECT Department,
PerformanceRating,
COUNT(*) AS Count_of_employees
FROM hr_employee
GROUP BY Department,
PerformanceRating
ORDER BY Department,
Count_of_employees;
/*Finding the distribution of employee performance
ratings across different departments*/
In this final query I wanted to find out the distribution of employee performance ratings.
SELECT
PerformanceRating,
COUNT(*) AS Count_of_employees
FROM hr_employee
GROUP BY
PerformanceRating
ORDER BY
Count_of_employees;
/*distribution of employee performance ratings*/
Conclusion
The above analysis gives us an overview of the entire dataset. I hope you enjoyed reading this piece if you have any queries or have a better way of analyzing the dataset please free to comment or share below. Once again thanks for reading my page and please share with friends or Colleagues. And don’t forget to give me some claps for a job well done if you think so …