Survey Analysis of Data Professionals
The following is an Analysis of a Data Professionals Survey that was carried out by Alex the Analyst. My findings will slightly be different as I took a slightly different approach to cleaning the Data in Power BI before starting the visualizations. For instance my Average Salary will differ as I rounded up the figures instead of doing decimal points. I will also transform the Data into a Database and run Sql queries on it.
Data Analysis Tools
The Data was in an Excel Format and I will use Power BI to do Visualizations and Sql to run queries on it.
Data Cleaning
I pulled the Data from Alex the Analyst Github and moved it into Power BI. I did the following Transformations on the data in Power BI.
I did several transformations of the Data removed Blank Columns and also removed most of the Null Values which will not help in the analysis. Created a new Column to Calculate the Average Salary and then rounded up the Average Salary of the different Data Analysts. Removed ‘Others’ in the Data Professional Career Column , the Country and the Language Coding Preference of Data Professionals to create a single ‘Other’ in those Columns.
Analysis & Findings ; Power BI
The total number of Respondents were 630 and the Average Age of the Respondents was 29.87 which makes this a fairly young group of professionals.
The above Stacked Chart gives us a pretty good idea of the Average Yearly Salary and as we can see that Data Scientist are at the top with an Yearly Salary of US$ 77.36 per year and at the bottom are Students or those looking for Data Analyst positions most likely interns at US$ 7.93 per year.
From Chart 2 above the most popular coding language is Python followed by R and Sql at a distance second and third respectively. In the Chart 3 below the highest number of Respondents are from the USA with 261 while the fewest are from Canada at only 40 respondents.
Upward mobility satisfaction seems to be highest among US Data professionals with Canada having the lowest number of Data Professionals satisfied with Upward Mobility. When the Respondents were asked ‘How Satisfied were they with their current Salary?’ a large number of them from the USA 41% stated that they were satisfied with their Salaries followed by ‘Other’ at 35% . At the bottom was Canada only 5.1% are satisfied with their current Salary.
When the Respondents were asked ‘How happy are they with their Work/Life balance ? ‘ over half of them at 5.74 are happy with this while less then half of them at 4.27 are not satisfied with their current Salary as Data Professionals.
In the above donut Chart I wanted to find out what was the average salary annually , per the chart above the women earned slightly more than the women at US$ 55.37 per annum.
Analysis & Findings : Sql
The process of converting the Survey data into a database was not easy to say the least. The first thing I did was to convert the unique ID generated by the system into a numeric ID by using the =ROW(A1) function and dragging the formulae down to autofill the numbers I was able to get 630 respondents as above. Then I had to make Row 1 as my Column Header by using the Table option. I then deleted all the empty columns and renamed all the Column Headers as they had been named by questions respondents were asked to answer like ;Q4 — What Industry do you work in? Q6 — How Happy are you in your Current Position with the following? (Salary) and so forth . I changed the heading to be more suitable for analysis like ; yearly_salary,Happy _with_Salary, Work_Life_ Balance and so forth. After finishing with all the clean I saved the excel file as a CSV file (renamed it Survey_Data2) and uploaded it into https://sqlizer.io/ to convert this into a database that can easily be queried in a SQL platform in this case I will be using https://sqliteonline.com/ to run queries.
Here are the Columns that were created for the database;
Unique_ID INTEGER,
Current_Role TEXT,
career_switch TEXT,
yearly_salary TEXT,
work_Industry TEXT,
Programming_Language TEXT,
Happy_with_Salary INTEGER,
Work_Life_Balance INTEGER,
Happy_Coworkers INTEGER,
Happy_Management INTEGER,
Happy_Upward_Mobility INTEGER,
Happy_Learning INTEGER,
Entry_analytics TEXT,
important_thing TEXT,
Male_Female TEXT,
Current_Age INTEGER,
Country_origin TEXT,
Education_Level TEXT,
Ethnicity TEXT
The first sql command that I ran was to take a look at the numbers of records that I had . I then did a Count function to see if I will get 630 rows.
SELECT unique_id count from Survey_Data2
630 Unique IDs were listed with this command.
SELECT COUNT(unique_id) FROM Survey_Data2
I also wanted to find out what was the average age of the Data Professionals who took the survey and using the code below came with a number similar to the one that was calculated by Power Bi.
Select avg(current_age)as average_age from Survey_Data2
SELECT mode (current_age) as mode_age from Survey_Data2
Select unique_id,current_age,career_switch from Survey_Data2
where current_age = 22 and 65;
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE current_age BETWEEN 22 and 50
599
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE current_age > 50
12
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE current_age > 22 and happy_with_salary
513
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE current_age > 22 and NOT happy_with_salary
65
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE current_age > 22 and happy_upward_mobility
509
SELECT COUNT(career_switch) AS career_switchers
FROM Survey_Data2
WHERE happy_coworkers
593
ALTER TABLE Survey_Data2
RENAME COLUMN male_female to gender;
I decided to change the name of the Column to make my analysis easy as far as gender is concerned.
SELECT COUNT (current_role) as current_position from Survey_Data2
WHERE yearly_salary > 150
309
Select COUNT (current_role) as current_position from Survey_Data2
WHERE yearly_salary < 40
336
SELECT AVG(yearly_salary) as AVG
FROM Survey_Data2;
Average yearly salary is US$38.6k per annum
SELECT MAX (yearly_salary) as max
FROM Survey_Data2;
Max Salary is 86k — 105k per annum
SELECT MIN (yearly_salary) as min
FROM Survey_Data2;
min Salary is 0– 40k per annum
SELECT country_origin, education_level,happy_with_salary
FROM Survey_Data2 WHERE happy_with_salary > 5 and education_level NOTNULL
From the above code most of those who scored Happy_with_salary > 5 were mostly from the United States and had a Bachelors or Masters degree.
SELECT country_origin,gender, education_level,happy_with_salary
FROM Survey_Data2 WHERE happy_with_salary > 5 and education_level NOTNULL
From the above code most men scored happy_with_salary > 5
SELECT country_origin,gender, education_level,happy_with_salary
FROM Survey_Data2
WHERE happy_with_salary < 5 and education_level NOTNULL
In the same manner more men than women scored happy_with_salary< 5
SELECT AVG (happy_with_salary) AS AVG FROM Survey_Data2
In Chart 2 above we calculated how happy_with-salary were the Data Professionals and we got a figure of 4.27 which matches my calculation using sql after being rounded off.
SELECT AVG (work_life_balance) AS AVG FROM Survey_Data2
In the same manner we got a similar figure when we calculated the average of how satisfied were the data professionals with work_life_balance.
From the sql queries below we see that the most popular Programming Language is Python , followed by R and Sql . Sql was listed under Others but came a distant 3rd as a favorite programming language among Data Professionals. These findings are consistent with what we found in Chart 2; Favorite Programming Language.
SELECT COUNT (programming_language) FROM Survey_Data2
WHERE programming_language = "R"
SELECT COUNT (programming_language) FROM Survey_Data2
WHERE programming_language = "Python"
SELECT COUNT (programming_language) FROM Survey_Data2
WHERE programming_language = "JavaScript"
I also wanted to answer the question how difficult was it for different data professionals to enter the Data Analytics field and if race played a factor in this. This was a mixed bag of results with Data Analyst stating that Entry into Analytics was ‘Very Difficult’ for them while Data Engineers said it was ‘ Easy’. Also most women found it ‘ Very Easy’ to enter the data analytics field. While on racial basis a significantly larger number of Blacks/African American found entry into the analytics field ‘Difficult’.
SELECT COUNT (entry_analytics),current_role FROM Survey_Data2
WHERE entry_analytics = 'Difficult'
SELECT COUNT (entry_analytics),entry_analytics,current_role
FROM Survey_Data2
WHERE entry_analytics = 'Easy'
SELECT COUNT (entry_analytics),entry_analytics,current_role
FROM Survey_Data2
WHERE entry_analytics = 'Very Difficult'
SELECT COUNT (entry_analytics),entry_analytics,gender,current_role
FROM Survey_Data2
WHERE entry_analytics = 'Very Easy'
SELECT COUNT (entry_analytics), entry_analytics,gender,ethnicity,
education_level
FROM Survey_Data2
WHERE entry_analytics = 'Very Difficult'
SELECT COUNT (entry_analytics), entry_analytics,gender,ethnicity,
education_level
FROM Survey_Data2
WHERE entry_analytics = 'Difficult'
SELECT COUNT (entry_analytics), entry_analytics,gender,
ethnicity,education_level
FROM Survey_Data2
WHERE entry_analytics = 'Very Easy'
SELECT avg (yearly_salary) as average_salary FROM Survey_Data2
The average salary also differed by Current_role they were in and also by County for instance the Data Analysts in India make significantly less than their United States and Canadian Counterparts.
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Analyst' AND country_origin = 'India'
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Analyst' AND country_origin = 'Canada'
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Analyst' AND country_origin = 'United States'
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Engineer' AND country_origin = 'United States'
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Engineer' AND country_origin = 'Canada'
SELECT country_origin,current_role,avg (yearly_salary) as average_salary
FROM Survey_Data2
WHERE current_role = 'Data Engineer' AND country_origin = 'India'
Using the code below I was able to pull different average Salaries based on current age, current role and country of origin. In addition I was able to pull the most important issue to respondents and “Better Salary” was the most important issue while “ Good Culture” was ranked the lowest.
SELECT average_salary,current_age,current_role,country_origin
FROM Survey_Data2
WHERE average_salary < 100 and current_age < 30
SELECT COUNT (_important_thing) as most_important_issue
FROM Survey_Data2
WHERE _important_thing = 'Better Salary'
SELECT COUNT (_important_thing) as most_important_issue
FROM Survey_Data2
WHERE _important_thing = 'Remote Work'
SELECT Count (_important_thing) as most_important_issue
FROM Survey_Data2
WHERE _important_thing = 'Good Work/Life Balance'
SELECT Count (_important_thing) as most_important_issue
FROM Survey_Data2
WHERE _important_thing = 'Good Culture'
Finally a very interesting fact that I found out was that the oldest person working in analytics is a 92 year old from India who is currently a student, its never too late to make that switch. While the youngest one is 18 years old a Data Scientist in the United States.
SELECT MAX (current_age) AS MAX_AGE, current_role,country_origin
FROM Survey_Data2
SELECT MAX (current_age) AS MIN_AGE,current_role ,country_origin
FROM Survey_Data2
Conclusion
The analysis above was carried out using Power BI and running Sql queries thank you very much for reading my page; should you have any suggestions, comments or have a better way of running some of these queries or do them totally differently please let me know. Once again thank you very much for reading my post. Don't forget those claps please, thanks.