Sample Claims for a Hospital
Introduction
This is a fictional Sample Claims for a hospital that was submitted for analysis. There were a few questions that the Stakeholders and Management wanted answered;
- What are the top 5 most common valid procedure codes?
- How many patients are associated with at least one of those procedure codes?
- What are the top 5 most common valid diagnosis codes?
- Ensure that Date values are arranged in a logical and chronological manner.
- Review this file for overall data quality and highlight any potential
problems so that they can be discussed with the data provider.
In addition Management provided a data dictionary that I can use for analysis;
SQL, Power BI and Excel will be the tools that will be utilized to do analysis and data cleaning.
The Data Set
Using a simple SQL command the Select Command I was able to take a look at the dataset. I limited the number of returns to 5 only as I did not need to see the entire table. The Data set had 6 Columns namely Patient_id, Claim_id , diagnosis_code, Procedure_Code , date_Service and date_received. The final dataset had 74 patients that were treated in the month of January . The date_Service denotes the day the patient was treated and date_received refers to the date the Claims were submitted by the Hospital.
Data Clean Up and Preparation
The Data Set came in an Excel file and I started the clean up of the Data. I removed all the Blank Rows in the 6 columns that I found and saved them in a different file. That data will still need to be worked on to find out why they are blanks. Secondly I also found out that the procedure code 99999 was actually not a valid procedure code and hence all these were also eliminated and put in the different Excel sheet. All these blanks files will need further analysis to find the missing variables .Then I removed duplicates from the final Excel Sheet. As per request from the Managment the data was sorted using the date_service chronologically from the 1st January 2021 to January 31st 2021.
The final Excel Sheet was renamed Cleaned_Sample_Claims_DATA.
Analysis Tools
SQL, Power BI and Excel will the be the tools that will be utilized to do analysis and data cleaning. I uploaded the data to https://sqliteonline.com/ to run SQL queries on the dataset and do further analysis. The Data was also uploaded to Power BI to create visualizations .In Power BI there was no need use Power Query to Transform the Data Set any further I just made sure that they had the right data type.
Chart 1 above was produced using Power Bi and as we can see the total number of claims is 2436 . The number of patients that were seen in the month of January were 74 while 179 number of Procedure Codes were utilized in the same month. Please note that more than 74 patients were seen in January 2021 but there were issues in processing claims as some of them had blanks fields.
Using the Sql Count query below I was able to find the total number of patients as found in Power BI
In the above Chart 2 we see that the total number of Claims were submitted during the 15th of January which equates to the high number of procedures performed on the 14th of January.
What are the top 5 most common valid procedure codes? This is one of the questions asked by Management and using Sql I was able to find the 5 most common valid procedure codes.
What are the top 5 most common valid diagnosis codes? This was another question that management wanted to find out and using Sql I was able to find the 5 most common valid diagnosis codes.
Analyzing Missing Data Points File
I combined all the data that had some missing data such as patient_id,procedure_code, diagnosis_code and wrong code number 99999. I realized that to find the missing variables we will need a ‘Master dataset’ that I do not have. Despite this I was able to save the data in an Excel file renamed it Cleaned_Sample_ Claims and uploaded it into SQL Lite for analysis.
There were 1249 procedure_codes that were lsited as 9999, and 2 Null Claim_ids. Per the Data dictionary Patient_id is not Nullable.
Conclusion
In conclusion most of the questions that the Stakeholders and Management asked were answered. In addition it was noted that the Hospital had a large number of patients in the middle of the month and the number tempered down towards the end of the month. Using Power BI and Sql helped in creating dashboards and analyzing the data set effectively.