Career Foundry | Spring 2023

Main Focuses: Excel, Tableau

Purpose

During my CareerFoundry course, I was introduced to SQL through a project that analyzed movie rentals for a fictitious company called Rockbuster Stealth

Context

The United States has an influenza season where more people than usual suffer from the flu. Some people, particularly those in vulnerable populations, develop serious complications and end up in the hospital. Hospitals and clinics need additional staff to adequately treat these extra patients. The medical staffing agency provides this temporary staff.

Objective

Help a medical staffing agency that provides temporary workers to clinics and hospitals determine when to send staff, and how many, to each state for the 2018 flu season on an as-needed basis.

Scope

The agency covers all hospitals in each of the 50 states of the United States, and the project will plan for the upcoming influenza season.

 
 
  • Understanding business requirements

  • Data transformation

  • Data integration

  • Statistical hypothesis testing

  • Forecasting

  • Visual analysis

  • Storytelling in Tableau

  • Presenting results to an audience

 

Success Factors

The project’s success will be based on:

  • A staffing plan that utilizes all available agency staff per state requirements, without necessitating additional resources

  • Minimal instances of understaffing and overstaffing across states

Working under these assumptions:

  • Vulnerable populations suffer the most-severe impacts from the flu and are the most likely to end up in the hospital. Vulnerable populations are patients likely to develop flu complications requiring additional care, as identified by the Centers for Disease Control and Prevention. These include adults over 65 Page 2 years, children under 5 years, and pregnant women, as well as individuals with HIV/AIDs, cancer, heart disease, stroke, diabetes, asthma, and children with neurological disorders.

  • Flu shots decrease the chance of becoming infected with the flu.

Deliverables

  • Provide information to support a staffing plan, detailing what data can help inform the timing and spatial distribution of medical personnel throughout the United States.

  • Determine whether influenza occurs seasonally or throughout the entire year. If seasonal, does it start and end at the same time (month) in every state?

  • Prioritize states with large vulnerable populations. Consider categorizing each state as low-, medium-, or high-need based on its vulnerable population count.

  • Assess data limitations that may prevent you from conducting your desired analyses.

 

01 Understanding Business Requirements

After going over business requirements, initial clarifying questions were developed such as: When does flu season occur? Which states have the highest amounts of vulnerable populations?

These led to funneling questions such as: Is flu season the same length of time every year? In areas with large numbers of influenza related deaths, are individuals receiving preventative care, such as flu shots?

As this project worked with medical information, there were concerns of ethics and privacy as well. However as the data used in the project came from government sources, anonymity could be confirmed.

02 Preparing Data and Data Integration

This project contained two datasets that needed to be merged together. To prepare for integration data cleaning was performed, data quality and integrity checks were completed, and extraneous variables were identified. Some data checks included misspellings, duplicates, and missing data. The datasets were combined using the VLOOKUP in Excel.

03 Statistical Analyses and Statistical Hypothesis Testing

Derived variables were created such as Vulnerable Population, Non-Vulnerable Population, Vulnerable Population Deaths, Non-Vulnerable Population Deaths. The variance and standard deviation was calculated for key variables. Variables with potential relationships were identified and tested for correlation.

After this, a statistical hypothesis was developed:

H0: Influenza related deaths within vulnerable age groups is less than or equal to non vulnerable age groups.

HA: Influenza related deaths within vulnerable age groups is greater than non vulnerable age groups.

Setup of the statistical test done in Excel.

Results of t-test performed in Excel.

At this point, the next step in analysis was to determine which states have higher proportions of vulnerable-aged peoples to aid insights as to where medical staff should be placed during flu season.

04 Data Visualization and Storytelling

At this point, visualizations were created using Tableau to illustrate the key insights found during analysis. Then a Tableau Storyboard was created with a video presentation for stakeholders. You can see this presentation here.

 

Interim Report

For this project, an interim report was created to summarize findings before any visualizations were made. This report can be found here.

Where Are Flu Death’s Occurring?

This visualization compares the total population of a state and the number of flu related death’s that occur. The larger a state's population, the more influenza related deaths are reported. The top ten states with the largest populations, California, New York, Texas, Florida, Illinois, Pennsylvania, Ohio, Michigan, Georgia, and North Carolina, make up 44% of all influenza related deaths in the United States from 2009 - 2017.

When Are Flu Death’s Occurring?

This visualization depicts the seasonality of flu season and influenza related deaths in the United States from 2009-2017.

Flu season begins in October and ends in March. Year to year, there are minor fluctuations between these start and end points.

January is the most fatal month of flu season. Possible explanations are the colder weather and that this month immediately proceeds holiday travelling.

July, August, and September have the lowest reported numbers of influenza realted deaths.

Forecasting: Flu Deaths for 2018

This forecast model shows projected flu deaths for top six most populous states through 2018. Based on this forecast, it is recommended to give special focus to California and New York in the month of January.

While all these states have high populations, there is a noticeable leap from New York from Florida, Texas, Pennsylvania, and Illinois. Then from New York to California there is another noticeable leap betwen states.

Therefore the analyst would like to propose a weighted scale to determine how much staff to allocate to each state at a particular time.

 

The stakeholder’s posed the following questions for the analyst:

Were there any limitations that prevented you from conducting an analysis? Think of these in terms of a future project or wish list (i.e., “If I had x, I would have been able to do y.”).

Yes, this analyses only address two parts of the vulnerable population as defined by the CDC. If the analyst had data on pregnant women or individuals with the previously defined medical conditions, a more accurate plan could be made for the whole vulnerable population.

If there had been data on influenza cases that did not result in death, the analyst could have a better idea on how much medical staff is needed. Because of the data available the analyst has only been able to look at flu data in terms of how many people have died to develop a plan, when not every person who contracts influenza dies. This could result in a skewed understaffing of how influenza affects the United States population during the 2018 flu season.

Data for staff to patient ratios for each state would have been very helpful in creating the weighted scale that will be mentioned in recommendations for stakeholders.. This data could provide insight for more detailed recommendations.

Did your data have any limitations that may have affected your results? Consider this in terms of data quality and data bias.

Yes, the CDC Flu Death data set had many entries labeled suppressed, after researching the data source, the analyst was able to determine that suppressed values represented values between zero and nine that were altered for privacy reasons. With this information, the analyst was able to replace these entries with a random number between zero and nine. However, as these values were randomly generated, the analyst cannot guarantee total accuracy.

How might you monitor the impact of the staffing changes you recommended?

One plan would be hearing directly from the medical staff and patients with either a survey or a ranking scale about new staffing strategies. Another would be tracking staffing numbers to hospital admission rates.

Is there a metric that could be used for monitoring this impact?

The percentage of understaffing or overstaffing according to each state of staff to patient ratio would be a useful metric.

 

What Has Been Learned from Statistical Analyses So Far

Flu season peaks in January, during this month more influenza related deaths should be expected and prepared for.

Those who are under the age of five and sixty-five years of age and older are the most vulnerable to influenza.

Possible Plans of Action

From October to November, assign staff equally in each state according to the desired staff to patient ratio of the medical agency.

Beginning in December, gradually start allocating more staff to states with higher vulnerable populations from states with lower vulnerable populations. The goal is for states with higher vulnerable populations to be adequately staffed for flu season peak in January.

In mid-February, begin reallocating staff to initial locations.

Further Considerations for Successful Deployment

What is the current staffing allocation for each state?

What is the maximum available staff currently?

What is the ideal ratio of staff to patients within each state?

Is there additional data available that could provide more insight to the other sectors of the vulnerable population (pregnant women, those individuals with pre-existing medical conditions)?

Considerations Beyond Project Scope

What is the accessibility of medical care for each age group? Each ethnic group? Within each state?

Are there adequate hospitals or clinics in each state or highly populated area?

Is the agency prioritizing proactive or reactive medical treatments?

 

Special thanks to my tutor, Ayya Elzarka, and my mentor, John Kocur, for all their feedback.