Instacart Shopping Analysis

CareerFoundry | Summer 2023

Marketing strategy for an online grocery store.

Main Focus: Python

Purpose

During my CareerFoundry course, I was introduced to Python through a project that analyzed shopping behaviors of Instacart customers.

Context

Instacart already has strong sales, but it wants to learn more about its sales patterns. To do so, it is considering a targeted marketing strategy: reaching different customer segments with relevant campaigns and checking whether those campaigns affect product sales.

Objective

Analyze sales patterns through exploratory analysis to deliver insights on how to segment customers of an online grocery store, Instacart.

Goal

My role as a data analyst is to deliver a final report that answers business questions with data-driven recommendations and visualizations that profile customers based on their purchasing behavior.

 

Data

Tools

Python and relevant libraries:

  • pandas

  • NumPy

  • matplotlib

  • seaborn

 
  • Data wrangling

  • Data merging

  • Deriving variables

  • Grouping data

  • Data aggregation

  • Reporting

  • Creating visualizations with Python

 

Sales Questions

  • What are the busiest days of the week and hours of the day?

  • Are there particular times of the day when people spend the most money?

  • Instacart has a lot of products with different price tags. They want to use simpler price range groupings to help direct their efforts.

  • Are there certain types of products that are more popular than others? They want to know which departments have the highest frequency of product orders.

Marketing Questions

  • What’s the distribution of brand loyalty among users?

  • Are there differences in ordering habits based on a customer’s loyalty status?

  • Are there differences in ordering habits based on a customer’s region?

  • Is there a connection between age and family status in terms of ordering habits?

  • What different classifications does the demographic information suggest?

  • What differences can you find in ordering habits of different customer profiles? Consider the price of orders, the frequency of orders, the products customers are ordering, and anything else you can think of.

 

01 Data Cleaning

  • The first step after initial observations of the data sets was to perform cleaning processes. This step is essential to avoid misleading or skewed results in an analysis and was made much easier by using Python.

  • Data cleaning was performed on three data sets (orders, products, and customers) using standard data wrangling and consistency techniques including:

    • Dropping columns, renaming columns, changing data types, transposing data

    • Ensuring consistent formatting, addressing mixed data types, missing values, and duplicate values
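The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration on a made-up miniature table, not the project's actual script; the column names (`eval_set`, `order_dow`, `days_since_prior_order`) and the values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical miniature of an orders table; names and values are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "user_id": [10, 11, 11, 12],
    "order_dow": [0, 3, 3, 6],
    "days_since_prior_order": [None, 7.0, 7.0, 30.0],
    "eval_set": ["prior", "prior", "prior", "prior"],
})

# Drop a column that is not needed for the analysis.
orders = orders.drop(columns=["eval_set"])

# Rename a column to something more descriptive.
orders = orders.rename(columns={"order_dow": "order_day_of_week"})

# Store identifiers as strings so they are not treated as quantities.
orders["order_id"] = orders["order_id"].astype("str")

# Count missing values per column before deciding how to handle them.
missing_counts = orders.isnull().sum()

# Remove rows that are exact duplicates of an earlier row.
orders_clean = orders.drop_duplicates()
```

Whether missing values should be dropped, imputed, or kept depends on what they mean; in this toy table a missing `days_since_prior_order` could simply mark a customer's first order, so it is only counted, not removed.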

02 Merging Data

  • After cleaning, the next step was to merge the three data sets so that they could be analyzed together.

  • In brief, the merged data set contained information about the orders for each user, the time and date of orders, product names, the department of each product, and customer demographics.

  • The result was a final cleaned, merged data set of over 32 million rows.
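A merge of this kind might look like the sketch below. The three tiny tables, their column names, and the join keys (`product_id`, `user_id`) are assumptions for illustration; the real data sets and merge order may differ. The `indicator=True` option adds a `_merge` column that shows whether each row matched in both tables, which is a handy check before moving on.

```python
import pandas as pd

# Hypothetical stand-ins for the three cleaned data sets.
orders_products = pd.DataFrame({
    "order_id": [1, 1, 2],
    "user_id": [10, 10, 11],
    "product_id": [100, 101, 100],
})
products = pd.DataFrame({
    "product_id": [100, 101],
    "product_name": ["Bananas", "Whole Milk"],
    "department_id": [4, 16],
})
customers = pd.DataFrame({
    "user_id": [10, 11],
    "age": [34, 58],
    "state": ["Ohio", "Texas"],
})

# Step 1: attach product details to each ordered item.
merged = orders_products.merge(
    products, on="product_id", how="inner", indicator=True
)

# Check that every row found a match before dropping the helper column.
match_counts = merged["_merge"].value_counts()
merged = merged.drop(columns=["_merge"])

# Step 2: attach customer demographics to each row.
merged = merged.merge(customers, on="user_id", how="inner")
```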

03 Segmenting Customers

  • During this process, it became clear why the pandas library is a better option than Excel when working with large data sets.

    • Code executed without noticeable lag, whereas an Excel worksheet is limited to 1,048,576 rows and could not hold all the observations in the data set.

  • By using the .loc[] indexer, it was possible to quickly and easily create different segmentations of Instacart users to answer the business questions posed by the stakeholders.

  • Below is an example of code used to create the segmentation of loyal customers using .loc[]:
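The original code screenshot is not reproduced here, so the following is only a sketch of how a loyalty segmentation with `.loc[]` could look. The column names (`order_number`, `max_order`, `loyalty_flag`), the labels, and the thresholds of 10 and 40 orders are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical order history; one row per order.
df = pd.DataFrame({
    "user_id": [10, 10, 11, 12],
    "order_number": [1, 45, 12, 3],
})

# Derive each user's highest order number and repeat it on every row.
df["max_order"] = df.groupby("user_id")["order_number"].transform("max")

# Assign a label to the rows that satisfy each condition.
# The cut-offs (10 and 40) are illustrative assumptions.
df.loc[df["max_order"] > 40, "loyalty_flag"] = "Loyal customer"
df.loc[
    (df["max_order"] > 10) & (df["max_order"] <= 40), "loyalty_flag"
] = "Regular customer"
df.loc[df["max_order"] <= 10, "loyalty_flag"] = "New customer"
```

Each `.loc[condition, column]` assignment writes only to the rows where the condition is true, so the three statements together label every row exactly once.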

 

After creating customer profiles in the previous step, visualizations were created to help understand these relationships.

The visualizations were created using the matplotlib and seaborn libraries.
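As a rough idea of the plotting code involved, here is a minimal matplotlib bar chart of orders per hour. The data and the column name `order_hour_of_day` are made up for the example, and the project's own charts may have been built differently (for instance with seaborn).

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: the hour at which each order was placed.
df = pd.DataFrame({"order_hour_of_day": [9, 10, 10, 11, 11, 11, 14, 20]})

# Count orders per hour and sort by hour so the x-axis reads left to right.
orders_per_hour = df["order_hour_of_day"].value_counts().sort_index()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(orders_per_hour.index, orders_per_hour.values)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of orders")
ax.set_title("Orders by hour of day")
fig.tight_layout()
```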

Here are some visualizations created for this project:

 

All the questions from stakeholders were answered, but here are the main takeaways from the analysis:

Question: What are the busiest days of the week and hours of the day?

Finding: The busiest hours are from 10 am to 3 pm. The two busiest days are Saturday and Sunday.

Recommendation: Schedule ads before 10 am and after 3 pm, Monday through Thursday, as these are low-traffic times for orders. Doing so could increase orders during these periods.
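A finding like this can be derived with simple frequency counts. The sketch below uses a made-up table and assumed column names (`order_day_of_week`, `order_hour_of_day`); the days are stored as integer codes, and how those codes map to weekday names depends on the data set.

```python
import pandas as pd

# Hypothetical orders; day and hour columns hold integer codes.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "order_day_of_week": [0, 0, 1, 0, 3, 1],
    "order_hour_of_day": [10, 11, 10, 14, 20, 10],
})

# Frequency of orders per day and per hour, most frequent first.
orders_by_day = df["order_day_of_week"].value_counts()
orders_by_hour = df["order_hour_of_day"].value_counts()

# The index label with the highest count is the busiest day or hour.
busiest_day = orders_by_day.idxmax()
busiest_hour = orders_by_hour.idxmax()
```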

Question: Are there certain types of products that are more popular than others? They want to know which departments have the highest frequency of product orders.

Finding: Produce, dairy/eggs, and snacks are the three biggest departments.

Recommendation: Focus promotions on these three departments. Produce and dairy/eggs have a limited shelf life and will always be repurchased. I wouldn't recommend promoting the least popular departments such as bulk, alcohol, and pets as all these categories have specific stores that cater to them.

Question: Is there a connection between age and family status in terms of ordering habits? What different classifications does the demographic information suggest? What differences can you find in ordering habits of different customer profiles? Consider the price of orders, the frequency of orders, the products customers are ordering, and anything else you can think of.

Finding: The majority of customers are middle-aged with middle to high incomes, purchase produce, dairy/eggs, and snacks the most, and are most likely part of a family or have dependents, as opposed to being single.

Recommendation: The demographic information suggests to me that most people are using Instacart for convenience. Add a 'most-ordered' or 'buy again' section to the app so users can quickly add those products to their orders. Also consider offering scheduled or 'delivery in x amount of time' orders. This could take some of the load off peak order times.

 

What Went Well

  • One thing I feel I did well during this project was keeping my scripts organized and well commented. Not only did this help other people understand my scripts, but it also helped me understand them.

What Didn’t Go Well

  • During the initial steps of this project, I experienced issues with RAM usage and my computer crashing. I ultimately decided to increase my device’s RAM from 16 GB to 32 GB. This made my analysis much easier.

  • When merging my data sets, I initially merged them in the wrong order, which led to a lot of confusion. But with the help of my tutor, I was able to determine what I had done wrong and remerge them correctly.

Future Steps

  • I would like to work on more projects like this in the future. I really enjoyed being able to answer questions using data and uncover new insights that weren’t initially asked about.

Final Thoughts

  • Overall, I really enjoyed this project. In the beginning, I was intimidated by the prospect of using Python for data analysis.

  • During the course of this project, I learned that there are so many different applications for programming and that, while programming can be difficult, it’s not impossible. Now I see the benefits of using Python in data analytics, and I prefer it over other tools, such as Excel.

 

Special thanks to my tutor, Ayya Elzarka, and my mentor, John Kocur, for all their feedback.