Rockbuster Stealth Video Rental

Career Foundry | Summer 2023

Main Focuses: SQL, Tableau

See this project on GitHub

Purpose

During my CareerFoundry course, I was introduced to SQL through a project that analyzed movie rentals for a fictitious company called Rockbuster Stealth

Context

Rockbuster Stealth is a movie rental company looking to launch an online video rental service in order to stay competitive with other streaming services.

Objective

Use SQL to analyze the data and provide data driven answers to any ad-hoc business questions that other departments may have.

Goal

Help Rockbuster Stealth launch an online video rental service using its current movie rental services.

 

Data

  • Rockbuster Stealth Dataset provided by CareerFoundry

Tools

  • SQL

  • pgAdmin4

  • DbVisualizer

  • Tableau

  • Excel

 
  • Understanding business requirements

  • Relational databases

  • Cleaning and summarizing data

  • Database querying

  • Filtering Data

  • Subqueries

  • Common Table Expressions

  • Query Optimization

 

•What was the average rental duration for all videos?

•Which countries are Rockbuster customers based in?

•Where are customers with a high lifetime value based?

•Do sales figures vary between geographic regions?

 

01 Loading Data and Exploring the Dataset

For this project pgAdmin4 was the Relational Database Management System (RDBMS) used. After loading the data into the RDBMS general data exploration and cleaning was performed.

However, to understand how the dataset was connected, an Entity Relationship Diagram (ERD) was created to show the relationship schema using DbVisualizer. This schema was vital in understanding the dataset and keys that connected different tables to perform queries.

To clean the data, the “film” and “customer” tables were checked for duplicates non-uniform data, and missing data.

Duplicates were checked using the HAVING COUNT clause where the count was set to greater than one. However the first and last name of customers was ignored because it’s possible for two different people to have the same name or for a value of a first name/last name to be repeated. There were no duplicates found.

Non-uniform data was checked for using the SELECT DISTINCT clause. There was no non-uniform data found.

Missing data was checked for using the NULL clause. There was no missing data found.

After cleaning the data, descriptive statistics were found for the “film” and “customer” tables for both numerical and non-numerical columns.

Here is an example of a query used to get the descriptive statistics of the numeric columns in the “film” table:

 

02 Finding High Lifetime Value Customers Using JOIN

Step 1: Write a query to find the top 10 countries for Rockbuster in terms of customer numbers.

To write this query, the ERD for this project was used to determine which tables would need access to find the information needed and what the keys were between the tables. It was determined that the ‘customer’, ‘address’, ‘city’ and ‘country’ tables needed to be accessed. To find this solution, the country and the count of customers within each country were needed. So, these two columns were chosen with the SELECT command. Then it was determined what would be the LEFT table and how all the other tables would join. INNER JOIN was used for all joins because only limited information was needed. Data was grouped by ‘country’ so the actual name of the country could be seen in addition to the ‘country_id’. To increase readability, results were ordered by COUNT(customer_id) in descending order. The query was limited by 10 as that was the number of countries asked to be looked at according to business requirements.

Step 2: Write a query to find the top 10 cities within the top 10 countries identified earlier.

This query was written by taking the query from step one and adding two components. Firstly, adding “city” and “cityid” to the SELECT clause. Secondly adding a WHERE clause that specified the “countryid” of the top 10 countries found in the initial query. The “country_id” was used instead of the country name to avoid misspellings.


Step 3: Write a query to find the top 5 customers in the top 10 cities who have paid the highest total amounts to Rockbuster.

This query was written by taking the query from Step 2 and building on to it. The SELECT clause now included “customer_id”, “first_name”, “last_name”, and a sum of amount paid. Also another INNER JOIN was added to include the “payment” table. The GROUP BY clause was updated to reflect the changes to SELECT. The limit was set to 5 to met business requirements.

 

03 Finding Average Amount Paid by the Top 5 Customers Using Subqueries

Subqueries are useful to compare different data points (SELECT), to analyze data that is constantly changing (WHERE) or analyzing the results of a complex query (FROM). This assignment created a complex query, so the FROM clause needed to be used in the subqueries. However, the query written in this exercise will only help Rockbuster for a limited amount of time because the cities that are being looked at are input manually and won’t update the ten cities with the highest customers in the top ten countries. To ensure that the query results are always accurate one could add another subquery using the WHERE clause to find this information. While this will add to the cost of our query, it will save the data analyst time in the long run because they won’t have to be updating the city list every time a transaction occurs.

 

04 Converting Subqueries into Common Table Expressions

To write this query into a CTE, the first thing done was to separate the subquery from the main query so the could be differentiated. Then the subquery was rewritten into a CTE following the CTE syntax. Lines 2 through 17 are essentially the subquery used previously reformatted to fit CTE syntax. Then the main query was slightly changed. The SELECT command was used to find the average amount paid by the top 5 customers, or AVG(total_amount_paid). This was given an alias of ‘Average’. This gave the same results as the subquery above.

 

These visualizations were made using Tableau.

Spatial Analysis: Global Customer Locations

This visualization shows the distribution of Rockbuster Stealth customers globally. Here it can be seen that the country with the most customers is India. This is similar customer information that was found in the query example found in the Steps Taken section of this page. However it is presented in a different way. This representation of the data is beneficial because it helps to show if there are any regional trends present in the data.

 

Spatial Analysis: Customer Payment Distribution

This visualization shows the distribution of customer payments. Here, the size of a circle represents the customer count for a country and the color of the circle represents the total payment given to Rockbuster Stealth by customers. The darker the color, the more has money paid. The lighter the color, the less money has been paid.

 

Question: What was the average rental duration for all videos?

Answer: The average rental duration for all videos is 4.99 days.

Question: Which countries are Rockbuster customers based in?

Answer: Rockbuster customers are located in 109 countries across the globe. Find a full list here.

Question: Where are customers with a high lifetime value based?

Answer: India, Japan, the United States, China, and Indonesia

Question: Do sales figures vary between geographic regions?

Answer: Yes, the data would suggest that a large segment of customers live in South Asia and in North America.

 

Rockbuster Stealth should focus on countries with high lifetime value customers such as India, China, Japan, Indonesia, and the United States.

As all films are English language films, consider breaking into English speaking markets such as Australia. Conversely, consider acquiring more multi language films to cater to non-English speaking customers or dub current films in different languages.

The most rented films in all countries are those with ‘PG-13’ ratings, so there must be enough of those films in the inventory.

The average film length is 115 minutes, Rockbuster should aim to have films around this length.

 

Special thanks to my tutor, Ayya Elzarka, and my mentor, John Kocur, for all their feedback.