
Project Overview

  • Created a MySQL script to upload and pre-process the data, as well as build several pivot tables to help leadership better understand customer segments.
  • Created an R script to upload and pre-process the data, build several pivot tables, and create visualizations to better present the data to leadership.
  • Created a Tableau Storyboard to present findings to leadership, including several recommendations.
  • Wrote a series of Medium articles to further explain each stage of the data analysis process.

Code and Resources Used

R Version: 4.2.2

MySQL Version: 8.0.31

Data Cleaning

After uploading the data, I made the following changes to prepare it for analysis:

Added

  • ride_length column, calculated by subtracting the start time from the end time
  • day_of_week column, derived with the WEEKDAY() function
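
The two derived columns could be built roughly as follows. This is a minimal MySQL sketch, not the project's actual script: the table name trips is hypothetical, storing ride_length in seconds is an assumption, and it presumes started_at and ended_at already hold date-and-time values.

```sql
-- Hypothetical table name: trips.
-- Assumes started_at and ended_at are already DATETIME columns.
ALTER TABLE trips
  ADD COLUMN ride_length INT,      -- assumed unit: seconds
  ADD COLUMN day_of_week TINYINT;  -- WEEKDAY(): 0 = Monday ... 6 = Sunday

-- Fill both columns for every row.
UPDATE trips
SET ride_length = TIMESTAMPDIFF(SECOND, started_at, ended_at),
    day_of_week = WEEKDAY(started_at);
```

Note that WEEKDAY() numbers days from 0 (Monday) to 6 (Sunday), whereas MySQL's DAYOFWEEK() runs from 1 (Sunday) to 7 (Saturday), so the two are not interchangeable when labeling pivot tables.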

Changed

  • Renamed all CSV files to follow the “Bike__Data_YYYYMM” nomenclature
  • Renamed “rideable_type” column to “bike_type”
  • Renamed “start_station_name” column to “start_sta_name”
  • Renamed “start_station_id” column to “start_sta_id”
  • Renamed “end_station_name” column to “end_sta_name”
  • Renamed “end_station_id” column to “end_sta_id”
  • Renamed “member_casual” column to “user_type”
  • Changed the “started_at” column’s data type to date and time
  • Changed the “ended_at” column’s data type to date and time
  • Replaced “Clybourne Ave” with “Clybourn Ave” in all columns
  • Applied the CamelCase naming convention to the data set
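
As an illustration, the column renames, type changes, and spelling fix might look like this in MySQL 8.0. It is a sketch under assumptions: the table name trips is hypothetical, the timestamp text is assumed to already be in 'YYYY-MM-DD HH:MM:SS' form so the conversion succeeds, and the spelling fix is shown only for the two station-name columns.

```sql
-- Hypothetical table name: trips. RENAME COLUMN requires MySQL 8.0 or later.
ALTER TABLE trips
  RENAME COLUMN rideable_type      TO bike_type,
  RENAME COLUMN start_station_name TO start_sta_name,
  RENAME COLUMN start_station_id   TO start_sta_id,
  RENAME COLUMN end_station_name   TO end_sta_name,
  RENAME COLUMN end_station_id     TO end_sta_id,
  RENAME COLUMN member_casual      TO user_type;

-- Convert the timestamp columns to a date-and-time type.
-- Assumes the existing values are formatted as 'YYYY-MM-DD HH:MM:SS'.
ALTER TABLE trips
  MODIFY COLUMN started_at DATETIME,
  MODIFY COLUMN ended_at   DATETIME;

-- Correct the misspelled street name (shown for the station-name columns only).
UPDATE trips
SET start_sta_name = REPLACE(start_sta_name, 'Clybourne Ave', 'Clybourn Ave'),
    end_sta_name   = REPLACE(end_sta_name,   'Clybourne Ave', 'Clybourn Ave');
```

REPLACE() is case-sensitive and leaves rows without the misspelling unchanged, so the UPDATE can safely run against the whole table.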

Removed

  • All null values in the data set
  • All trailing, leading, and excess spaces in the data set
  • All records with a ride length of less than one minute
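
The removals could be expressed along these lines. Again this is a hedged MySQL 8.0 sketch rather than the original script: the table name trips is hypothetical, "removing null values" is interpreted here as deleting rows with a NULL in the checked columns, and only a few columns are shown.

```sql
-- Hypothetical table name: trips; column names follow the renames listed above.

-- Strip leading/trailing spaces and collapse runs of spaces into one.
-- REGEXP_REPLACE requires MySQL 8.0 or later.
UPDATE trips
SET start_sta_name = REGEXP_REPLACE(TRIM(start_sta_name), ' {2,}', ' '),
    end_sta_name   = REGEXP_REPLACE(TRIM(end_sta_name),   ' {2,}', ' ');

-- Delete rows with a missing value (assumed interpretation; partial column list).
DELETE FROM trips
WHERE start_sta_name IS NULL
   OR end_sta_name   IS NULL
   OR started_at     IS NULL
   OR ended_at       IS NULL;

-- Delete rides shorter than one minute.
DELETE FROM trips
WHERE TIMESTAMPDIFF(SECOND, started_at, ended_at) < 60;
```

Depending on how the CSV files were loaded, missing values may arrive as empty strings rather than NULL, in which case the second statement would also need to check for ''.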

Exploratory Data Analysis

I created several visualizations to better understand the data’s distribution, value differences, and categorical variables. Below are a few highlights from this phase:

[Figures: sql_table_12, r_table_4, r_table_7, r_table_5]