🔗 GitHub
Project Overview
- Created a MySQL script to upload and pre-process the data, as well as build several pivot tables to help leadership better understand customer segments.
- Created an R script to upload and pre-process the data, build several pivot tables, and create visualizations that present the data to leadership.
- Created a Tableau Storyboard to present findings to leadership, including several recommendations.
- Wrote a series of Medium articles to further explain each stage of the data analysis process.
Code and Resources Used
R Version: 4.2.2
MySQL Version: 8.0.31
Data Cleaning
After uploading the data, I performed several pre-processing tasks to prepare it for analysis, making the following changes:
Added
- ride_length column, calculated by subtracting the starting time from the ending time
- day_of_week column, derived with the WEEKDAY() function
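The two derived columns can also be sketched outside the database. The example below is a minimal Python illustration, not the project's MySQL or R script. The timestamp format, the sample row, and the choice to take the weekday from the start time are all assumptions. Python's `datetime.weekday()` uses the same numbering as MySQL's `WEEKDAY()` (0 = Monday through 6 = Sunday).

```python
from datetime import datetime

# Assumed timestamp layout; the real files may use a different format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_derived_columns(row):
    """Return a copy of `row` with ride_length and day_of_week added."""
    started = datetime.strptime(row["started_at"], TIMESTAMP_FORMAT)
    ended = datetime.strptime(row["ended_at"], TIMESTAMP_FORMAT)
    enriched = dict(row)
    # ride_length: ending time minus starting time, kept as a timedelta.
    enriched["ride_length"] = ended - started
    # day_of_week: 0 = Monday ... 6 = Sunday, matching MySQL's WEEKDAY().
    # Using the start time here is an assumption.
    enriched["day_of_week"] = started.weekday()
    return enriched


# Hypothetical sample row, for illustration only.
sample = {"started_at": "2022-11-07 08:15:00", "ended_at": "2022-11-07 08:42:30"}
result = add_derived_columns(sample)
print(result["ride_length"], result["day_of_week"])
```

Keeping `ride_length` as a duration rather than a formatted string makes later filtering and averaging straightforward.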
Changed
- Renamed all CSV files to follow “Bike__Data_YYYYMM” nomenclature
- Renamed “rideable_type” column to “bike_type”
- Renamed “start_station_name” column to “start_sta_name”
- Renamed “start_station_id” column to “start_sta_id”
- Renamed “end_station_name” column to “end_sta_name”
- Renamed “end_station_id” column to “end_sta_id”
- Renamed “member_casual” column to “user_type”
- Changed the “started_at” column’s data type to date and time
- Changed the “ended_at” column’s data type to date and time
- Replaced “Clybourne Ave” with “Clybourn Ave” in all columns
- Applied the CamelCase naming convention to the data set
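The column renames and the station-name correction listed above can be expressed as a lookup table plus a string replacement. This is a rough Python sketch rather than the project's own code; the helper name `apply_changes` and the sample row (including its station name) are hypothetical.

```python
# Column renames from the list above (old name -> new name).
COLUMN_RENAMES = {
    "rideable_type": "bike_type",
    "start_station_name": "start_sta_name",
    "start_station_id": "start_sta_id",
    "end_station_name": "end_sta_name",
    "end_station_id": "end_sta_id",
    "member_casual": "user_type",
}


def apply_changes(row):
    """Rename columns and correct the station spelling in every text value."""
    changed = {}
    for column, value in row.items():
        # Columns without an entry in the table keep their original name.
        new_name = COLUMN_RENAMES.get(column, column)
        if isinstance(value, str):
            value = value.replace("Clybourne Ave", "Clybourn Ave")
        changed[new_name] = value
    return changed


# Hypothetical sample row, for illustration only.
sample = {
    "rideable_type": "classic_bike",
    "start_station_name": "Clybourne Ave & Division St",
    "member_casual": "member",
}
print(apply_changes(sample))
```

Holding the renames in one dictionary keeps the mapping easy to audit against the list above.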
Removed
- All null values in the data set
- All trailing, leading, and excess spaces in the data set
- All records with a ride length of less than one minute
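The three removal steps can be illustrated with a small Python filter. This is a sketch under stated assumptions, not the project's MySQL or R code: it treats `None` and blank strings as null, drops the whole record when any value is null, and uses hypothetical sample rows and helper names.

```python
import re
from datetime import timedelta

MIN_RIDE_LENGTH = timedelta(minutes=1)


def tidy_text(value):
    """Trim leading/trailing whitespace and collapse runs of spaces."""
    return re.sub(r" {2,}", " ", value.strip())


def is_null(value):
    # Assumption: None and blank strings both count as null.
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_rows(rows):
    """Yield tidied rows, skipping nulls and rides shorter than one minute."""
    for row in rows:
        if any(is_null(value) for value in row.values()):
            continue
        if row["ride_length"] < MIN_RIDE_LENGTH:
            continue
        yield {
            key: tidy_text(value) if isinstance(value, str) else value
            for key, value in row.items()
        }


# Hypothetical sample rows, for illustration only.
rows = [
    {"start_sta_name": "  Clark St  &  Elm St ", "ride_length": timedelta(minutes=12)},
    {"start_sta_name": None, "ride_length": timedelta(minutes=5)},
    {"start_sta_name": "Clark St & Elm St", "ride_length": timedelta(seconds=40)},
]
cleaned = list(clean_rows(rows))
print(cleaned)
```

Only the first sample row survives: the second contains a null and the third is shorter than one minute.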
Exploratory Data Analysis
I created several visualizations to better understand the data’s distribution, value differences, and categorical variables. Below are a few highlights from this phase: