NYC Citibike

Exploratory Analysis
Project Overview
Using customer usage data that the service collected in 2013, I looked for relationships and correlations that could help Citibike's operations and marketing teams promote a more eco-friendly form of travel.

Tools Used: Python, Tableau Public
My Contributions
This was a solo project. I gathered and cleaned the 2013 NYC Citibike dataset and ran statistical analyses on it. I then drew out insights on operational efficiency and compiled the findings into a dashboard on Tableau Public.
Spatial map of Citibike station locations
August 2022
The objective is to analyze this data and glean any relationships that could assist Citibike with efficient operations and effective marketing strategies. I started by computing correlations across the data and plotting them on a heatmap, which makes the relationships easier to interpret. Looking through the numbers, we see no strong correlation between any of the variables, apart from the expected ones (each variable with itself, and the longitude/latitude pairs). This led me to another way of finding relationships in the data: cluster analysis.
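A minimal sketch of the correlation step, using only the Python standard library. The column names and values below are made up for illustration and are not taken from the 2013 dataset; the original analysis may well have used a library such as pandas instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Toy, made-up trips (NOT the real data); column names are hypothetical.
trips = {
    "trip_duration_min": [7, 17, 33, 8, 16, 35, 6, 18],
    "start_hour": [8, 9, 13, 17, 18, 11, 8, 19],
    "start_latitude": [40.71, 40.75, 40.72, 40.70, 40.76, 40.73, 40.69, 40.74],
}

matrix = correlation_matrix(trips)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each row of the printed matrix corresponds to one row of a heatmap: the diagonal is always 1 (a variable against itself), and the off-diagonal values are the ones worth scanning for strong relationships.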

I ran a cluster analysis on the data and found that it grouped the trips by trip length: long trips (approx. 33 min), medium trips (approx. 17 min), and short trips (approx. 7 min). When I plotted the start times of the trips against these newly defined trip lengths, the plot showed clearly defined high- and low-usage times for the service, as well as which trip length is most common during each.
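To illustrate how a cluster analysis can separate trips into short, medium, and long groups, here is a small sketch that assumes k-means on trip duration alone with three clusters. The write-up does not say which algorithm or features were actually used, so treat the method as an assumption; the durations are invented to echo the group sizes described above.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster numbers into k groups with Lloyd's algorithm; return sorted centers."""
    ordered = sorted(values)
    # Deterministic start: pick k values evenly spaced through the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        converged = new_centers == centers
        centers = new_centers
        if converged:
            break
    return sorted(centers)


# Made-up trip durations in minutes (NOT the real data).
durations = [6, 7, 8, 7, 16, 17, 18, 17, 32, 33, 34, 33]

centers = kmeans_1d(durations, k=3)
print("cluster centers (minutes):", centers)
```

Labeling each trip with its nearest center gives the short/medium/long category that can then be plotted against start time.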

Knowing trip length and high-volume hours for the service, I thought it wise to see if there were stations that were more commonly used during certain parts of the day. I ran a spatial analysis on the stations using the median start time of trips from each station. What I found was that stations in downtown Manhattan were most commonly used during the latter half of the day, while those uptown and in western Brooklyn were used much earlier. This falls in line with our assumption that those using this service are heading into the city for work and heading out of the city afterward.
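The per-station aggregation behind this map can be sketched as follows. The station names and start hours are placeholders, not values from the dataset, and representing the start time as a whole hour is an assumption made to keep the example short.

```python
from collections import defaultdict
from statistics import median


def median_start_hour_by_station(trips):
    """Map each station name to the median start hour of trips leaving it."""
    hours = defaultdict(list)
    for station, start_hour in trips:
        hours[station].append(start_hour)
    return {station: median(values) for station, values in hours.items()}


# Made-up (station, start_hour) pairs; station names are placeholders.
trips = [
    ("Downtown A", 16), ("Downtown A", 18), ("Downtown A", 17),
    ("Uptown B", 8), ("Uptown B", 9), ("Uptown B", 7),
    ("Brooklyn C", 8), ("Brooklyn C", 10),
]

station_medians = median_start_hour_by_station(trips)
for station, hour in sorted(station_medians.items(), key=lambda item: item[1]):
    print(f"{station}: median start hour {hour}")
```

Joining these medians to each station's coordinates is what allows them to be colored on a map by time of day.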

Through this analysis, we can determine how to rotate bikes so that stations stay optimally stocked for customer use, the average trip length, the peak operating hours for which app performance should be optimized, and when to implement marketing strategies to attract new customers.

Additional information on this case study can be found in the following locations:

Tableau Public Presentation
GitHub Repository
Histogram of Citibike trips
Correlation heatmap of Citibike variables