Major

Data Science

Research Abstract

This research explores the relationship between daily Air Quality Index (AQI) values and the daily intensity of bike-share ridership in New York City. The authors designed and deployed a distributed data science framework to process the data and run three regression models: Elastic Net, Random Forest Regression, and Gradient Boosted Regression Trees. The study used nine gigabytes of CitiBike ridership data and one gigabyte of AQI data. All three models identified bike-share ridership intensity as either the most important or the second most important feature for predicting future daily AQI values. The authors also demonstrated empirically that a distributed platform was necessary to ingest and pre-process the 10 gigabytes of raw data. However, once the data had been cleaned, joined, and aggregated, all three machine learning algorithms ran far faster on a local commodity computer than on the distributed platform.
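The abstract does not show how the raw trip and AQI records become the small "cleaned, joined, and aggregated" daily table that the models were trained on. The sketch below shows that reduction on a single machine under several assumptions. The record shapes are hypothetical: trip start timestamps are strings, and AQI readings are (date, value) pairs. Several readings for the same day are averaged. The target is taken to be the next day's AQI. This is not the authors' distributed pipeline. All function and field names are illustrative, and the toy inputs are made up.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_ridership(trip_starts):
    """Count trips per calendar day.

    Assumes (hypothetically) that each trip is a 'YYYY-MM-DD HH:MM:SS' start timestamp.
    """
    counts = Counter()
    for ts in trip_starts:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        counts[day] += 1
    return counts


def daily_aqi(readings):
    """Average all AQI readings that fall on the same day (e.g. several monitoring sites)."""
    sums, n = {}, Counter()
    for day, value in readings:
        sums[day] = sums.get(day, 0.0) + value
        n[day] += 1
    return {day: sums[day] / n[day] for day in sums}


def build_training_rows(ridership, aqi):
    """Inner-join the two daily series on date.

    Features are that day's ride count and AQI; the (assumed) target is the next day's AQI.
    Days without an AQI value for both the day and the following day are dropped.
    """
    rows = []
    for day in sorted(ridership):
        nxt = day + timedelta(days=1)
        if day in aqi and nxt in aqi:
            rows.append({
                "date": day,
                "rides": ridership[day],
                "aqi_today": aqi[day],
                "aqi_next": aqi[nxt],
            })
    return rows


# Toy inputs for illustration only (not data from the study).
trips = ["2018-06-01 08:15:00", "2018-06-01 17:40:12", "2018-06-02 09:05:33"]
readings = [
    (date(2018, 6, 1), 40.0),
    (date(2018, 6, 1), 50.0),
    (date(2018, 6, 2), 38.0),
    (date(2018, 6, 3), 42.0),
]
rows = build_training_rows(daily_ridership(trips), daily_aqi(readings))
for row in rows:
    print(row)
```

On these toy inputs the join produces two rows: June 1 (2 rides, AQI 45.0, next-day AQI 38.0) and June 2 (1 ride, AQI 38.0, next-day AQI 42.0). At the scale reported in the abstract, this ingest-and-aggregate step is the part that needed a distributed platform. The resulting daily table is small enough that model training on a single computer is plausible.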

Faculty Mentor/Advisor

Diane Woodbridge, Paul Intrevado

Apr 26th, 1:10 AM to Apr 26th, 1:25 AM

The Impact of Bike-Sharing Ridership on Air Quality: A Scalable Data Science Framework

2019
