

Tip Prediction NYC Taxi Data

Project type

Machine Learning in R

This project built a predictive model to estimate tip amounts for New York City taxi trips. Using data from February 2017, the model was developed on Week 2 data and evaluated on Week 4 data, so that performance reflects generalization to unseen data, measured by Mean Squared Prediction Error (MSPE). Data cleaning removed irrelevant columns, handled missing or invalid entries, and removed outliers. Feature engineering added variables such as pickup hour, day of the week, and trip duration, and grouped trip distance and fare amount into categories. Exploratory Data Analysis (EDA) was performed on a subsample to understand data distributions and relationships.
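
The feature engineering and MSPE evaluation described above can be illustrated with a minimal sketch. The project itself was done in R; this version is written in Python purely for illustration and is not the project's code. The field names (`pickup_datetime`, `dropoff_datetime`, `trip_distance`), the helper names, and the distance cut points are all assumptions chosen for the example.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def engineer_features(trip):
    """Derive time-based and binned features from one trip record (a dict).

    Field names and bin edges are hypothetical, not taken from the project.
    """
    pickup = datetime.strptime(trip["pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(trip["dropoff_datetime"], TIME_FORMAT)
    distance = trip["trip_distance"]

    # Illustrative cut points (in miles) for categorizing trip distance.
    if distance < 1:
        distance_band = "short"
    elif distance < 5:
        distance_band = "medium"
    else:
        distance_band = "long"

    return {
        "pickup_hour": pickup.hour,
        "day_of_week": DAY_NAMES[pickup.weekday()],
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        "distance_band": distance_band,
    }


def mspe(actual, predicted):
    """Mean squared prediction error: average of squared (actual - predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

In a workflow like the one described, `engineer_features` would be applied to each cleaned trip record before model fitting, and `mspe` would be computed on the held-out week by comparing observed tips against the model's predictions.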
