NZ Transport Agency Data Analysis

Project type

Machine Learning in Python

This project involved an in-depth analysis of the New Zealand Transport Agency (NZTA) driver licence dataset, which contained over 9,800 records detailing licence holder information across various regions, age groups, licence types, and financial years. The goal was to identify key trends, forecast future licence volumes, and build a model to classify licence stages.

Key steps included thorough data cleaning and preparation; extensive exploratory data analysis (EDA), using visualizations to uncover patterns such as Auckland's dominance in licence numbers and the large cohort of licence holders aged "60 and over"; and feature engineering, including ordinal encoding of age groups, to prepare the data for machine learning.
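
The cleaning and encoding steps above might look something like the following sketch. It assumes a pandas DataFrame with hypothetical columns `Region`, `AgeGroup`, `LicenceStage`, `FinancialYear`, and `Count`, a hypothetical set of age bands, and financial years written like "2019/20". The real dataset's schema is not described here, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical column names and age bands; only "60 and over" appears in the write-up.
AGE_ORDER = ["16-19", "20-24", "25-39", "40-59", "60 and over"]
TEXT_COLS = ["Region", "AgeGroup", "LicenceStage", "FinancialYear"]


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw rows and add ordinal/numeric features."""
    out = df.copy()
    # Strip stray whitespace so "Auckland " and "Auckland" count as the same region.
    for col in TEXT_COLS:
        out[col] = out[col].astype(str).str.strip()
    # Coerce counts to numbers; unparseable values become NaN and are dropped.
    out["Count"] = pd.to_numeric(out["Count"], errors="coerce")
    out = out.dropna(subset=["Count"]).drop_duplicates().reset_index(drop=True)
    # Ordered categorical -> integer codes (0 = youngest band, -1 = unknown band).
    out["AgeGroupOrd"] = pd.Categorical(
        out["AgeGroup"], categories=AGE_ORDER, ordered=True
    ).codes
    # Assumed financial-year format "2019/20": keep the starting calendar year.
    out["Year"] = out["FinancialYear"].str[:4].astype(int)
    return out


def totals_by(df: pd.DataFrame, col: str) -> pd.Series:
    """EDA helper: total licence holders per category, largest first."""
    return df.groupby(col)["Count"].sum().sort_values(ascending=False)


# Made-up sample rows (the third is a whitespace-variant duplicate of the first).
raw = pd.DataFrame({
    "Region": ["Auckland ", "Canterbury", "Auckland"],
    "AgeGroup": ["60 and over", "16-19", "60 and over"],
    "LicenceStage": ["Full", "Learner", "Full"],
    "FinancialYear": ["2019/20", "2020/21", "2019/20"],
    "Count": ["1200", "340", "1200"],
})
clean = prepare(raw)
print(clean[["Region", "AgeGroupOrd", "Year", "Count"]])
print(totals_by(clean, "Region"))
```

Encoding age groups as ordered integer codes keeps their natural ordering, so a tree-based model can split on "older than band X" directly.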

I developed several models:

* A Linear Regression model for time-series forecasting, projecting an annual increase of 50,000 to 60,000 in total licence holders over the next five years.

* A Random Forest classifier to predict licence stage (Full, Learner, or Restricted). After hyperparameter tuning and cross-validation, this model achieved a mean cross-validated accuracy of approximately 91.7%. Feature importance analysis showed that Count (the number of licence holders in a category), Age group (ordinal-encoded), and Year were the most significant predictors.

* K-Means clustering was applied to identify distinct segments within the data, using PCA for dimensionality reduction and silhouette analysis to select the number of clusters (k=2).
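
A minimal sketch of the linear-trend forecast, assuming the records have first been summed into one total per year. The function `forecast_totals` and the illustrative totals are hypothetical, not the project's code or data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def forecast_totals(years, totals, horizon=5):
    """Fit total = a + b * year and extrapolate `horizon` years ahead."""
    X = np.asarray(years, dtype=float).reshape(-1, 1)
    y = np.asarray(totals, dtype=float)
    model = LinearRegression().fit(X, y)
    last = int(X.max())
    future = np.arange(last + 1, last + 1 + horizon)
    preds = model.predict(future.reshape(-1, 1).astype(float))
    # model.coef_[0] is the fitted change in total licence holders per year.
    return future, preds, model.coef_[0]


# Illustrative (made-up) yearly totals growing by exactly 55,000 per year.
years = np.arange(2015, 2025)
totals = 3_500_000 + 55_000 * (years - 2015)
future, preds, slope = forecast_totals(years, totals)
for yr, p in zip(future, preds):
    print(yr, round(p))
print("annual change:", round(slope))
```

A straight line assumes a steady year-on-year change. It cannot capture curvature or structural breaks, so a five-year projection from it is best read as trend extrapolation.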
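
The tuning and evaluation loop for the stage classifier could be sketched with scikit-learn's `GridSearchCV` and `cross_val_score`. The features mirror the predictors listed above (Count, ordinal age group, Year), but the column names, the parameter grid, and the synthetic data are assumptions, so this sketch does not reproduce the reported accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

FEATURES = ["Count", "AgeGroupOrd", "Year"]  # hypothetical column names


def train_stage_classifier(df, target="LicenceStage", seed=0):
    X, y = df[FEATURES], df[target]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Small illustrative grid; the project's actual grid is not documented.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
        cv=cv,
        scoring="accuracy",
    )
    grid.fit(X, y)
    best = grid.best_estimator_
    # Re-scoring on the same folds used for tuning is optimistic;
    # nested CV or a held-out test set gives a less biased estimate.
    mean_acc = cross_val_score(best, X, y, cv=cv, scoring="accuracy").mean()
    importances = pd.Series(best.feature_importances_, index=FEATURES)
    return best, mean_acc, importances.sort_values(ascending=False)


# Synthetic stand-in data: stage depends only on age band, by construction.
rng = np.random.default_rng(0)
n = 300
age = rng.integers(0, 5, n)
demo = pd.DataFrame({
    "Count": rng.integers(1, 5000, n),
    "AgeGroupOrd": age,
    "Year": rng.integers(2015, 2025, n),
})
demo["LicenceStage"] = np.where(
    age == 0, "Learner", np.where(age == 1, "Restricted", "Full")
)
model, mean_acc, importances = train_stage_classifier(demo)
print(f"mean CV accuracy: {mean_acc:.3f}")
print(importances)
```

Stratified folds keep the Full/Learner/Restricted proportions similar in every split, which matters when one stage is much more common than the others.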
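
For the clustering step, here is a sketch that standardises the features, projects them with PCA, and picks k by the highest mean silhouette score over a range of candidates. It assumes PCA is applied before K-Means (the write-up does not say whether PCA fed the clustering or was used only for visualisation), and it uses synthetic blobs from `make_blobs` in place of the real features. The helper name `cluster_with_silhouette` is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_with_silhouette(X, k_values=range(2, 7), n_components=2, seed=0):
    # Standardise so no single feature (e.g. raw counts) dominates the distances.
    Z = StandardScaler().fit_transform(X)
    # Project onto a few principal components before clustering (assumed order).
    Z = PCA(n_components=n_components, random_state=seed).fit_transform(Z)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
        scores[k] = silhouette_score(Z, labels)
    best_k = max(scores, key=scores.get)  # highest mean silhouette wins
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(Z)
    return best_k, labels, scores


# Synthetic data with two well-separated groups, standing in for the real features.
X, _ = make_blobs(n_samples=200, centers=2, n_features=4, cluster_std=0.5, random_state=0)
best_k, labels, scores = cluster_with_silhouette(X)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("chosen k:", best_k)
```

The silhouette score compares each point's average distance to its own cluster with its distance to the nearest other cluster, so a clear peak at one k suggests a natural number of segments.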

This project used Python, Pandas, Matplotlib, Seaborn, and Scikit-learn, demonstrating an end-to-end data science workflow for extracting actionable insights from publicly available transport data.