NZ Property Market Analysis
Project type
Forecasting + Analysis + Machine Learning in R
This project analysed the New Zealand housing market in depth, drawing on multiple datasets covering regional rental CPI, property sales (median prices), and national dwelling tenure statistics over many years. The goal was to integrate these sources to identify key trends, understand regional market dynamics, predict Rent CPI, and compare time series forecasting methods out to 2030.
Key steps included data cleaning and preparation, such as standardising time references across disparate sources and developing a mapping strategy to aggregate granular property location data to broader rental regions. Exploratory data analysis (EDA) with ggplot2 uncovered patterns such as Auckland's distinct price and rental trajectories and regionally varying relationships between property prices and rents. Feature engineering then produced predictors for machine learning, including growth rates, price-to-rent ratios, and log transformations.
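The feature engineering step can be illustrated with a minimal dplyr sketch. The data frame below is synthetic, and the column names (region, year, median_price, rent_cpi) are assumptions made for illustration rather than the project's actual schema.

```r
library(dplyr)

# Synthetic illustrative data: values and column names are assumptions,
# not figures from the project's datasets.
housing <- data.frame(
  region = rep(c("Auckland", "Canterbury"), each = 4),
  year = rep(2018:2021, times = 2),
  median_price = c(850000, 880000, 950000, 1150000,
                   450000, 460000, 500000, 620000),
  rent_cpi = c(1000, 1030, 1055, 1080,
               1000, 1025, 1060, 1100)
)

# Build per-region predictors: year-on-year growth rates,
# a price-to-rent ratio, and a log-transformed price.
# (The native pipe |> requires R 4.1 or later.)
features <- housing |>
  group_by(region) |>
  arrange(year, .by_group = TRUE) |>
  mutate(
    price_growth = median_price / lag(median_price) - 1,
    rent_growth = rent_cpi / lag(rent_cpi) - 1,
    price_to_rent = median_price / rent_cpi,
    log_price = log(median_price)
  ) |>
  ungroup()

print(features)
```

Growth rates are computed within each region so that the first year of one region is never compared with the last year of another; the first row per region is therefore NA and would need imputing or dropping before modelling.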
I developed and evaluated several analytical components:
Exploratory Data Analysis Highlights: The EDA revealed significant regional disparities. For instance, Auckland's median property prices showed a more pronounced right skew and a wider overall range than those of other regions such as Canterbury. Analysis of the Price-to-Rent CPI ratio also pinpointed specific regions and years in which buying was expensive relative to renting.
Regression Modelling for Rent CPI Prediction: I trained Linear Regression, Lasso, Random Forest, and XGBoost models, using a recipes pipeline for preprocessing (imputation, dummy variable creation, and normalisation) and caret for 10-fold cross-validation and hyperparameter tuning. The two tree-based ensemble models performed best on the test set, with Random Forest achieving an RMSE of approximately 26.1 and XGBoost an RMSE of approximately 27.4.
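A workflow of this shape might look like the following sketch, which passes a recipes preprocessing pipeline to caret's train() with 10-fold cross-validation. The toy data, variable names, and the 80/20 split are assumptions for illustration; only two of the four models are fitted here, and the Lasso and XGBoost fits would use caret's "glmnet" and "xgbTree" methods in the same way.

```r
library(recipes)
library(caret)

set.seed(42)

# Toy data standing in for the engineered feature table (assumed names and values).
n <- 120
toy <- data.frame(
  region = factor(sample(c("Auckland", "Canterbury", "Wellington"), n, replace = TRUE)),
  log_price = rnorm(n, mean = 13, sd = 0.4),
  price_growth = rnorm(n, mean = 0.05, sd = 0.03)
)
toy$rent_cpi <- 1000 + 80 * (toy$log_price - 13) +
  300 * toy$price_growth + rnorm(n, sd = 10)

# Hold out a test set (the 80/20 split is an assumption).
idx <- createDataPartition(toy$rent_cpi, p = 0.8, list = FALSE)
train_df <- toy[idx, ]
test_df <- toy[-idx, ]

# Preprocessing: imputation, dummy variables, normalisation.
# The toy data has no missing values, so the imputation step changes nothing here.
rec <- recipe(rent_cpi ~ ., data = train_df) |>
  step_impute_median(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_normalize(all_numeric_predictors())

# 10-fold cross-validation; the recipe is re-estimated inside each fold.
ctrl <- trainControl(method = "cv", number = 10)

fit_lm <- train(rec, data = train_df, method = "lm", trControl = ctrl)
fit_rf <- train(rec, data = train_df, method = "rf", trControl = ctrl, tuneLength = 2)

# Test-set RMSE for a fitted caret model.
test_rmse <- function(model) {
  preds <- predict(model, newdata = test_df)
  sqrt(mean((test_df$rent_cpi - preds)^2))
}

print(c(lm = test_rmse(fit_lm), rf = test_rmse(fit_rf)))
```

Passing the recipe to train(), rather than preprocessing the data up front, keeps the normalisation and imputation statistics from leaking across cross-validation folds. The RMSE values printed for this toy data are not comparable to the figures reported above.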
Time Series Forecasting of Rent CPI (e.g., for Auckland): I explored two initial approaches for forecasting Rent CPI to 2030:
A direct univariate ARIMA model was applied to the regional Rent CPI series, with auto.arima selecting appropriate orders and handling seasonality.
A recursive Random Forest model was implemented, in which an ARIMA model (fitted to log-transformed prices for trend stability) first forecast the region's future median property prices, and those forecasts then served as a dynamic input to the Rent CPI prediction model. This highlighted how strongly a machine learning-based forecast depends on the accuracy of its forecast exogenous predictors.
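The two approaches can be sketched side by side as follows. The series is synthetic and annual, the column names are hypothetical, and the use of a one-year lag of Rent CPI as the recursive input is an assumption about how such a model could be set up, not a description of the project's exact implementation.

```r
library(forecast)
library(randomForest)

set.seed(1)

# Synthetic annual history for one region (assumed values and names).
years <- 2000:2023
price <- 300000 * exp(0.05 * (years - 2000) +
                        cumsum(rnorm(length(years), sd = 0.03)))
rent <- 700 + 0.0004 * price + rnorm(length(years), sd = 8)
hist_df <- data.frame(year = years, median_price = price, rent_cpi = rent)

h <- 2030 - max(years)  # forecast horizon in years

# Approach 1: direct univariate ARIMA on the Rent CPI series.
rent_ts <- ts(hist_df$rent_cpi, start = min(years), frequency = 1)
fit_direct <- auto.arima(rent_ts)
fc_direct <- forecast(fit_direct, h = h)

# Approach 2, stage 1: ARIMA on log prices, back-transformed with exp().
log_price_ts <- ts(log(hist_df$median_price), start = min(years), frequency = 1)
fit_price <- auto.arima(log_price_ts)
fc_price <- exp(as.numeric(forecast(fit_price, h = h)$mean))

# Approach 2, stage 2: Random Forest on price and last year's Rent CPI.
hist_df$rent_lag1 <- c(NA, head(hist_df$rent_cpi, -1))
train_df <- hist_df[-1, ]  # drop the first row, which has no lag
rf_fit <- randomForest(rent_cpi ~ median_price + rent_lag1,
                       data = train_df, ntree = 300)

# Recursive forecasting: each prediction becomes the next step's lag.
rent_rf <- numeric(h)
last_rent <- tail(hist_df$rent_cpi, 1)
for (i in seq_len(h)) {
  step_df <- data.frame(median_price = fc_price[i], rent_lag1 = last_rent)
  rent_rf[i] <- as.numeric(predict(rf_fit, newdata = step_df))
  last_rent <- rent_rf[i]
}

result <- data.frame(
  year = (max(years) + 1):2030,
  price_forecast = fc_price,
  rent_cpi_arima = as.numeric(fc_direct$mean),
  rent_cpi_rf = rent_rf
)
print(result)
```

With annual data (frequency = 1) auto.arima fits no seasonal terms; a quarterly series would be built with frequency = 4 so that seasonal orders can be considered. Because a Random Forest cannot predict outside the range of Rent CPI values it saw in training, its forecasts tend to level off when the price forecasts move beyond the historical range, which is one way errors or trends in the exogenous input shape the final forecast.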
This project used R with the following packages: tidyverse (dplyr for data wrangling and ggplot2 for visualisation), lubridate, janitor, zoo, tidymodels (specifically recipes, rsample, and yardstick), caret (for model training and tuning), glmnet, randomForest, xgboost, forecast (for ARIMA models), and plotting aids such as corrplot and ggrepel. The findings were compiled into an HTML report using R Markdown, demonstrating an end-to-end data science workflow for extracting actionable insights from public New Zealand housing market data.