

Spam Email Classification


This project implements a spam email classifier on the Enron dataset that distinguishes spam from ham (legitimate) emails. After the email data was loaded, preprocessing included lemmatization, lowercasing and tokenization, and removal of common stop words and infrequent terms. Features were per-email word counts computed from the processed content, and each sample was labeled spam (1) or ham (0). A Naive Bayes classifier trained on 70% of the data reached approximately 96% accuracy on the training set and 94% on the held-out 30% test set.
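The pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: it uses a multinomial Naive Bayes variant with add-one smoothing (the project does not say which variant was used), it skips lemmatization (which would need an external NLP library), and the stop-word list, the toy emails, and all function names (`tokenize`, `split`, `train`, `predict`, `accuracy`) are hypothetical stand-ins rather than the Enron data or the original implementation.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "to", "and", "of", "is", "for", "on", "in", "you"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def split(docs, labels, train_frac=0.7, seed=0):
    """Shuffle the samples and split them into train and test parts."""
    pairs = list(zip(docs, labels))
    random.Random(seed).shuffle(pairs)
    cut = round(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]


def train(docs, labels, min_count=1):
    """Fit multinomial Naive Bayes on word counts (labels: 1 = spam, 0 = ham).

    Words seen fewer than min_count times in the training set are dropped,
    mirroring the removal of infrequent terms. Assumes both classes occur.
    """
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(t for toks in tokenized for t in toks)
    vocab = {w for w, c in totals.items() if c >= min_count}
    word_counts = {0: Counter(), 1: Counter()}
    for toks, y in zip(tokenized, labels):
        word_counts[y].update(t for t in toks if t in vocab)
    return {"vocab": vocab, "word_counts": word_counts,
            "class_counts": Counter(labels)}


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    vocab = model["vocab"]
    n_docs = sum(model["class_counts"].values())
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {}
    for y in (0, 1):
        counts = model["word_counts"][y]
        total = sum(counts.values())
        score = math.log(model["class_counts"][y] / n_docs)  # class prior
        for t in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            score += math.log((counts[t] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Toy stand-in data, invented for illustration only.
train_docs = [
    "Win a free prize now, click here",
    "Cheap meds, free offer, click now",
    "Meeting schedule for the project review",
    "Please review the attached project report",
]
train_labels = [1, 1, 0, 0]

model = train(train_docs, train_labels)
print(predict(model, "free prize offer"))
print(predict(model, "project review meeting"))
```

On a real corpus, `split` would be called first to produce the 70/30 partition, `train` would be fit on the training part with a larger `min_count`, and `accuracy` would be reported separately for the training and test parts.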
