Spam Email Classification
Project type
Spam Email Classification
This project implements a spam email classifier on the Enron dataset, distinguishing spam from ham (legitimate) emails. The pipeline loads the email data and then preprocesses it: the text is lowercased and tokenized, tokens are lemmatized, and common stop words and infrequent terms are removed. Features are the word counts of the processed email content, and each sample is labeled as spam (1) or ham (0). A Naive Bayes classifier was trained on 70% of the data and evaluated on the remaining 30%, reaching approximately 96% accuracy on the training set and 94% on the test set.
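The steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the project's actual code: the description does not name the Naive Bayes variant or any library, so the example assumes a multinomial model with add-one (Laplace) smoothing. The stop-word list, the `min_count` threshold, the toy emails, and all function names are hypothetical, and lemmatization is omitted because it needs an external library.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "to", "and", "of", "is", "for", "at", "your", "you"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_naive_bayes(samples, min_count=1):
    """Count words per class. samples: list of (text, label), 1 = spam, 0 = ham."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    # Drop infrequent terms: keep words seen at least min_count times overall.
    totals = word_counts[0] + word_counts[1]
    vocab = {w for w, c in totals.items() if c >= min_count}
    return {"vocab": vocab, "word_counts": word_counts, "class_counts": class_counts}

def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    vocab = model["vocab"]
    n_docs = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        if model["class_counts"][label] == 0:
            continue  # class never seen in training
        counts = model["word_counts"][label]
        total_words = sum(counts[w] for w in vocab)
        score = math.log(model["class_counts"][label] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words outside the vocabulary are ignored
                score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, text) == label for text, label in samples) / len(samples)

# Made-up toy emails, only to show the call sequence (not Enron data).
TOY_EMAILS = [
    ("Win free money now", 1),
    ("Free prize claim your money", 1),
    ("Cheap pills free offer", 1),
    ("Meeting schedule for tomorrow", 0),
    ("Please review the project report", 0),
    ("Lunch at noon tomorrow", 0),
]

if __name__ == "__main__":
    train, test = train_test_split(TOY_EMAILS, train_fraction=0.7)
    model = train_naive_bayes(train)
    print("train accuracy:", accuracy(model, train))
    print("test accuracy:", accuracy(model, test))
```

On a corpus this small the printed accuracies are not meaningful; the point is the order of operations: tokenize, count, split 70/30, train, and score both splits.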