Anti-Phishing

This is coursework for a big data analytics class. It uses a random forest classifier to identify whether a website is legitimate or a phishing site.

The aim was to learn as much as possible about supervised machine learning and, in the end, to produce a Jupyter notebook on a topic of our choice (phishing detection in my case).

Overview

The coursework lives in a Jupyter notebook (coursework_phishing_website_detection.ipynb), which is fully reproducible and explains my reasoning step by step.

There are several stages in this coursework:

  1. Research & Data Exploration
    1. Dataset presentation
    2. Related Work & Data Exploration
    3. Data Pre-processing
  2. Modelling / Classification
  3. Solution Improvement

Keywords:

  • Random Forest Classification
  • Gradient Boosted Trees
  • Cross-validation
  • Randomized Search
  • Grid Search
  • Fully Homomorphic Encryption Machine Learning
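Several of these keywords fit together in a single workflow. The sketch below is a minimal illustration, not the notebook's actual code: it assumes scikit-learn is installed, uses synthetic data from `make_classification` as a stand-in for the phishing dataset, and the hyperparameter ranges are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    RandomizedSearchCV,
    cross_val_score,
    train_test_split,
)

# Synthetic stand-in for the phishing dataset (assumption: the real data
# is loaded and pre-processed in the notebook).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline random forest, scored with 5-fold cross-validation.
baseline = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(baseline, X_train, y_train, cv=5, scoring="accuracy")

# Randomized search samples a few combinations from these illustrative
# ranges instead of trying every one, as grid search would.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# Accuracy of the best model found, measured on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print("CV accuracy (baseline):", cv_scores.mean())
print("Best parameters:", search.best_params_)
print("Test accuracy (tuned):", test_accuracy)
```

Swapping `RandomizedSearchCV` for `GridSearchCV` (with `param_grid` in place of `param_distributions` and without `n_iter`) would try every combination exhaustively, which is slower but leaves nothing to chance.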

As a bonus, I created a Streamlit application to simulate a real-world deployment of an anti-phishing solution based on machine learning.

Note: To run the Streamlit app, which lets you check whether a website is phishing or legitimate based on its URL, use the following command: streamlit run phishing_website_detection_app.py
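To give a sense of what classifying a site "based on URL" can involve, here is a small sketch of turning a URL into numeric and boolean features using only the Python standard library. The function name `extract_url_features` and the specific features are hypothetical, chosen for illustration; they are not necessarily the ones used by the notebook or the app.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few simple, illustrative features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Very long URLs are sometimes used to hide the real destination.
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        # An "@" anywhere in the URL can be used to mislead the reader.
        "has_at_symbol": "@" in url,
        # A raw IPv4 address in place of a domain name.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Dots beyond the one separating the domain from its TLD.
        "subdomain_count": max(host.count(".") - 1, 0),
        "hyphen_in_host": "-" in host,
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@secure"):
        print(example, extract_url_features(example))
```

A feature dictionary like this could be converted to a row vector and passed to a trained classifier's `predict` method; how the actual app does this is not described here.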

Demo video: anti_phishing_streamlit.mp4