This repository contains project work in the course "Signals and data" taught at the Technical University of Denmark. It serves as experiments for a comparison between two text classification methods.
The project is divided into two notebooks. Common functions are found in common.py
.
The experiments for each methods can be found in in the notebook in their respective folders.
.
├── baseline
│ ├── baseline.ipynb
│ └── glove.6B.50d.txt
├── common.py
├── emails.csv
├── fasttext
│ ├── CV
│ ├── fasttext.ipynb
│ ├── news_fasttext_classifier.p
│ ├── news_train_emb.p
│ ├── spam_fasttext_classifier.p
│ ├── spam_train_emb.p
│ └── text_classifier.py
├── __init__.py
├── news_data.npz
├── news_data.zip
├── readme.md
├── similar_news.txt
├── spam_data.npz
└── spam_data.zip
'World' = 'blue' 0, 'sports' = 'red' 1, 'Business'= 'green' 2, 'Sci/Tec'='cyan' 3
formatted as real, could be, text(first paragraph). gathered from frontpage of https://www.bbc.com/news on 08-05-2020
- 0,3/2 https://www.bbc.com/news/business-52392366
- 0 https://www.bbc.com/news/world-us-canada-52584774
- 0 https://www.bbc.com/news/world-europe-52585162
- 0, 3 https://www.bbc.com/news/uk-england-suffolk-52566082
- 1, 3 https://www.bbc.com/sport/av/athletics/51332721
- 1 https://www.bbc.com/sport/formula1/52568642
- 1 https://www.bbc.com/sport/boxing/52573766
- 2 https://www.bbc.com/news/business-52570600
- 2 https://www.bbc.com/news/business-52580950
- 2, 3 https://www.bbc.com/news/business-52570714
- 3 https://www.bbc.com/news/science-environment-52550973
- 3 https://www.bbc.com/news/technology-52572381
- 3 https://www.bbc.com/news/science-environment-52560812