This project begins with the https://drive.google.com/drive/folders/12sLEvXt5jqJRrbVhF0EsoVpVhEFTGJNN?usp=sharing, where the dataset is provided as documents organised into folders.
We received this dataset from AlmaBetter for our unsupervised learning capstone project.
The dataset contains five folders of text documents, and each document is a BBC news article.
First, we created a CSV file from the folders as per the instructions. The CSV has 2225 rows and 2 columns.
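A minimal sketch of this aggregation step, assuming the five category folders sit under a local directory (the path and output file name below are illustrative, not necessarily the ones used in the project):

```python
import os
import pandas as pd

DATA_DIR = "bbc"  # hypothetical local path to the downloaded folders
records = []

# Assumed layout: one sub-folder per category, each holding plain-text articles.
for category in sorted(os.listdir(DATA_DIR)):
    category_path = os.path.join(DATA_DIR, category)
    if not os.path.isdir(category_path):
        continue
    for filename in sorted(os.listdir(category_path)):
        file_path = os.path.join(category_path, filename)
        with open(file_path, encoding="utf-8", errors="ignore") as f:
            records.append({"category": category, "text": f.read()})

df = pd.DataFrame(records)          # 2225 rows, 2 columns
df.to_csv("bbc_news.csv", index=False)
```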
This is an NLP project in which we identify the type of each news article based on the topics discovered.
In this topic modelling project, we performed the standard text-processing steps: converting all text to lowercase, expanding contractions, removing punctuation, special characters and digits, removing stopwords and extra whitespace, tokenization, and normalization with a lemmatizer and the Porter stemmer.
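A condensed sketch of this preprocessing pipeline using NLTK, applied to the `df` built above (the contraction map here is a small illustrative subset; the project may use a dedicated library for contraction expansion):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Illustrative subset of contraction expansions.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "n't": " not", "'re": " are"}

def preprocess(text):
    text = text.lower()                                   # lowercase
    for pattern, repl in CONTRACTIONS.items():            # expand contractions
        text = text.replace(pattern, repl)
    text = re.sub(r"[^a-z\s]", " ", text)                 # drop punctuation, digits, special chars
    tokens = word_tokenize(text)                          # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stopwords
    tokens = [lemmatizer.lemmatize(t) for t in tokens]    # normalize with lemmatizer
    # tokens = [stemmer.stem(t) for t in tokens]          # optional Porter stemming
    return tokens

df["tokens"] = df["text"].apply(preprocess)
```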
To build the model, we used LDA from gensim. To visualise term importance within each topic, we used pyLDAvis.
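A minimal sketch of the LDA and pyLDAvis steps, assuming `df["tokens"]` holds the preprocessed token lists; the pruning thresholds and number of passes are assumptions, and `pyLDAvis.gensim_models` is the gensim adapter in recent pyLDAvis releases (older releases expose it as `pyLDAvis.gensim`):

```python
from gensim import corpora
from gensim.models import LdaModel
import pyLDAvis
import pyLDAvis.gensim_models

# Build the dictionary and bag-of-words corpus from the preprocessed tokens.
dictionary = corpora.Dictionary(df["tokens"])
dictionary.filter_extremes(no_below=5, no_above=0.5)   # assumed pruning thresholds
corpus = [dictionary.doc2bow(tokens) for tokens in df["tokens"]]

# Train LDA with 5 topics, matching the five BBC segments.
lda_model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5,
                     passes=10, random_state=42)

# Interactive topic/term-importance visualisation with pyLDAvis.
vis = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")
```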
For XGBoost, we plotted a tree diagram to visualise the features under consideration. We also trained a neural network so that deep learning can be applied to topic modelling in the future.
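A hedged sketch of how such a tree plot could be produced, assuming the classifier is trained on TF-IDF features with the folder names as labels (the feature count and tree depth are illustrative; `xgb.plot_tree` additionally requires graphviz to be installed):

```python
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# TF-IDF features from the raw article text; labels from the folder names.
vectorizer = TfidfVectorizer(max_features=1000, stop_words="english")
X = vectorizer.fit_transform(df["text"])
y = LabelEncoder().fit_transform(df["category"])

clf = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric="mlogloss")
clf.fit(X, y)

# Visualise the first boosted tree to inspect the features it splits on.
fig, ax = plt.subplots(figsize=(20, 10))
xgb.plot_tree(clf, num_trees=0, ax=ax)
plt.show()
```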
Business Context
In this project, the task is to identify major themes/topics across a collection of BBC news articles using clustering algorithms such as Latent Dirichlet Allocation (LDA).
Dataset Description
The dataset contains a set of news articles for each major segment: business, entertainment, politics, sports, and technology. You need to create an aggregate dataset of all the news articles and perform topic modeling on it, then verify whether the discovered topics correspond to the different tags available.
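One way to check this correspondence, assuming the `lda_model` and `corpus` objects from the LDA sketch above, is to assign each article its dominant topic and cross-tabulate against the original tag:

```python
import pandas as pd

# Dominant topic = the topic with the highest probability for a document.
def dominant_topic(bow):
    topic_probs = lda_model.get_document_topics(bow)
    return max(topic_probs, key=lambda pair: pair[1])[0]

df["dominant_topic"] = [dominant_topic(bow) for bow in corpus]

# Rows = original BBC tags, columns = LDA topics; a near-diagonal pattern
# suggests the topics line up with the five segments.
print(pd.crosstab(df["category"], df["dominant_topic"]))
```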
Main Libraries used:
• NumPy for computationally efficient operations
• Pandas for data manipulation, aggregation, and other DataFrame operations
• Matplotlib and Seaborn for plotting graphs and charts
• Gensim for applying LDA
• NLTK for text preprocessing