Topic_modelling

This repo contains code for topic modelling in Python using LDA (Latent Dirichlet Allocation).

Aim:

Given a CSV file containing the following fields:

ID,
Job Role/Title,
Job Description, and
Category;

we have to extract the skills (or important keywords) for each job. In practice, a new column is added to the dataframe, holding the important skills for the job described in each row.

As a first attempt, the model is trained on the Kaggle dataset Tweets.csv, whose 'text' column stands in for the 'Job Description' column of the original problem, from which the topics (skills) will be extracted.
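The data flow described above (read rows, extract keywords from one text column, store them in a new column) can be sketched with the standard library alone. This is only an illustration: the sample rows, the `add_skills_column` helper, the `skills` column name, and the `long_words` extractor are all hypothetical stand-ins, and the real extractor would be the LDA pipeline described in the steps below.

```python
import csv
import io


def add_skills_column(rows, extract, text_field="text", new_field="skills"):
    """Return copies of the rows with a new column holding the extracted keywords."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        new_row[new_field] = ", ".join(extract(row[text_field]))
        out.append(new_row)
    return out


# Hypothetical two-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "ID,text\n"
    "1,python developer with sql experience\n"
    "2,nurse for hospital night shift\n"
)
rows = list(csv.DictReader(sample_csv))


# Stand-in extractor (assumption): keeps words longer than five characters.
def long_words(text):
    return [w for w in text.split() if len(w) > 5]


result = add_skills_column(rows, long_words)
print(result[0]["skills"])  # python, developer, experience
```

Passing the extractor in as a function keeps the column-handling code independent of how the keywords are actually produced.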

Steps:

For each job description:

1) A corpus (list of documents) is created and cleaned using nltk (removing stopwords, punctuation, and slang).

2) A dictionary is prepared from the corpus, followed by a document_term_matrix.

Document_term_matrix: a Python list whose length equals the number of documents; each entry holds the number of occurrences of each dictionary word in that document.

3) An LDA model is then built; it takes as input the corpus and the number of topics into which it should categorize the corpus.

4) Each word of the corpus is assigned to one of the topics, and each topic has a probability distribution over the words categorized under it.

5) The output for a topic is taken as the most probable word under that topic, although the number of output words per topic can be increased, in which case the words are listed in decreasing order of probability.
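Steps 1 and 2 can be sketched in plain Python to show what the intermediate data looks like. This is not the notebook's code: the `STOPWORDS` set is a tiny stand-in for the nltk stopword list, slang removal is left out, and the helper names (`clean`, `build_dictionary`, `doc_to_bow`) and the two sample documents are assumptions made for illustration. Libraries such as gensim produce the same sparse (word id, count) representation through `corpora.Dictionary` and its `doc2bow` method.

```python
import string
from collections import Counter

# Tiny stand-in stopword list (assumption); the nltk English list is much larger.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "for", "to", "with"}


def clean(document):
    """Lowercase, strip punctuation, and drop stopwords; returns a list of tokens."""
    no_punct = document.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def build_dictionary(token_lists):
    """Map each distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for tokens in token_lists:
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
    return token2id


def doc_to_bow(tokens, token2id):
    """Represent one document as sorted (token_id, count) pairs."""
    counts = Counter(token2id[tok] for tok in tokens)
    return sorted(counts.items())


# Hypothetical two-document corpus.
corpus = ["Python and SQL for data analysis.", "Analysis of data in Python, Python!"]
cleaned = [clean(doc) for doc in corpus]
token2id = build_dictionary(cleaned)
doc_term_matrix = [doc_to_bow(tokens, token2id) for tokens in cleaned]

print(token2id)            # {'python': 0, 'sql': 1, 'data': 2, 'analysis': 3}
print(doc_term_matrix[1])  # [(0, 2), (2, 1), (3, 1)]
```

The resulting `doc_term_matrix` has one entry per document, matching the definition given in step 2, and stores only the words that actually occur in each document.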

Number of topics for a job description:

It is taken as the length of the corpus, so that many words (skills) are extracted for a particular job description.
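To make steps 3 to 5 and the topic-count choice concrete, here is a toy LDA fitted by collapsed Gibbs sampling in pure Python. It is a teaching sketch under stated assumptions, not the notebook's implementation, which would normally rely on a library model. The function names `toy_lda` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, and the sample word ids are all illustrative choices.

```python
import random


def toy_lda(docs, num_topics, num_words, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids.

    Returns one word-probability list per topic.
    """
    rng = random.Random(seed)
    topic_word = [[0] * num_words for _ in range(num_topics)]  # word w counts in topic k
    doc_topic = [[0] * num_topics for _ in docs]               # topic k counts in doc d
    topic_total = [0] * num_topics
    assign = []
    # Give every word occurrence a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            topic_word[k][w] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                k = assign[d][i]
                topic_word[k][w] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * num_words)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                topic_word[k][w] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1
    # Smoothed per-topic word distributions.
    return [
        [(topic_word[k][w] + beta) / (topic_total[k] + beta * num_words)
         for w in range(num_words)]
        for k in range(num_topics)
    ]


def top_words(topic_dist, id2token, n=1):
    """Return the n most probable words of a topic, most probable first."""
    ranked = sorted(range(len(topic_dist)), key=lambda w: topic_dist[w], reverse=True)
    return [id2token[w] for w in ranked[:n]]


# Hypothetical vocabulary and corpus (documents as lists of word ids).
id2token = ["python", "sql", "data", "nursing", "patient"]
docs = [[0, 1, 2, 0], [3, 4, 3], [0, 2, 1]]
num_topics = len(docs)  # topic count taken as the length of the corpus
topics = toy_lda(docs, num_topics, num_words=len(id2token))
keywords = [top_words(dist, id2token, n=1)[0] for dist in topics]
print(keywords)
```

Raising `n` in `top_words` returns more words per topic in decreasing order of probability. With the topic count tied to the corpus length, the number of extracted words grows with the number of documents, and several topics may end up sharing the same top word.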

