- Compiled.ipynb shows the whole flow of our analysis (without outputs). The process is as follows:
- EDA and Preprocessing explores the dataset to understand it before analysis and creates the dataframes for the politics dataset.
- TopicModelling generates the topic distributions for both the politics and entertainment domains.
- Emotion_Sentiment_Pos generates the emotion distribution, sentiment distribution and POS grouping for the politics domain.
- Named Entity Recognition generates the entity type counts for Locations, Persons and Organisations.
- CrossDomain_Emotion_Sentiment_POS generates the emotion distribution, sentiment distribution, POS grouping and entity type counts for the entertainment domain, which are used to test the cross-domain model.
- Classification shows the model training, testing and validation process, including the types of models used and the data used for both the domain-specific and cross-domain classifiers.
- process_text processes the entertainment dataset.