This work comes from a recent research project I volunteered to help with, investigating the sentiment of Twitter accounts. As part of the initial research, an extensive multilingual sentiment analysis of a set of tweets was needed to extract useful insights.
A high-level flowchart of the sentiment analysis pipeline
- FunctionsMLSA.py: Contains a set of general-purpose functions, mostly used within Multi_Sentiment_Analysis.py.
- Multi_Sentiment_Analysis.py: A class for creating the Multi-Sentiment Analysis object and calling all the relevant functions (plots, text cleaning, etc.).
- main.py: The file that contains the main function.
- Output: A folder containing all the outputs from executing the project.
- requirements.txt: File that lists all the packages used for this project.
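To illustrate the kind of general-purpose helper that belongs in FunctionsMLSA.py, here is a minimal, hypothetical text-cleaning function (the function name and the exact cleaning rules are assumptions, not the project's actual code):

```python
import re

def clean_tweet(text: str) -> str:
    """Hypothetical cleaning helper: strips URLs, mentions, and
    hashtags, collapses whitespace, and lowercases the text."""
    text = re.sub(r"http\S+", "", text)       # remove URLs
    text = re.sub(r"[@#]\w+", "", text)       # remove mentions and hashtags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.lower()

print(clean_tweet("Loving the new release! https://t.co/xyz @devteam #python"))
# → loving the new release!
```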
The tweet extraction was done using vicinitas.io. Ten accounts were selected, and their tweets were saved in separate spreadsheets (CSV). The resulting dataset contains 36k tweets that are not classified (labeled).
At this point, the data cannot be published and thus cannot be uploaded to GitHub.
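Since the data itself is private, here is a sketch of how the per-account CSVs could be combined into a single dataset; the `Data` folder matches the setup instructions below, while the function name is an assumption for illustration:

```python
from pathlib import Path
import pandas as pd

def load_tweets(data_dir: str = "Data") -> pd.DataFrame:
    """Load every per-account CSV in data_dir into one DataFrame.
    The column layout of the vicinitas.io export is not assumed here;
    the frames are simply concatenated row-wise."""
    files = sorted(Path(data_dir).glob("*.csv"))
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```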
The project was developed using Python 3.6.13. The requirements.txt file lists all the appropriate packages and their versions for this project. Furthermore, two NLTK lexicons must also be downloaded:
- nltk.download('vader_lexicon')
- nltk.download('wordnet')
Conda installation:
- MultiSA.yml: A YAML file that contains the environment for running the vader-multi package
Installation with pip:
pip install -r requirements.txt
- Have Python >= 3.6 installed on your machine
- Clone or download this repository
- Create a folder called Data and add your spreadsheets that contain your tweets
- In a shell, execute the main.py script with Python 3
Useful Resources: