GitHub - gpsyrou/Twitter_Sentiment_Analysis: Exploration of the Twitter API and sentiment & topic analysis on tweets relevant to COVID-19

Twitter Topic Modelling and Sentiment Analysis

Topic: Analysis of Coronavirus related Tweets using TwitterAPI

Log:

1) 15/02/2020: The TwitterAPI has a limit of 5000 tweets for the FullArchive version, and 25000 for the 30day version.
	   Need to find a way to retrieve data for each day over a one-month period, as the API does not seem to provide this functionality directly.
2) 01/03/2020: Version one completed. It includes analysis of tweets from 17/01/2020 to 29/02/2020.
	   The analysis focuses on the words that appear frequently in the tweets, as well as on bigrams (pairs of words that appear next to each other).
	   Finally, we include some analysis of the sentiment of the tweets using the Hu and Liu opinion lexicon.
3) 07/03/2020: Two tasks: (a) handle non-English tweets (translation) by using a Google translation API; (b) use the location column to identify the longitude and latitude.
4) 14/10/2020: Add data for more months beyond the initial tweets from January to March. Create a class for the sentiment analysis. Update the main Jupyter notebook.
5) 12/11/2020: Version two completed. The Jupyter notebook contains data and findings for all months, analyzes the months of April, August and October 2020 in more detail, and compares the change in sentiment. This version also includes functionality to plot a geolocation map of the tweets.
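
The bigram counting and lexicon-based sentiment scoring described in the log can be illustrated with a minimal sketch. This is not the code from the repository: the function names, the regex tokenizer, and the tiny POSITIVE/NEGATIVE word sets are assumptions made for illustration, and a real opinion lexicon contains thousands of entries.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real opinion
# lexicon has thousands of positive and negative words.
POSITIVE = {"good", "safe", "recover", "hope"}
NEGATIVE = {"bad", "death", "fear", "outbreak"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(tweets):
    """Count pairs of adjacent words across all tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        # zip pairs each token with its right-hand neighbour
        counts.update(zip(tokens, tokens[1:]))
    return counts


def sentiment_score(text):
    """Number of positive lexicon words minus number of negative ones."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


tweets = [
    "Fear of the outbreak is growing",
    "Good news: patients recover and there is hope",
    "The outbreak is growing",
]

for (first, second), count in bigram_counts(tweets).most_common(3):
    print(first, second, count)
for tweet in tweets:
    print(sentiment_score(tweet), tweet)
```

A score above zero suggests a positive tweet and a score below zero a negative one. Counting lexicon hits ignores negation and sarcasm, so it is only a rough first pass.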

Running Guide for Data Retrieval and Preprocessing

1) Run data_retrieval.py to get tweets for a specific period. The script takes as parameters the start and end dates of the period to retrieve. It is not recommended to retrieve data for more than a 2-day period in a single API call, as the Twitter API has limits.
	   python data_retrieval.py 2020-05-15 2020-05-17
2) Combine the retrieved jsonl files by using the merge_json_files.py script. This outputs a text file that contains the combined data.
	   python merge_json_files.py
3) The data in the output text file from step 2 requires some preprocessing before analysis. This step uses the data_preprocessing.py script, which picks the fields of interest from the text file, removes blank tweets, strips hyperlinks from the tweets, applies translation to the text, and more.
	   python data_preprocessing.py
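
The three steps above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the contents of data_retrieval.py, merge_json_files.py or data_preprocessing.py. The helper names (date_windows, merge_jsonl, clean_tweets) are hypothetical. The end date is assumed to be exclusive, and the translation step is left out.

```python
import json
import re
from datetime import date, timedelta


def date_windows(start, end, max_days=2):
    """Split [start, end) into consecutive windows of at most max_days days.

    Assumption: the end date is exclusive. Each window could be passed as
    the start/end arguments of one retrieval call.
    """
    windows = []
    current = start
    while current < end:
        upper = min(current + timedelta(days=max_days), end)
        windows.append((current.isoformat(), upper.isoformat()))
        current = upper
    return windows


def merge_jsonl(paths, out_path):
    """Concatenate JSON Lines files into one file; return records written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    try:
                        json.loads(line)  # keep only valid JSON records
                    except json.JSONDecodeError:
                        continue
                    out.write(line + "\n")
                    written += 1
    return written


URL_RE = re.compile(r"https?://\S+")


def clean_tweets(texts):
    """Strip hyperlinks, collapse whitespace, and drop blank tweets."""
    cleaned = []
    for text in texts:
        text = URL_RE.sub("", text)
        text = " ".join(text.split())
        if text:
            cleaned.append(text)
    return cleaned


for start, end in date_windows(date(2020, 5, 15), date(2020, 5, 20)):
    print("python data_retrieval.py", start, end)
print(clean_tweets(["Stay safe https://t.co/abc", "   ", "https://t.co/x"]))
```

Splitting a longer period into short windows keeps each API call within the recommended 2-day span. The printed commands follow the invocation format shown in step 1.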

Folders and files:

.ipynb_checkpoints
.vscode
__pycache__
img
jupyter_notebooks
output_files
utilities
.gitattributes
.gitignore
README.md
data_preprocessing.py
geolocation_data_retrieval.py
sentiment_class.py
tweets_translated.csv
tweets_with_sentiment.csv
twitter_config.json
