GitHub - RichardAbraham/Text_Analytics_NLP: “With great power comes great responsibility.”- Stan Lee. Here, I analyze social media's response, specifically Twitter, to the life around and after the death of Marvel's creative leader, Stan Lee.

TEXT ANALYTICS USING NLP (Natural Language Processing)

Web scraping - Twitter data

SUMMARY:

Stan Lee, one of America's most prolific comic book writers, died in Los Angeles at the age of ninety-five on November 12, 2018. Here, I aim to analyze social media's response, particularly Twitter, to his death.

PROJECT GOALS:

Explore tweets to reveal interesting insights about user activity after his death.
Build a machine learning model capable of accurately classifying the sentiment of a tweet as positive, neutral, or negative.

PACKAGES USED:

scikit-learn, NumPy, pandas, NLTK, TextBlob, Matplotlib, and Tweepy, among others.

MOTIVATION:

Stan Lee was an American comic book writer, editor, publisher, and producer. He rose through the ranks of a family-run business to become Marvel Comics' primary creative leader for two decades, leading its expansion from a small division of a publishing house to a multimedia corporation that dominated the comics industry. Lee was inducted into the comic book industry's Will Eisner Award Hall of Fame in 1994 and the Jack Kirby Hall of Fame in 1995. He received the NEA's National Medal of Arts in 2008.

As a fan of Marvel comics myself, I wanted to explore his life and work in greater detail using machine learning!

DATA COLLECTION:

Data for the analysis was collected through Twitter's public APIs. (How to extract tweets using Twitter's public APIs)

I used the following keywords to filter the extraction - Stan Lee, StanLee, Stanley Martin Lieber
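As a rough illustration of the keyword filter, the sketch below checks tweet text against the three keywords listed above. It is only a local, case-insensitive post-filter; the helper name `matches_keywords` and the sample tweets are hypothetical, and the notebook itself may pass the keywords to the Twitter API differently.

```python
import re

# Keywords taken from the README; matching is case-insensitive.
KEYWORDS = ["Stan Lee", "StanLee", "Stanley Martin Lieber"]

# One compiled pattern that matches any of the keywords literally.
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def matches_keywords(text: str) -> bool:
    """Return True if the tweet text mentions any tracked keyword (hypothetical helper)."""
    return _PATTERN.search(text) is not None


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "RIP Stan Lee, thank you for everything",
    "Excelsior! #StanLee",
    "Just had lunch",
]
relevant = [t for t in sample_tweets if matches_keywords(t)]
```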

PS: Adil Moujahid does a great job introducing Text Mining using Twitter's streaming API and Python

Refer to "Historical Tweets Extraction - Web Scrapping.ipynb" for steps to extract historical tweets as needed.

DESCRIPTIVE ANALYTICS (EDA)

Tools used include Python, Tableau, and Microsoft Power BI

Top 5 Languages used to tweet

English comes in at #1 followed by Spanish
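A ranking like this can be produced by counting the language code attached to each tweet. The sketch below assumes the codes have already been pulled out into a plain list; the function name `top_languages` and the sample codes are hypothetical and do not reflect the project's actual data.

```python
from collections import Counter


def top_languages(lang_codes, n=5):
    """Return the n most frequent language codes with their counts."""
    return Counter(lang_codes).most_common(n)


# Made-up language codes, for illustration only.
sample_langs = ["en", "en", "es", "en", "pt", "es", "fr", "ja", "en", "it"]
top5 = top_languages(sample_langs)
```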

Time Series analysis displaying number of likes vs date of creation (at the time of his death):

We see a surge in activity after his death
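One way to prepare data for such a chart is to sum likes per calendar day. This is a minimal sketch using only the standard library; the timestamp format, the function name `likes_per_day`, and the sample values are assumptions rather than the notebook's actual code or data.

```python
from collections import defaultdict
from datetime import datetime


def likes_per_day(tweets):
    """Sum like counts per creation date.

    `tweets` is an iterable of (created_at, likes) pairs, where created_at
    is a string in the assumed format "YYYY-MM-DD HH:MM:SS".
    """
    totals = defaultdict(int)
    for created_at, likes in tweets:
        day = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
        totals[day] += likes
    # Sort by date so the result is ready to plot as a time series.
    return dict(sorted(totals.items()))


# Made-up sample records, for illustration only.
sample = [
    ("2018-11-12 20:05:00", 40),
    ("2018-11-11 09:15:00", 3),
    ("2018-11-12 21:30:00", 25),
    ("2018-11-13 08:00:00", 12),
]
daily = likes_per_day(sample)
```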

Percent(%) distribution of content sources

The majority of the tweets were posted from a mobile device

Basemap displaying the location of tweets

SENTIMENT ANALYSIS

Wordcloud

Important words include:

angeles
awesome
respect
memorial

Percent(%) distribution of sentiments

The majority of the tweets expressed a positive sentiment
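A distribution like this can be derived from per-tweet polarity scores. The sketch below assumes each tweet already has a polarity in [-1, 1] (the range TextBlob reports) and labels it by sign; the zero threshold, the function names, and the sample scores are assumptions, not necessarily the rule used in the notebook.

```python
from collections import Counter


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label (assumed rule: sign of the score)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def percent_distribution(labels):
    """Return the percentage share of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * count / total for label, count in counts.items()}


# Made-up polarity scores, for illustration only.
sample_polarities = [0.8, 0.5, 0.0, -0.3, 0.2]
labels = [label_sentiment(p) for p in sample_polarities]
shares = percent_distribution(labels)
```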

For more findings, please go to the "Images" folder.

FILE CONTENTS:

Text Analytics using NLP - Web Scrapping.ipynb: Contains the coded steps undertaken to:

Extract the relevant tweets
Pre-process and structure the data for analysis
Carry out some descriptive analytics
Perform sentiment analysis and build a model for sentiment classification

Logistic Regression performed best, with an accuracy of 98% and an average F1 score of 0.97.
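For readers unfamiliar with these metrics, the sketch below shows how accuracy and an averaged F1 score can be computed for a three-class problem. The README does not say which averaging was used, so the unweighted (macro) average here is an assumption; the function names and sample labels are hypothetical and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels, for illustration only.
y_true = ["positive", "positive", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```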

Please feel free to suggest any improvements or to use any of the steps shown above, and have fun coding!!
