Detect scandals for any entity, such as public firms or celebrities. The algorithm only requires sufficient search interest for a keyword and its associated scandals.
Detect scandals for publicly known firms within the last 5 years.
- Show the overall number of scandals
- Display scandals in a timeline
- Provide links to Google for further investigation
The project builds on Google Trends data. It covers search interest for a keyword over the last five years.
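Fetching this data could look roughly like the sketch below. The function names `build_keyword` and `fetch_search_interest` are hypothetical and not taken from the project; the `hl` and `tz` settings are assumptions as well. The pytrends import is done lazily so the keyword helper can be used without the package installed.

```python
def build_keyword(entity: str) -> str:
    """Append "scandal" to the user-supplied entity name."""
    return f"{entity.strip()} scandal"


def fetch_search_interest(entity: str):
    """Return a DataFrame with the search interest of the last five years."""
    # Lazy import: requires the third-party `pytrends` package and network access.
    from pytrends.request import TrendReq

    keyword = build_keyword(entity)
    pytrends = TrendReq(hl="en-US", tz=0)  # language and timezone are assumptions
    pytrends.build_payload([keyword], timeframe="today 5-y")
    df = pytrends.interest_over_time()
    # pytrends adds an "isPartial" flag column; drop it if it is present.
    return df.drop(columns=["isPartial"], errors="ignore")
```

Calling `fetch_search_interest("Volkswagen")` would query Google Trends for "Volkswagen scandal".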
The data pipeline includes the following steps:
- Take a keyword from the user and append "scandal" to it.
- Fetch the search interest from Google Trends with pytrends.
- Process the data with pandas.
- Train a time-series model with Facebook Prophet.
- Identify outliers relative to the predicted search interest. These are the scandals.
- Visualize the timeline with plotly.
- List links to Google for the period of each scandal.
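The modelling and linking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: all function names are hypothetical, an outlier is assumed to be any point above the upper bound of Prophet's uncertainty interval, the 95% interval width and the 7-day link window are arbitrary choices, and the `tbs=cdr:...` query parameter is the custom date range filter that Google's search URLs commonly use.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def find_outliers(observed, upper_bound):
    """Return indices where observed interest exceeds the forecast's upper bound."""
    return [i for i, (y, hi) in enumerate(zip(observed, upper_bound)) if y > hi]


def forecast_upper_bound(df, interval_width=0.95):
    """Fit Prophet on columns `ds` (date) and `y` (interest); return yhat_upper."""
    # Lazy import: requires the `prophet` package (older installs use `fbprophet`).
    from prophet import Prophet

    model = Prophet(interval_width=interval_width)
    model.fit(df)
    forecast = model.predict(df[["ds"]])  # in-sample prediction on the same dates
    return forecast["yhat_upper"].tolist()


def google_link(keyword, start, days=7):
    """Build a Google search URL restricted to the dates start .. start + days."""
    end = start + timedelta(days=days)
    tbs = f"cdr:1,cd_min:{start:%m/%d/%Y},cd_max:{end:%m/%d/%Y}"
    return "https://www.google.com/search?" + urlencode({"q": keyword, "tbs": tbs})
```

With a fetched DataFrame `df`, the indices of candidate scandals would be `find_outliers(df["y"], forecast_upper_bound(df))`, and each flagged date can be passed to `google_link` together with the keyword.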
- Check that detected scandals do not overlap within a period of `date_uppderbound_days`
- Create `config.yaml`
- Load config with hydra
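The overlap check in the first item could be sketched as below. This assumes that "overlap" means two detected dates lying within the same window of a given number of days, and that only the earliest date of each cluster should be kept; the function name `drop_overlapping` and the parameter name `window_days` are hypothetical.

```python
from datetime import date, timedelta


def drop_overlapping(scandal_dates, window_days):
    """Keep a date only if it lies more than `window_days` after the last kept one."""
    kept = []
    for d in sorted(scandal_dates):
        if not kept or (d - kept[-1]) > timedelta(days=window_days):
            kept.append(d)
    return kept
```

Because each date is compared with the last kept date rather than the last seen one, a long run of closely spaced detections collapses into entries that are at least `window_days` apart.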
Using color codings from http://towardssustainablefinance.com/.
- `color_discrete_sequence=["#4d886d", "#f3dab9", "#9bcab8", "#829fa5", "#cfaea5"]`
- dark font color: `#545454`
- bright background color: `#D5E6E0`
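One way to apply these colors to a plotly figure is sketched here. The constant names, the function `plot_timeline`, and the column names `date` and `interest` are assumptions for illustration; using the background color for both the paper and the plot area is also a guess at the intended look.

```python
COLOR_SEQUENCE = ["#4d886d", "#f3dab9", "#9bcab8", "#829fa5", "#cfaea5"]
FONT_COLOR = "#545454"        # dark font color
BACKGROUND_COLOR = "#D5E6E0"  # bright background color


def plot_timeline(df, x="date", y="interest"):
    """Draw a line chart of search interest using the color scheme above."""
    # Lazy import: requires the third-party `plotly` package.
    import plotly.express as px

    fig = px.line(df, x=x, y=y, color_discrete_sequence=COLOR_SEQUENCE)
    fig.update_layout(
        font_color=FONT_COLOR,
        paper_bgcolor=BACKGROUND_COLOR,
        plot_bgcolor=BACKGROUND_COLOR,
    )
    return fig
```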