-
Notifications
You must be signed in to change notification settings - Fork 27
/
Copy path07-Visualizing-a-Graph-of-Retweet-Relationships.Rmd
57 lines (45 loc) · 2.65 KB
/
07-Visualizing-a-Graph-of-Retweet-Relationships.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# Visualizing a Graph of Retweet Relationships
## Problem
You want to visualize a graph of retweets.
## Solution
There are a plethora of ways to visualize graph structures in R. One recent and popular one is [`ggraph`](https://github.com/thomasp85/ggraph).
Given the cookbook-nature of this book, we'll cover one more visualization about retweet relationships. Let's explore the entire retweet network and label the screen names with the most retweets over a given search term (and use `#rstats` again, but gather more tweets this time to truly make a spaghetti chart):
```{r 07_lib, message=FALSE, warning=FALSE}
library(rtweet)
library(igraph)
library(hrbrthemes)
library(ggraph)
library(tidyverse)
```
```{r include=FALSE, echo=FALSE}
extrafont::loadfonts(quiet=TRUE)
```
```{r 07_rstats, message=FALSE, warning=FALSE, cache=TRUE}
rstats <- search_tweets("#rstats", n=1500)
# same as previous recipe
filter(rstats, retweet_count > 0) %>%
select(screen_name, mentions_screen_name) %>%
unnest(mentions_screen_name) %>%
filter(!is.na(mentions_screen_name)) %>%
graph_from_data_frame() -> rt_g
```
To help de-clutter the vertex labels, we'll only add labels for nodes that have a degree of 20 or more (rough guess --- you should look at the degree distribution for more formal work). We'll also include the degree for those nodes so we can size them properly:
```{r 07_verts, message=FALSE, warning=FALSE, cache=TRUE}
V(rt_g)$node_label <- unname(ifelse(degree(rt_g)[V(rt_g)] > 20, names(V(rt_g)), ""))
V(rt_g)$node_size <- unname(ifelse(degree(rt_g)[V(rt_g)] > 20, degree(rt_g), 0))
```
Now, we'll create the graph. Using `..index..` for the alpha channel will help show edge weight without too much extra effort. Note the _heavy_ customization of `geom_node_label()`. Thomas made it way too easy to make beautiful network graphs with `ggraph`:
```{r 07_graph, message=FALSE, warning=FALSE, cache=TRUE, fig.width=10, fig.height=10}
ggraph(rt_g, layout = 'linear', circular = TRUE) +
geom_edge_arc(edge_width=0.125, aes(alpha=..index..)) +
geom_node_label(aes(label=node_label, size=node_size),
label.size=0, fill="#ffffff66", segment.colour="springgreen",
color="slateblue", repel=TRUE, family=font_rc, fontface="bold") +
coord_fixed() +
scale_size_area(trans="sqrt") +
labs(title="Retweet Relationships", subtitle="Most retweeted screen names labeled. Darkers edges == more retweets. Node size == larger degree") +
theme_graph(base_family=font_rc) +
theme(legend.position="none")
```
## See Also
- Enter `twitter network analysis r` into Google (seriously!). Lots of folks have worked in this space and blogged or wrote about their efforts.