-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add episode on contact matrices #63
base: main
Are you sure you want to change the base?
add episode on contact matrices #63
Conversation
moving some contact matrix content from `simulating-transmission.Rmd` to`contact-matrices.Rmd`
added section on socialmixr, including how to download surveys.
Co-authored-by: Sebastian Funk <[email protected]>
majority text suggestion from @adamkucharski
now that the pre-transpose matrix is defined as Cij, the post transpose matrix used in the ODEs should be Cji
Thank you!Thank you for your pull request 😃 🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}. If you have files that automatically render output (e.g. R Markdown), then you should check for the following:
Rendered Changes🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-late/compare/md-outputs..md-outputs-PR-63 The following changes were observed in the rendered markdown documents:
What does this mean?If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation. ⏱️ Updated at 2025-01-21 16:19:16 +0000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is looking good – just had a few minor suggestions, and some comments about consistency in contact indices (which could arguably go either way, but we should probably go with popular formulation!)
Co-authored-by: Adam Kucharski <[email protected]>
C is now the model term for the contact matrix, which is the transpose of `contact_matrix$data`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really nice - left a few more comments.
episodes/contact-matrices.Rmd
Outdated
|
||
Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. Normalisation means converting to a value to be between 0 and 1. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values. | ||
|
||
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. $C[i,j]$ is defined as contacts to $i$ from $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "defined as contacts to
$i$ from$j$ " - I don't think this from/to notion is useful as it implies that it's one of the two that initiates contact - see also discussion at from/to, contacting/contacted socialcontactdata/contactmatrix#14 - you could (if you don't think it adds confusion) point to the
split
argument incontact_matrix
which does the normalisation for you, although it's definitely also useful to show here how to do iot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've altered the text slightly to 'represents the contacts between populations
I've also added a callout on using split
, I think it is a useful addition to know the normalisation can be done within the function contact_data()
. Related to this, I think there could be come confusion about where normalisation takes place in different analyses e.g. in epidemics it happens within the model function call , I've added a callout box to the first tutorial on using epidemics
to highlight that the contact matrix normalisation does not need to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading the various uses here how about speaking about contacts of group i with group j (which I though you used somewhere but now I can't find it) which to me does not imply any directionality but makes it clear that in rows we're specifically looking at group i. So perhaps adopt this one throughout?
Co-authored-by: Sebastian Funk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've suggested a consistent notation throughout. @amanda-minter do you think this works? I feel I've gone around this too many times to be a good judge any more.
One way or the other it would be good I think to make sure the terminology is the same in these sections and perhaps define somewhere early e.g. "We call
episodes/contact-matrices.Rmd
Outdated
|
||
Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values. | ||
|
||
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$). | |
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts of population $j$ with population $i$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: I think this is the wrong way around - C[i, j] in socialmixr is as in the equations above I think (but please correct me if I'm wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's the same then why do we need to transpose the matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always find
Taking step back, we just need the FOI to be defined sensibly, i.e.:
The contact_matrix()
function gives the following structure:
#> $matrix
#> contact.age.group
#> age.group [0,1) [1,5) [5,15) 15+
#> [0,1) 0.40000000 0.8000000 1.266667 5.933333
#> [1,5) 0.11250000 1.9375000 1.462500 5.450000
#> [5,15) 0.02450980 0.5049020 7.946078 6.215686
#> 15+ 0.03230337 0.3581461 1.290730 9.594101
So
For completeness (and just to remind myself), {epidemics} has this internal processing in .prepare_population()
, which normalises by dominant eigenvalue and scales based on
contact_matrix <- (contact_matrix / max(Re(eigen(contact_matrix)$values))) /
x[["demography_vector"]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just need the FOI to be defined sensibly, i.e.:
$\sum_j C_{i,j} I_j/N_j$ . So$C_{i,j}$ should be contacts that group$j$ (the infectious ones) make with group$i$ (the susceptible ones)
Shouldn't this be the other way round, i.e.
Here's an example:
There are two groups,
Sticking with this notation:
The FOI on person
The FOI on people in group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense. So it basically comes down to whether the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is useful for the purpose of teaching but I find it easier sometimes to work from the symmetric encounter matrix, i.e. the number of encounters between group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone through the {epidemics} and {finalsize} examples a bit more, thinking about how these steps are introduced in vignettes. The challenge is that these packages allow user to specify in terms of
But because the result is the symmetric per capita matrix that goes into the model, the end result is equivalent:
# 1. Current epidemics and finalsize implementation
contact_matrix <- t(contact_data[["matrix"]])
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector
# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor
So in terms of implications for this episode, the main thing is just to make sure that the notation for the matrix in the model is in terms of contacts susceptibles make with infectives?
Bringing it back to those two key distinctions (which we could tweak to mention, give above discusssion!):
-
Convert contact matrix into expected number of secondary cases The R calculation involves an infective meeting susceptibles
-
Convert contact matrix into contact rates The epidemic model involves a susceptible meeting infectives
Also tagging @rozeggo and @BlackEdder for info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to distinguish between the two? Given that the eigenvalues will be invariant under transpose (1) could be done in either orientation (as could (2) if the index of the normalising population size is swapped).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a simpler framing we could use, but think need to explain the two steps (eigenvalue calc, then normalisation by demography over correct matrix dimension), otherwise could lead people to assume the following is correct?
# 1. Current epidemics and finalsize implementation
contact_matrix <- contact_data[["matrix"]] # Edit: no transpose
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector
Perhaps the below is clearest, because doesn't involve any explicit transposes or normalisation? Just need to explain intuitively what the two matrices represent?
# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor
Co-authored-by: Sebastian Funk <[email protected]>
episodes/contact-matrices.Rmd
Outdated
|
||
Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values. | ||
|
||
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for final version of this paragraph (wrapping up above discussions). I've edited so contact_data$matrix[j,i]
as this is how it's defined in line 165 above (and for a training episode, makes it easier to follow the logic).
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to
contact_data$matrix[i,j], and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$ (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).
This PR adds an episode on contact matrices, fixes issues #47 and #30 and incorporates the edits made in the closed PR #52.