Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add episode on contact matrices #63

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

amanda-minter
Copy link
Collaborator

This PR adds an episode on contact matrices, fixes issues #47 and #30 and incorporates the edits made in the closed PR #52.

Copy link

github-actions bot commented Nov 7, 2024

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-late/compare/md-outputs..md-outputs-PR-63

The following changes were observed in the rendered markdown documents:

 config.yaml                |   4 +-
 contact-matrices.md (new)  | 349 +++++++++++++++++++++++++++++++++++++++++++++
 md5sum.txt                 |   7 +-
 modelling-interventions.md |   4 +-
 simulating-transmission.md |  17 ++-
 5 files changed, 368 insertions(+), 13 deletions(-)
What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-01-21 16:19:16 +0000

github-actions bot pushed a commit that referenced this pull request Nov 7, 2024
github-actions bot pushed a commit that referenced this pull request Nov 7, 2024
Copy link
Member

@adamkucharski adamkucharski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking good – just had a few minor suggestions, and some comments about consistency in contact indices (which could arguably go either way, but we should probably go with popular formulation!)

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/simulating-transmission.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
github-actions bot pushed a commit that referenced this pull request Nov 12, 2024
C is now the model term for the contact matrix, which is the transpose of `contact_matrix$data`
github-actions bot pushed a commit that referenced this pull request Nov 14, 2024
Copy link
Contributor

@sbfnk sbfnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice - left a few more comments.

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved

Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. Normalisation means converting to a value to be between 0 and 1. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. $C[i,j]$ is defined as contacts to $i$ from $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • "defined as contacts to $i$ from $j$" - I don't think this from/to notion is useful as it implies that it's one of the two that initiates contact - see also discussion at from/to, contacting/contacted socialcontactdata/contactmatrix#14
  • you could (if you don't think it adds confusion) point to the split argument in contact_matrix which does the normalisation for you, although it's definitely also useful to show here how to do iot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've altered the text slightly to 'represents the contacts between populations $i$ and $j$' so it is more generic, wording this episode has been tricky - if you have any more suggestions for clarity let me know!

I've also added a callout on using split, I think it is a useful addition to know the normalisation can be done within the function contact_data(). Related to this, I think there could be come confusion about where normalisation takes place in different analyses e.g. in epidemics it happens within the model function call , I've added a callout box to the first tutorial on using epidemics to highlight that the contact matrix normalisation does not need to be done.

Copy link
Contributor

@sbfnk sbfnk Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading the various uses here how about speaking about contacts of group i with group j (which I though you used somewhere but now I can't find it) which to me does not imply any directionality but makes it clear that in rows we're specifically looking at group i. So perhaps adopt this one throughout?

github-actions bot pushed a commit that referenced this pull request Nov 28, 2024
github-actions bot pushed a commit that referenced this pull request Nov 28, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
Copy link
Contributor

@sbfnk sbfnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've suggested a consistent notation throughout. @amanda-minter do you think this works? I feel I've gone around this too many times to be a good judge any more.

One way or the other it would be good I think to make sure the terminology is the same in these sections and perhaps define somewhere early e.g. "We call $C[i, j]$ the average number of contacts of group $i$ with group $j$ the number of contacts between the two groups, averaged across all individuals in group $i$."

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved

Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts of population $j$ with population $i$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: I think this is the wrong way around - C[i, j] in socialmixr is as in the equations above I think (but please correct me if I'm wrong).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the same then why do we need to transpose the matrix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always find $i$ and $j$ a potential headache (which it's why this will be so useful to have written down!)

Taking step back, we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones) - this is from equation A3 in Wallinga et al (2006)

The contact_matrix() function gives the following structure:

#> $matrix
#>          contact.age.group
#> age.group      [0,1)     [1,5)   [5,15)      15+
#>    [0,1)  0.40000000 0.8000000 1.266667 5.933333
#>    [1,5)  0.11250000 1.9375000 1.462500 5.450000
#>    [5,15) 0.02450980 0.5049020 7.946078 6.215686
#>    15+    0.03230337 0.3581461 1.290730 9.594101

So $C[i,j]$ is contacts made by group $i$ with group $j$? Which I think would mean it needs transposing?

For completeness (and just to remind myself), {epidemics} has this internal processing in .prepare_population(), which normalises by dominant eigenvalue and scales based on $w_{tot}/w_i$ (to use the Walling et al notation):

contact_matrix <- (contact_matrix / max(Re(eigen(contact_matrix)$values))) /
    x[["demography_vector"]]

Copy link
Contributor

@sbfnk sbfnk Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones)

Shouldn't this be the other way round, i.e. $C_{ij}$ here is the average number of contacts with group $j$ that a suspectible in group $i$ has (then multiplied with the probability that the contact is infectious, i.e. $I_j/N_j$)?

Here's an example:

There are two groups, $i$ and $j$. There are 1 person in group $i$ ($N_i=1$) and 100 people in group $j$ ($N_j = 100$). Person $i$ meets all 100 people in group $j$ every day: according to socialmixr notation that means $C_{ij}=100$ and $C_{ji}=1$ (on the scale of days). The total number of contacts between the two groups per day is $C_{ij} N_i = C_{ji} N_j = 100$.

Sticking with this notation:
The FOI on person $i$ is proportional to 100 * (proportion of $j$ that is ill), or $C_{ij} I_j / N_j$
The FOI on people in group $j$ is proportional to 1 * (1 if $i$ is ill, 0 otherwise), or $C_{ji} I_i / N_i$

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. So it basically comes down to whether the $N_j$ (or symmetry-derived equivalent) is wrapped into the $\beta_{ij}$ term. If defined separately, i.e. $\sum_j C_{i,j} I_j/N_j$ as above, then as you say, contact rate should be defined from-S-to-I.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is useful for the purpose of teaching but I find it easier sometimes to work from the symmetric encounter matrix, i.e. the number of encounters between group $i$ and group $j$ per unit time. If we call this $E_{i,j}$ then it is symmetric $E_{i,j}= E_{j,i}$ and so is the term in the force of infection which is proportional to $\frac{E_{ij}I_j}{N_iN_j}$. This highlights that the row vs. column notation is purely about how the matrix is normalised i.e. whether we write it as $\frac{E_{ij}}{N_i}$ or $\frac{E_{ij}}{N_j}$ (which thus determines which of the $N$ terms remains in the force of infection) and not about contacts from/to etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone through the {epidemics} and {finalsize} examples a bit more, thinking about how these steps are introduced in vignettes. The challenge is that these packages allow user to specify in terms of $R_0$ (which is useful), but that means normalising the matrix, then converting back into the correct form for the contact rate you describe above @sbfnk . This is where I think the transpose comes in, so that it's switching between $C_{ij}=R_{ij}$ for the eigenvalue normalisation and $C_{ij}/N_j$ (i.e. contact rate per capita) for the model.

But because the result is the symmetric per capita matrix that goes into the model, the end result is equivalent:

# 1. Current epidemics and finalsize implementation
contact_matrix <- t(contact_data[["matrix"]])
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector

# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor

So in terms of implications for this episode, the main thing is just to make sure that the notation for the matrix in the model is in terms of contacts susceptibles make with infectives?

Bringing it back to those two key distinctions (which we could tweak to mention, give above discusssion!):

  1. Convert contact matrix into expected number of secondary cases The R calculation involves an infective meeting susceptibles

  2. Convert contact matrix into contact rates The epidemic model involves a susceptible meeting infectives

Also tagging @rozeggo and @BlackEdder for info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to distinguish between the two? Given that the eigenvalues will be invariant under transpose (1) could be done in either orientation (as could (2) if the index of the normalising population size is swapped).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably a simpler framing we could use, but think need to explain the two steps (eigenvalue calc, then normalisation by demography over correct matrix dimension), otherwise could lead people to assume the following is correct?

# 1. Current epidemics and finalsize implementation
contact_matrix <- contact_data[["matrix"]] # Edit: no transpose
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector

Perhaps the below is clearest, because doesn't involve any explicit transposes or normalisation? Just need to explain intuitively what the two matrices represent?

# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor

github-actions bot pushed a commit that referenced this pull request Dec 5, 2024
github-actions bot pushed a commit that referenced this pull request Dec 5, 2024

Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
Copy link
Member

@adamkucharski adamkucharski Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for final version of this paragraph (wrapping up above discussions). I've edited so $C[i,j]$ =contact_data$matrix[j,i] as this is how it's defined in line 165 above (and for a training episode, makes it easier to follow the logic).

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent tocontact_data$matrix[i,j], and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$ (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).

github-actions bot pushed a commit that referenced this pull request Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants