Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #267: Refactor to allow custom event priors and marginalise the latent likelihood #474

Merged
merged 28 commits into from
Nov 28, 2024

Conversation

seabbs
Copy link
Contributor

@seabbs seabbs commented Nov 24, 2024

Description

This PR closes #267 by moving to a formula approach for pwindow and swindow. On top of allowing for the correct likelihood this also allows users to set priors on pwindow and swindow. This could be extended to allow for custom formulas (for pwindow this would be a useful feature but not for swindow).

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title in the for Issue(s) issue-numbers: PR title
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

R/latent_model.R Outdated Show resolved Hide resolved
@athowes athowes changed the title Issue 267: Refactor to use formulas for pwindow and swindow Issue #267: Refactor to use formulas for pwindow and swindow Nov 25, 2024
R/latent_model.R Outdated Show resolved Hide resolved
R/latent_model.R Outdated Show resolved Hide resolved
@seabbs
Copy link
Contributor Author

seabbs commented Nov 25, 2024

Noting this has been slowed down as I realised that reparameterisation wasn't working currently with general families and .replace_prior didn't appear to work as expected when used in epidist_prior.

@seabbs
Copy link
Contributor Author

seabbs commented Nov 25, 2024

Something else to note here is that the matrix multiplication and handling of the uniform priors appears to be very inefficient so in its current form it doesn't really make a great deal of sense.

@seabbs
Copy link
Contributor Author

seabbs commented Nov 25, 2024

I don't think that matrix multiplication issue is possible to get around so I think I would argue this PR is repurposed to address the issues found whilst exploring this and we hit pause on custom priors and look for another method for correcting the likelihood

@seabbs
Copy link
Contributor Author

seabbs commented Nov 27, 2024

This PR now:

I also added #478, #476, and #477 to enhance functionality from this PR. In general, I found quite a few edge cases in places (especially the prior handling) when looking at this that I think might need ongoing work.

Due to the inefficiency of the window as formulas approaches this implements the log likelihood where windows are marginalised out. I think this is more correct that the random unlinked sample version but is potentially not ideal. The only way I can think of to write down the full likelihood is to somehow switch the model from the direct version to the formula version when doing posterior prediction. That feels kind of bonkers though. The marginalised likelihood is currently very slow (due to the issues noted above) so it may feel less of a problem as those are resolved.

@seabbs seabbs marked this pull request as ready for review November 27, 2024 18:48
Copy link
Collaborator

@athowes athowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think the way to do the reparameterisation is cool. I do worry about it being a bit hacky, and relying on regex. Perhaps though the regex is relatively robust and the pattern it's looking for in brms Stan code can be relied not to change. Also to note I'm unsure about why this feature is a part of this PR. (Is it related to refactor to use formula for pwindow and swindow?). Also if it's not using S3 suggest dropping S3.
  • On the log_lik, posterior_predict and posterior_epred I think the generalisation beyond being about the latent model is good. I wonder about how we will extend this to working with the marginal_model (PR Issue #221: Add marginal model #426). For the naive model it already has these methods as it doesn't need a custom family. One could wonder about simulating from the right likelihood after using the naive model, but perhaps it's not going to happen / be a useful feature.
  • It's exciting to have priors on the pwindow_raw and swindow_raw. I wonder about how easy it will be for users on the journey to changing those priors (knowing which parameters to set).
  • About the prior infrastructure, it seems to be getting quite complicated and or brittle. Perhaps you disagree. I wonder about what we might be able to do to simplify it. I think at some level I feel quite bad about including an argument like merge_priors in the main epidist() function as it shows our approach isn't really working that well. I wonder what use cases the "built in approach isn't flexible enough for" -- is it mainly this one about prior for all coefficients? I was a bit confused by that because in the previous approach I thought we were matching on all entries in the dataframe. Now it seems like we are doing some regex and it's a different approach.
  • Is there some way we can document how users can make use of the priors on window functionality? And when they would do it? Inclusion in some vignette? Create new issue on this? I think otherwise it is quite a challenging thing to figure out.

R/epidist.R Outdated Show resolved Hide resolved
R/family.R Show resolved Hide resolved
R/family.R Show resolved Hide resolved
R/family.R Show resolved Hide resolved
R/gen.R Outdated Show resolved Hide resolved
R/gen.R Show resolved Hide resolved
R/latent_model.R Outdated Show resolved Hide resolved
R/latent_model.R Outdated Show resolved Hide resolved
vignettes/faq.Rmd Show resolved Hide resolved
R/latent_model.R Show resolved Hide resolved
@seabbs seabbs changed the title Issue #267: Refactor to use formulas for pwindow and swindow Issue #267: Refactor to allow custom event priors and marginalised the latent likelihood Nov 28, 2024
@seabbs seabbs changed the title Issue #267: Refactor to allow custom event priors and marginalised the latent likelihood Issue #267: Refactor to allow custom event priors and marginalise the latent likelihood Nov 28, 2024
@seabbs
Copy link
Contributor Author

seabbs commented Nov 28, 2024

ty for the review @athowes. I think I have addressed your specific concerns and discussed your points below. My high level is I agree some parts of this are a bit complex but I think we might need some more user testing to see how it all shakes out before making updates.

Stan code can be relied not to change. Also to note I'm unsure about why this feature is a part of this PR. (Is it related to refactor to use formula for pwindow and swindow?). Also if it's not using S3 suggest dropping S3

The S3 option is there in case it isn't working well enough and so you can write a s3 method as a workaround. If we didn't have this you would be just stuck. As I said in the specific comment I think we should monitor this.

Also to note I'm unsure about why this feature is a part of this PR.

Its part of this PR as I refactored the stan code section of the latent model - one part of that was this (it also needed to be done in order to make the model work at all when we were passing the window parameters as dpars which we no longer are).

I wonder about how we will extend this to working with the marginal_model

Yes I agree. I think we should make sure that the marginal model has the vreals etc that this needs and then it should just be plug and play

Is there some way we can document how users can make use of the priors on window functionality? And when they would do it? Inclusion in some vignette? Create new issue on this? I think otherwise it is quite a challenging thing to figure out.

As above and yes I think so. I think we might want to wait to see what custom priors looks like in the marginal model (if and when that goes in) as that will have other limitations/features we should probably discuss. I'll make an issue now though.

I was a bit confused by that because in the previous approach I thought we were matching on all entries in the dataframe. Now it seems like we are doing some regex and it's a different approach.

I found multiple instances where this didn't work (with some of them in the new tests). My working with this more generally made me think we might need a list argument per low level function in epidist to pass optional parameters as it felt pretty limited when playing around.

it seems to be getting quite complicated and or brittle

It got so complicated because the current implementation was pretty brittle when actually trying to use it (mostly when trying to pass pwindow and swindow in the formula which a future model might need to do). I think we might need some more user testing to see where we are but I really can't see how we can do away with the handling of manual priors as a special case for example.

I wonder about how easy it will be for users on the journey to changing those priors (knowing which parameters to set).

Not at all easy I think and there are lots of difficult edge cases (also posterior_predict etc is uniform only) I think we might need another issue to expand on this/add some guard rails.

@seabbs seabbs merged commit f9e2fc8 into main Nov 28, 2024
10 checks passed
@seabbs seabbs deleted the window-priors branch November 28, 2024 17:08
seabbs added a commit that referenced this pull request Jan 10, 2025
… latent likelihood (#474)

* first pass at refactoring latent model to use window formulas

* add docs to stan function

* check getting started -drive by fix plotting

* update approach to handling formulas

* get reparameterisation from brms itself vs enforcing manual declaration

* work on regexing:

* test manually setting new priors

* fix .replace_prior

* reset for pause

* add back in lower bounds

* revert pass in via formula

* add custom priors pass in

* write priors down more neatly

* add manual prior mode and optout

* clean up easy test failures

* use marginalised log likelihood

* debug marginalised likelihood

* workaround for liklihood vectorisation

* further increase prior complexity options

* update prior ordering

* catch printing issue for .replace_prior

* add news iteem

* add PR links

* speeed up test

* code read through

* clean up precommit

* turn off priorsense to check theory its numerical instability for extreme log lik values

* review comments

Former-commit-id: 5f6baaa183c156a9d981a669f3717daed3806468 [formerly 9869c9563464c7752c57123f718bd66b3b08d5a3]
Former-commit-id: b1b61cbc1600a76b14924cf86eacf66c5acefbcb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make log_lik methods correct by obtaining latent parameter draws
2 participants