
Intuition on why distrcompositor uses KM or NA as opposed to Breslow for Cox models? #263

Closed
dnwissel opened this issue Mar 23, 2022 · 6 comments


@dnwissel

dnwissel commented Mar 23, 2022

Hi,

First of all, thank you for the amazing package and all of your hard work!

I have some questions about distrcompositor and its usage. If this would be a better fit for Cross Validated / Stack Overflow, please let me know and I am happy to open it there instead (since this is more about theory than about the package directly).

Specifically, I'm benchmarking a few different survival boosting methods. If I understand correctly, the proper mlr3proba usage to get survival distributions from XGBoost for the Cox PH and AFT losses in particular would be something like the following:

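library(mlr3proba)
library(mlr3pipelines)      # provides ppl()
library(mlr3extralearners)  # provides the surv.xgboost learner

learners = list(
  # Cox PH loss: compose a distribution from the linear predictor under proportional hazards
  ppl("distrcompositor",
      lrn("surv.xgboost", objective = "survival:cox"),
      estimator = "kaplan",
      form = "ph"
  ),
  # AFT loss: compose a distribution under the accelerated failure time form
  ppl("distrcompositor",
      lrn("surv.xgboost", objective = "survival:aft", aft_loss_distribution = "logistic"),
      estimator = "kaplan",
      form = "aft"
  )
)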
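To make concrete what `form` controls, here is a minimal base-R sketch of how an unconditional KM baseline could be combined with a linear predictor. It assumes the textbook composition formulas, PH: S(t | x) = S0(t)^exp(lp) and AFT: S(t | x) = S0(t / exp(lp)); `step_at`, `compose_ph` and `compose_aft` are hypothetical helpers, not mlr3proba functions, and the linear predictors are made up.

```r
library(survival)

# Hypothetical helper: evaluate a right-continuous step curve (e.g. a KM curve)
# at arbitrary times, with S(t) = 1 before the first time point
step_at = function(times, surv, t) {
  c(1, surv)[findInterval(t, times) + 1]
}

# Assumed PH composition: S(t | x_i) = S0(t)^exp(lp_i)
# (rows = observations, columns = time points)
compose_ph = function(S0, lp) {
  outer(exp(lp), S0, function(e, s) s^e)
}

# Assumed AFT composition: S(t | x_i) = S0(t / exp(lp_i)) on a fixed time grid
compose_aft = function(km, lp, grid) {
  t(vapply(lp, function(l) step_at(km$time, km$surv, grid / exp(l)),
           numeric(length(grid))))
}

df = survival::lung
km = survfit(Surv(time, status) ~ 1, data = df)  # unconditional KM baseline S0
lp = c(-0.5, 0, 0.5)                              # made-up linear predictors

S_ph = compose_ph(km$surv, lp)
S_aft = compose_aft(km, lp, grid = km$time)
```

Note the opposite directions: under PH a larger lp lowers the curve, while under this AFT convention a larger lp stretches time and raises it.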

I was wondering whether you could provide some intuition and/or theory as to why you decided to estimate the baseline hazard function using KM/NA as opposed to Breslow (at least for Cox). As far as I can see, mboost and gbm both estimate the baseline hazard using Breslow. Overall, I am just curious whether you would expect to see any differences in common calibration measures such as the Integrated Brier Score when estimating the baseline hazard with KM/NA vs. Breslow.

Thanks in advance!

@RaphaelS1
Collaborator

From my understanding, there are actually two separate questions here: 1) why use an unconditional estimator instead of a conditional one? 2) why use KM/NA instead of Breslow in that case?

Answering (2) first: the Breslow estimator is identical to NA in the unconditional case (i.e. when ignoring covariates). Also, with enough data, KM and NA are asymptotically equivalent.
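Both claims can be checked numerically. A minimal sketch using the `lung` data from the survival package (an arbitrary choice of example data): the Breslow increment at an event time is d / sum over the risk set of exp(lp), and without covariates every exp(lp) is 1, so the sum reduces to the Nelson-Aalen estimator.

```r
library(survival)

df = survival::lung
df$status = as.integer(df$status == 2)   # lung codes death as 2

fit = survfit(Surv(time, status) ~ 1, data = df)

# Breslow increment at time s: d(s) / sum_{j at risk at s} exp(lp_j).
# Without covariates every lp_j = 0, so the denominator is just the number
# at risk and the cumulative sum is exactly the Nelson-Aalen estimator
H_breslow_null = cumsum(fit$n.event / fit$n.risk)

S_km = fit$surv              # Kaplan-Meier: product of (1 - d / n)
S_na = exp(-fit$cumhaz)      # survival implied by Nelson-Aalen: product of exp(-d / n)

# Largest pointwise gap between the two survival curves
max(abs(S_na - S_km))
```

Because exp(-x) >= 1 - x, the NA-based curve always lies on or above KM, and the gap per step shrinks as d / n gets small, i.e. as the risk sets grow.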

Now for (1): KM is not the right estimator for the IBS or other measures when censoring depends on covariates, which is usually the case; see e.g. Gerds 2006. However, in that case you become dependent on fitting yet another model for censoring, which requires even more assumptions, and you end up in a mess. So it's still better to have a misspecified but more justifiable estimator like KM/NA.
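For context, this is where an unconditional KM enters evaluation: an inverse-probability-of-censoring-weighted (IPCW) Brier score divides each term by the censoring survival G, here estimated by a reverse KM that ignores covariates. A minimal sketch, not mlr3proba's implementation; `brier_ipcw` and `G_at` are hypothetical helpers and `lung` is just example data.

```r
library(survival)

# Hypothetical helper: IPCW Brier score at one time point tau.
# The censoring survival G is estimated unconditionally by reverse Kaplan-Meier
# (censoring treated as the event), i.e. covariates are ignored.
brier_ipcw = function(time, status, surv_at_tau, tau) {
  G = survfit(Surv(time, 1 - status) ~ 1)
  G_at = function(x, left = FALSE) {
    # left = TRUE returns the left limit G(x-)
    c(1, G$surv)[findInterval(x, G$time, left.open = left) + 1]
  }
  died = time <= tau & status == 1   # event observed by tau: target is 0
  alive = time > tau                 # still event-free at tau: target is 1
  # subjects censored before tau contribute 0
  terms = ifelse(died, surv_at_tau^2 / G_at(time, left = TRUE),
                 ifelse(alive, (1 - surv_at_tau)^2 / G_at(tau), 0))
  mean(terms)
}

df = survival::lung
df$status = as.integer(df$status == 2)
km = survfit(Surv(time, status) ~ 1, data = df)
S_365 = summary(km, times = 365)$surv   # same KM prediction for every patient
brier_ipcw(df$time, df$status, rep(S_365, nrow(df)), tau = 365)
```

If censoring actually depends on covariates, a conditional G would be needed, which is exactly the extra censoring model described above.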

These reasons are why I haven't yet got around to the issue that addresses this, #164.

Hope that helps!

@dnwissel
Author

dnwissel commented Mar 29, 2022

Hi Raphael,

Thanks for the fast response! Maybe I misunderstand your answer, but I believe we're talking about slightly different things?

#164 (and your answer, unless I misunderstood) discuss the estimation of the censoring distribution for usage in the IBS metric (or similar).

My question was about estimating the (baseline) survival function given log-hazard estimates produced by, e.g., XGBoost fitted with the Cox PH objective. In effect, I was just curious why you estimate the baseline survival function unconditionally (using KM/NA) rather than using Breslow (at least for models fit under the PH assumption) in the compositor. As you pointed out, Breslow is identical to NA in the unconditional case, but when the Cox model is fit with covariates (which is generally the case), this does not hold as far as I can see.
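To illustrate the distinction: given linear predictors from a Cox-type model, the Breslow estimator puts the risk scores exp(lp) into the risk-set denominator, so the baseline depends on the fitted model, whereas KM/NA does not. A minimal sketch in which a linear `coxph` fit on `lung` stands in for a boosted Cox model; the data, covariates and variable names are illustrative assumptions. The hand-rolled Breslow sum is compared with `survival::basehaz`.

```r
library(survival)

df = survival::lung
df$status = as.integer(df$status == 2)

# A linear Cox model stands in for any learner that outputs a linear predictor
fit = coxph(Surv(time, status) ~ age + sex, data = df, ties = "breslow")
lp = predict(fit, type = "lp")   # centred at the covariate means

# Breslow: the risk-set denominator is weighted by the model's risk scores exp(lp)
ev = sort(unique(df$time[df$status == 1]))
dH = vapply(ev, function(s) {
  sum(df$time == s & df$status == 1) / sum(exp(lp[df$time >= s]))
}, numeric(1))

bh = basehaz(fit, centered = TRUE)                    # survival's Breslow baseline
H0 = c(0, cumsum(dH))[findInterval(bh$time, ev) + 1]  # hand-rolled, same time grid

# Conditional survival curve for the first patient under the two baselines
km = survfit(Surv(time, status) ~ 1, data = df)
S0_km = c(1, km$surv)[findInterval(bh$time, km$time) + 1]
S_breslow = exp(-H0 * exp(lp[1]))
S_km = S0_km^exp(lp[1])
```

Plotting `S_breslow` and `S_km` against `bh$time` shows how far the two compositions drift apart for a given patient.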

Maybe I am also misunderstanding the usage of distrcompositor, but I found #44 quite interesting as there you seem to have made the explicit choice to have users use the compositor instead of the gbm native baseline hazard (which is Breslow).

Hope that makes sense. Overall, I am just curious i) why you chose to use only KM/NA in the compositor as opposed to Breslow for models where PH is assumed to hold, and ii) whether you'd expect to see large differences between the final survival curves when using Breslow vs. e.g. KM.

Thanks a bunch!

@RaphaelS1
Collaborator

My point was basically that this is an open question, and it is worth considering both in the context of fitting and of evaluation. The point of distrcompositor is to allow users to pick which estimation method they want. Perhaps I can add Breslow as a choice, but note that Breslow is only possible for models that predict a linear predictor, and see below for why even that might be problematic.

The graph below might also answer your second question. I don't know why the curves differ so much, but I suspect it's because the Breslow estimator was designed for simple linear models that estimate the coefficients (i.e. f(x) = beta), not for ML models that predict the linear predictor as a whole (i.e. f(x) = Xbeta). Not sure if that makes sense?

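library(mlr3proba)
library(mlr3extralearners)
library(survival)

l = lrn("surv.gbm")
t = tsk("whas")
l$train(t)
p = l$predict(t)

time = t$truth()[, 1]
status = t$truth()[, 2]
eval_times = sort(unique(time))

# GBM: Breslow-type baseline cumulative hazard from gbm's linear predictor
H_gbm = gbm::basehaz.gbm(time, status, p$lp, t.eval = eval_times)
plot(eval_times, exp(-H_gbm), ylim = c(0, 1), type = "l", xlab = "T", ylab = "S(T)")

# KM: unconditional estimator
km = survfit(t$formula(1), t$data())
lines(km$time, km$surv, col = 2)

# CPH: Breslow baseline from a linear Cox model on all features
bh = basehaz(coxph(t$formula(), t$data()))
lines(bh$time, exp(-bh$hazard), col = 3)
legend("topright", lty = 1, col = 1:3, legend = c("GBM", "KM", "CPH"))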

Created on 2022-04-01 by the reprex package (v2.0.1)

@RaphaelS1
Collaborator

Will add it in the future #269

@dnwissel
Author

dnwissel commented Apr 4, 2022

That helps a lot, thank you!

dnwissel closed this as completed Apr 4, 2022
@bblodfon
Collaborator

The Breslow estimator is now supported: https://mlr3proba.mlr-org.com/reference/mlr_pipeops_compose_breslow_distr.html
