Intuition on why distrcompositor uses KM or NA as opposed to Breslow for Cox models? #263

dnwissel · 2022-03-23T10:34:01Z

Hi,

first of all, thank you for the amazing package and all of your hard work!

I have some questions related to distrcompositor and its usage. If this would be a better fit for Cross Validated / SO, please let me know and I am happy to post there instead (since this is more theory than directly package related).

Specifically, I'm working on benchmarking a few different survival boosting methods. If I understand correctly, the proper mlr3proba usage to get survival distributions from XGBoost for the Cox PH and AFT losses in particular would be something like the following:

learners = list(
  ppl("distrcompositor", 
      lrn("surv.xgboost", objective = "survival:cox"),
      estimator = "kaplan",
      form = "ph"
  ),
  ppl("distrcompositor",
      lrn("surv.xgboost", objective = "survival:aft", aft_loss_distribution = "logistic"),
      estimator = "kaplan",
      form = "aft"
  )
)

I was wondering whether you could provide some intuition and/or theory as to why you decided to estimate the baseline hazard function using KM/NA as opposed to Breslow (at least for Cox)? As far as I can see, mboost and gbm both estimate the baseline hazard using Breslow - overall I am just curious whether you would expect to see any differences in common calibration measures such as the Integrated Brier Score when estimating the baseline hazard with KM/NA vs Breslow?

Thanks in advance!


RaphaelS1 · 2022-03-24T21:14:09Z

From my understanding, there are actually two separate questions here: 1) why use an unconditional estimator instead of a conditional one? 2) why use KM/NA instead of Breslow in that case?

Answering (2) first: the Breslow estimator is identical to NA in the unconditional case (i.e. when ignoring covariates). Also, with enough data, KM and NA are asymptotically equivalent.
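To make the first claim concrete, here is a sketch in standard notation. The symbols are introduced here for illustration and are not taken from the mlr3proba source: $t_i$ are the distinct event times, $d_i$ the number of events at $t_i$, $R(t_i)$ the risk set with size $n_i$, and $\hat\eta_j$ the predicted linear predictor of observation $j$.

```latex
\hat H_0^{\mathrm{Breslow}}(t)
  = \sum_{i:\, t_i \le t} \frac{d_i}{\sum_{j \in R(t_i)} \exp(\hat\eta_j)}
\qquad\xrightarrow{\ \hat\eta_j = 0 \ \forall j\ }\qquad
\sum_{i:\, t_i \le t} \frac{d_i}{n_i}
  = \hat H^{\mathrm{NA}}(t)

\hat S^{\mathrm{KM}}(t)
  = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
  \approx \exp\!\left(-\hat H^{\mathrm{NA}}(t)\right)
  \quad \text{when each } d_i / n_i \text{ is small}
```

Setting every $\hat\eta_j$ to zero turns each denominator into the plain risk-set size, which is exactly the Nelson-Aalen increment. The second line uses $\log(1 - x) \approx -x$ for small $x$, which is the usual intuition for why KM and NA agree closely when risk sets are large.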

Now for (1): KM is not the right estimator for IBS or other measures when censoring depends on covariates, which is usually the case, see e.g. Gerds 2006. However, when this is the case, you become dependent on fitting yet another model for censoring, which requires even more assumptions, and you end up in a mess. So it's still better to have a misspecified but more justifiable estimator like KM/NA.

The reasons above are why I haven't yet got around to #164, the issue that addresses this.

Hope that helps!

dnwissel · 2022-03-29T08:32:19Z

Hi Raphael,

thanks for the fast response! Maybe I misunderstand your answer, but I believe we're talking about slightly different things?

#164 (and your answer, unless I misunderstood) discuss the estimation of the censoring distribution for usage in the IBS metric (or similar).

My question was related to the estimation of the (baseline) survival function given some log-hazard estimates produced by e.g., XGBoost fitted with the Cox PH family. In effect, I was just curious why you estimate the baseline survival function unconditionally (using KM/NA) as opposed to using Breslow (at least for models fit using the PH assumption) in the compositor. As you pointed out, Breslow is identical to NA in the unconditional case, but when we fit the Cox model with covariates (which is generally the case), this does not hold as far as I can see.
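The difference can be sketched in a few lines of R. This is only an illustration under stated assumptions, not how distrcompositor or any learner computes things: `breslow_cumhaz` is a hypothetical helper written for this example, the `lung` data from survival stands in for a real benchmark, and a plain `coxph` fit stands in for the boosted model that would supply the linear predictor.

```r
library(survival)

# Hypothetical helper (not part of mlr3proba): Breslow estimate of the
# cumulative baseline hazard, given observed times, event indicators
# (1 = event) and a linear predictor `lp` from any PH-type model.
breslow_cumhaz = function(time, status, lp) {
  event_times = sort(unique(time[status == 1]))
  increments = vapply(event_times, function(tt) {
    d = sum(status == 1 & time == tt)  # number of events at tt
    r = sum(exp(lp[time >= tt]))       # risk set, weighted by exp(lp)
    d / r
  }, numeric(1))
  data.frame(time = event_times, cumhaz = cumsum(increments))
}

df = survival::lung[, c("time", "status", "age")]
df$status = df$status - 1  # lung codes 1 = censored, 2 = dead

# Stand-in for a boosted Cox model: any model returning a linear predictor
fit = coxph(Surv(time, status) ~ age, data = df, ties = "breslow")
lp = df$age * coef(fit)[["age"]]  # uncentered linear predictor

H_breslow = breslow_cumhaz(df$time, df$status, lp)
# With lp = 0 for everyone the same formula is the Nelson-Aalen estimator
H_na = breslow_cumhaz(df$time, df$status, rep(0, nrow(df)))

# Reference from survival, evaluated at the same event times
H_ref = basehaz(fit, centered = FALSE)
all.equal(H_breslow$cumhaz, H_ref$hazard[match(H_breslow$time, H_ref$time)])

# Composed survival curve for one observation under each baseline
S_breslow = exp(-H_breslow$cumhaz * exp(lp[1]))
S_na = exp(-H_na$cumhaz * exp(lp[1]))
```

The two baselines differ only in the denominator: Breslow weights each member of the risk set by `exp(lp)`, while NA counts everyone equally. How far `S_breslow` and `S_na` end up apart therefore depends on the scale and centering of the linear predictor the model returns.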

Maybe I am also misunderstanding the usage of distrcompositor, but I found #44 quite interesting as there you seem to have made the explicit choice to have users use the compositor instead of the gbm native baseline hazard (which is Breslow).

Hope that makes sense - overall I am just curious why i) you chose to use KM/NA only in the compositor as opposed to Breslow for those where PH is assumed to hold and ii) whether you'd expect to see great differences between final survival curves when using Breslow vs e.g., KM?

Thanks a bunch!

RaphaelS1 · 2022-04-01T22:16:12Z

My point was basically that this is an open question, and it is worth considering in the context of both fitting and evaluation. The point of distrcompositor is to let users pick which estimation method they want. Perhaps I can add Breslow as a choice, but note that Breslow is only possible for models that predict a linear predictor, and see below for why this might be problematic.

The graph produced by the code below might also answer your second question. I don't know why the curves differ so much, but I suspect it's because the Breslow estimator was designed for simple linear models that estimate the coefficients (i.e. f(x) = beta), not ML models that predict the linear predictor as a whole (i.e. f(x) = Xbeta). Not sure if that makes sense?

library(mlr3proba)
library(mlr3extralearners)
library(survival)

# Fit a GBM Cox model and predict the linear predictor on the training data
l = lrn("surv.gbm")
t = tsk("whas")
l$train(t)
p = l$predict(t)

# GBM: Breslow baseline survival computed from the GBM linear predictor
plot(exp(-gbm::basehaz.gbm(t$truth()[, 1], t$truth()[, 2], p$lp,
  t.eval = sort(unique(t$truth()[, 1]))
)), ylim = c(0, 1), type = "l", xlab = "T", ylab = "S(T)")
# KM: unconditional Kaplan-Meier estimate
lines(survfit(t$formula(1), t$data())$surv, col = 2)
# CPH: baseline survival from a Cox PH fit on the same data
df = t$data()
lines(exp(-basehaz(coxph(t$formula(), df))$hazard), col = 3)
legend("topright", lty = 1, col = 1:3, legend = c("GBM", "KM", "CPH"))

Created on 2022-04-01 by the reprex package (v2.0.1)

RaphaelS1 · 2022-04-02T08:47:25Z

Will add it in the future #269

dnwissel · 2022-04-04T08:42:27Z

That helps a lot, thank you!

bblodfon · 2024-01-26T16:12:49Z

Breslow estimator is now supported: https://mlr3proba.mlr-org.com/reference/mlr_pipeops_compose_breslow_distr.html

dnwissel closed this as completed Apr 4, 2022

bblodfon mentioned this issue Feb 8, 2024

Using xgboost with crankcompositor #363

Closed
