The survPen package was designed to fit hazard and excess hazard models with multidimensional penalized splines, allowing for time-dependent effects, non-linear effects and interactions between several covariates (Fauvernier et al. 2019). The linear predictor in survPen is the logarithm of the (excess) hazard. Since version 2.0.0, survPen also allows models for the logarithm of the relative mortality ratio between a cohort and a reference population, as well as models for the logarithm of the marginal hazard (intensity) with robust estimation of standard errors (both are illustrated in dedicated sections below).
As the hazard function fully determines the distribution of the time-to-event, this modelling approach is well-suited for many time-to-event analyses: the splines provide the flexibility required for modelling the hazard and the penalty terms control this flexibility for smooth estimation. Excess hazard modelling (Estève et al. 1990, Remontet et al. 2007, Remontet et al. 2018) is linked to the concept of net survival (competing risks setting) and can be useful in specific situations, for example to study the mortality associated with chronic diseases (e.g., cancer survival).
The framework is very similar to that of the R mgcv
package developed by Wood for generalized additive models; it allows
including parametric smooth terms based on restricted cubic regression
splines as marginal bases, associated with penalties on the second
derivative. Multidimensional smoothers are based on tensor product
splines, i.e. a term-by-term multiplication of the marginal bases.
Smoothing parameters are estimated automatically by optimizing either
the Laplace approximate marginal likelihood (LAML) or the likelihood
cross-validation criterion (LCV).
The user must be aware that the survPen
package is
independent of the mgcv
package and that some
functionalities available in mgcv
in terms of types of
splines (such as thin plate regression splines or P-splines) are not
available in survPen
(yet).
In survPen, the linear predictor, i.e. the log-hazard
function, is fully and explicitly specified by the model’s formula,
including the baseline hazard and all time-dependent effects. Thus,
time-dependent effects are naturally specified as interactions with
functions of time.
The main functions of the survPen package are survPen, smf, tensor, tint and rd. The survPen function fits the model specified in the formula argument. The functions smf, tensor and tint are used to define penalized splines within this formula. Finally, rd allows including random effects in the linear predictor.
Unpenalized terms can also be incorporated in survPen
formulae, just as one would specify the terms of a linear predictor in a
glm
formula. The survPen package thus makes it easy to define and fit various hazard models. As an example, analyses
to easily define and fit various hazard models. As an example, analyses
performed using the coxph
function of the
survival
package, to fit a Cox proportional hazard model,
may readily be improved using the survPen
package by adding
a penalized spline to model the baseline hazard: in addition to the
hazard ratio estimates, the user would then obtain a smooth estimate of
the baseline hazard as well as smooth survival curve estimates.
In time-to-event analysis, we may deal with one or several covariates
whose functional forms, time-dependent effects and interaction structure
are challenging to specify. In this context, penalized hazard models
represent an interesting tool (Kauermann 2005, Kneib and Fahrmeir 2007,
Remontet et al. 2018). One possible way to implement such penalized
models is to use the classical approximation of the survival likelihood
by a Poisson likelihood by artificially splitting the data. The package
mgcv
can then be used to fit penalized hazard models
(Remontet et al. 2018). The problem with this option is that the setup
is rather complex and the method cannot be used on very large datasets
for computational reasons.
Wood et al. (2016) provided a general penalized framework that made smooth function estimation available to a wide variety of models. They proposed to estimate smoothing parameters by maximizing a Laplace approximate marginal likelihood (LAML) criterion and demonstrated how statistical consistency is maintained by doing so.
The survPen
function implements the framework described
by Wood et al. (2016) for modelling time-to-event data. The effects of
continuous covariates are represented using low rank spline bases
associated with penalties on the second derivative (penalty terms are
quadratic in the regression parameters in this case). The
survPen
function makes it possible to account simultaneously for
time-dependent effects, non-linear effects and interactions between
several continuous covariates without the need to build a possibly
demanding model-selection procedure. In addition to LAML, the likelihood
cross-validation (LCV) criterion (O’Sullivan 1988) can be used for
smoothing parameter estimation.
A key feature of survPen
is that the optimization of LCV
and LAML relies on their first and second derivatives with respect to
the smoothing parameters; this makes the optimization procedure fast and
stable. The estimation procedure follows the optimization scheme
proposed by Wood et al. (2016); it is based on two nested Newton-Raphson
algorithms: outer Newton-Raphson iterations for the smoothing parameters and inner Newton-Raphson iterations for the regression parameters. Estimation of the regression parameters in the inner algorithm is performed by directly maximizing the penalized likelihood of
the survival model, therefore avoiding data augmentation and Poisson
likelihood approximation.
In practice, LAML optimization is generally both a bit faster and a
bit more stable and is thus the default option in survPen
.
For m covariates (x1, …, xm), if we note h(t, x1, …, xm) the hazard at time t, the hazard model is the following:
$$ log[h(t,x_1,\ldots,x_m)] = \sum_{j} g_j(t,x_1,\ldots,x_m) $$
where each gj is either the
marginal basis of a specific covariate or a tensor product smooth of any
number of covariates. The marginal bases of the covariates are
represented as natural (or restricted) cubic splines (as in function
ns
from library splines
) with associated
quadratic penalties. Full parametric (unpenalized) terms for the effects
of covariates are also possible (see the examples below). Each gj is then
associated with zero, one or several smoothing parameters. The
cumulative hazard included in the log-likelihood is approximated by
Gauss-Legendre quadrature for numerical stability.
The method is detailed in Fauvernier et al. (2019, Journal of the Royal Statistical Society: Series C).
In the following examples, we will use a simulated dataset that contains artificial data from 2,000 women diagnosed with cervical cancer between 1990 and 2010. End of follow-up is June 30th 2013. The variables are as follows: begin (entry time, used in the left-truncation example), fu (follow-up time since diagnosis, in years), age (age at diagnosis), yod (year of diagnosis), dead (event indicator) and rate (expected mortality rate of the general population at the time of death).
The first ten rows are shown below:
begin | fu | age | yod | dead | rate |
---|---|---|---|---|---|
0.2596339 | 0.7449282 | 35.86311 | 1990.617 | 1 | 0.0008125 |
0.1980317 | 0.7675560 | 43.51814 | 1990.195 | 1 | 0.0014839 |
0.7417083 | 0.8769426 | 46.03696 | 1990.157 | 1 | 0.0019026 |
0.5496453 | 0.7626806 | 49.97125 | 1990.063 | 1 | 0.0023774 |
0.2710438 | 0.8842343 | 49.18275 | 1990.310 | 1 | 0.0023774 |
0.4466266 | 0.7688269 | 52.53114 | 1990.219 | 1 | 0.0029628 |
0.5732427 | 0.9393162 | 53.26489 | 1990.742 | 1 | 0.0031958 |
0.0449805 | 0.0452196 | 55.24709 | 1990.124 | 1 | 0.0037191 |
0.0015541 | 0.0254632 | 66.30253 | 1990.304 | 1 | 0.0087319 |
0.1472586 | 0.1831834 | 73.86721 | 1990.313 | 1 | 0.0182685 |
The model specification should seem natural for users familiar with
the glm
formulation, because the linear predictor is fully
and explicitly specified by the model’s formula. In addition to
specifying the time (argument t1) and the event variable (argument event), the user only needs to provide one
formula object starting with the symbol “~” followed by the functional
forms of the different covariates and time. Nothing is specified on the
left of the formula since the linear predictor scale is implicit
(log-hazard or log-excess hazard).
Suppose that we are only interested in the effect of the time elapsed since diagnosis on the hazard. Examples of models fitted on the log-hazard scale are shown below:
log[h(t)] = β0
$$ log[h(t)] = \sum_{k=1}^{p}\beta_k I_k(t) $$
where Ik(t) = 1 if t belongs to the kth specified interval and 0 otherwise.
log[h(t)] = β0 + β1 × t
log[h(t)] = f(t)
where f is a restricted cubic spline (linear beyond the boundary knots) with interior knots 0.25, 0.5, 1, 2 and 4 and boundary knots 0 and 5.
Using the splines
package, we can specify the model as
follows
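As a hedged sketch, the four unpenalized models described above (used in the prediction code below as mod.cst, mod.pwcst, mod.lin and mod.rcs) could be specified as follows; the breakpoints chosen for the piecewise constant model and the number of quadrature nodes are illustrative assumptions.
library(splines)
# constant hazard
mod.cst <- survPen(~1, data=datCancer, t1=fu, event=dead)
# piecewise constant hazard (6-month breakpoints are an assumption; a step function may need
# more Gauss-Legendre nodes for an accurate cumulative hazard, hence n.legendre below)
mod.pwcst <- survPen(~cut(fu, breaks=seq(0,5,by=0.5), include.lowest=TRUE),
                     data=datCancer, t1=fu, event=dead, n.legendre=200)
# log-linear effect of time
mod.lin <- survPen(~fu, data=datCancer, t1=fu, event=dead)
# unpenalized restricted cubic spline of time (interior knots 0.25, 0.5, 1, 2, 4; boundary knots 0 and 5)
mod.rcs <- survPen(~ns(fu, knots=c(0.25,0.5,1,2,4), Boundary.knots=c(0,5)),
                   data=datCancer, t1=fu, event=dead)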
We use the same design as before but add a penalty term that controls the smoothness of the fitted curve
log[h(t)] = s(t)
where s is a penalized restricted cubic spline with interior knots 0.25, 0.5, 1, 2 and 4 and boundary knots 0 and 5.
Using the smf function (smf stands for smooth function) within the survPen package:
f.pen <- ~ smf(fu,knots=c(0,0.25, 0.5, 1, 2, 4,5)) # careful here: the boundary knots are included
mod.pen <- survPen(f.pen,data=datCancer,t1=fu,event=dead)
Nota Bene: the unpenalized version of this model could also have been fitted by specifying that the smoothing parameter should be zero
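For instance (a minimal sketch using the lambda argument illustrated later in this vignette):
mod.unpen <- survPen(f.pen, data=datCancer, t1=fu, event=dead, lambda=0)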
new.time <- seq(0,5,length=100)
pred.cst <- predict(mod.cst,data.frame(fu=new.time))
pred.pwcst <- predict(mod.pwcst,data.frame(fu=new.time))
pred.lin <- predict(mod.lin,data.frame(fu=new.time))
pred.rcs <- predict(mod.rcs,data.frame(fu=new.time))
pred.pen <- predict(mod.pen,data.frame(fu=new.time))
lwd1 <- 2
par(mfrow=c(1,1))
plot(new.time,pred.cst$haz,type="l",ylim=c(0,0.2),main="hazard vs time",
xlab="time since diagnosis (years)",ylab="hazard",col="black",lwd=lwd1)
segments(x0=new.time[1:99],x1=new.time[2:100],y0=pred.pwcst$haz[1:99],col="blue3",lwd=lwd1)
lines(new.time,pred.lin$haz,col="green3",lwd=lwd1)
lines(new.time,pred.rcs$haz,col="orange",lwd=lwd1)
lines(new.time,pred.pen$haz,col="red",lwd=lwd1)
legend("topright",
legend=c("constant","piecewise constant","log-linear","cubic spline","penalized cubic spline"),
col=c("black","blue3","green3","orange","red"),
lty=rep(1,5),lwd=rep(lwd1,5))
We can see that the penalized model offers a smoother curve than the unpenalized model. Estimation from the penalized version will then tend to be slightly biased but less prone to overfitting.
Hazard and survival predictions can be made along with their confidence intervals
par(mfrow=c(1,2))
plot(new.time,pred.pen$haz,type="l",ylim=c(0,0.2),main="Hazard from mod.pen with CIs",
xlab="time since diagnosis (years)",ylab="hazard",col="red",lwd=lwd1)
lines(new.time,pred.pen$haz.inf,lty=2)
lines(new.time,pred.pen$haz.sup,lty=2)
plot(new.time,pred.pen$surv,type="l",ylim=c(0,1),main="Survival from mod.pen with CIs",
xlab="time since diagnosis (years)",ylab="survival",col="red",lwd=lwd1)
lines(new.time,pred.pen$surv.inf,lty=2)
lines(new.time,pred.pen$surv.sup,lty=2)
Hazard ratios as well as survival differences (with associated confidence intervals) can be calculated directly
The following example constructs a model with a tensor product spline of time and age (see below for details about those models). We then predict the hazard ratio and survival differences between ages 70 and 30 according to time using the type="HR" argument.
f.pen.age <- ~tensor(fu,age,df=c(5,5)) # see below for explanations about tensor models
mod.pen.age <- survPen(f.pen.age,data=datCancer,t1=fu,event=dead)
pred.pen.HR <- predict(mod.pen.age,data.frame(fu=new.time,age=70),newdata.ref=data.frame(fu=new.time,age=30),type="HR")
par(mfrow=c(1,2))
plot(new.time,pred.pen.HR$HR,type="l",ylim=c(0,15),main="Hazard ratio (age 70 vs 30)",
xlab="time since diagnosis (years)",ylab="hazard ratio",col="red",lwd=lwd1)
lines(new.time,pred.pen.HR$HR.inf,lty=2)
lines(new.time,pred.pen.HR$HR.sup,lty=2)
plot(new.time,pred.pen.HR$diff.surv,type="l",
ylim=range(c(pred.pen.HR$diff.surv.inf,pred.pen.HR$diff.surv.sup)),
main="Survival difference (age 70 vs 30)",xlab="time since diagnosis (years)",
ylab="survival difference",col="red",lwd=lwd1)
lines(new.time,pred.pen.HR$diff.surv.inf,lty=2)
lines(new.time,pred.pen.HR$diff.surv.sup,lty=2)
Besides the basic hazard and survival predictions, the user may use the predict function to retrieve directly the design matrix corresponding to the new dataset specified. This functionality is available via the type="lpmatrix" argument. This feature is particularly useful if the user wants to calculate predictions from the model on an arbitrary scale (beyond hazard, cumulative hazard and survival).
# you can also calculate the hazard yourself with the lpmatrix option.
# For example, compare the following predictions:
haz.pen <- pred.pen$haz
X.pen <- predict(mod.pen,data.frame(fu=new.time),type="lpmatrix")
haz.pen.lpmatrix <- as.numeric(exp(X.pen%*%mod.pen$coefficients))
summary(haz.pen.lpmatrix - haz.pen)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -8.327e-17 0.000e+00 0.000e+00 4.163e-18 0.000e+00 1.388e-16
The 95% confidence intervals can be calculated like this:
# standard errors from the Bayesian covariance matrix Vp
std <- sqrt(rowSums((X.pen%*%mod.pen$Vp)*X.pen))
qt.norm <- stats::qnorm(1-(1-0.95)/2)
haz.inf <- as.vector(exp(X.pen%*%mod.pen$coefficients-qt.norm*std))
haz.sup <- as.vector(exp(X.pen%*%mod.pen$coefficients+qt.norm*std))
# checking that they are similar to the ones given by the predict function
summary(haz.inf - pred.pen$haz.inf)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -5.551e-17 0.000e+00 0.000e+00 2.394e-18 0.000e+00 1.110e-16
summary(haz.sup - pred.pen$haz.sup)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -8.327e-17 0.000e+00 0.000e+00 6.176e-18 0.000e+00 1.388e-16
Let’s look at the summary of mod.pen
summary(mod.pen)
#> penalized hazard model
#>
#> Call:
#> survPen(formula = f.pen, data = datCancer, t1 = fu, event = dead)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.04226 0.12344 -24.645 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Parametric coefficients (exp):
#> Estimate lower (95% CI) upper (95% CI)
#> (Intercept) 0.048 0.037 0.061
#>
#> log-likelihood = -2320.8, penalized log-likelihood = -2321.7
#> Number of parameters = 7, effective degrees of freedom = 3.6549
#> LAML = 2326.3
#>
#> Smoothing parameter(s):
#> smf(fu)
#> 1690.5
#>
#> edf of smooth terms:
#> smf(fu)
#> 2.6549
#>
#> converged= TRUE
Here we get: the log-likelihood and penalized log-likelihood, the number of parameters, the effective degrees of freedom of the model, the LAML criterion, the estimated smoothing parameter and the effective degrees of freedom of the smooth term.
All these values can respectively be retrieved as follows:
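For instance (a partial sketch using components that appear elsewhere in this vignette; see ?survPen for the full list of components of the fitted object):
mod.pen$ll.pen    # penalized log-likelihood
mod.pen$lambda    # estimated smoothing parameter
mod.pen$LAML      # LAML criterion
sum(mod.pen$edf)  # effective degrees of freedom of the model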
Standard AIC can be retrieved like this
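A sketch, assuming the fitted object stores the standard AIC in a component named aic (an assumed name; if absent, it can be computed as minus twice the log-likelihood plus twice the effective degrees of freedom):
mod.pen$aic  # assumed component name for -2*log-likelihood + 2*edf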
The effective degrees of freedom used to define the AIC criterion are given here
mod.pen$edf
#> (Intercept) smf(fu).1 smf(fu).2 smf(fu).3 smf(fu).4 smf(fu).5
#> 1.0000000 0.3309511 0.2569940 0.5204774 0.8712478 0.4198833
#> smf(fu).6
#> 0.2553509
If we sum them we get the effective degrees of freedom associated with the model.
If we want to compare penalized models, we can use the AIC corrected for smoothing parameter uncertainty (Wood et al. 2016)
The corrected AIC comes with a new definition for the effective degrees of freedom
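A sketch, assuming the corrected AIC and the associated corrected effective degrees of freedom are stored in components named aic2 and edf2 (assumed names, to be checked against ?survPen):
mod.pen$aic2       # assumed: AIC corrected for smoothing parameter uncertainty
sum(mod.pen$edf2)  # assumed: corrected effective degrees of freedom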
The survPen
package offers two criteria to estimate the
smoothing parameters: LCV for Likelihood Cross Validation and LAML for
Laplace Approximate Marginal Likelihood.
f1 <- ~smf(fu)
mod.LCV <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=NULL,method="LCV")
mod.LCV$lambda
#> smf(fu)
#> 3346.303
mod.LAML <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=NULL,method="LAML")
mod.LAML$lambda
#> smf(fu)
#> 3682.498
new.time <- seq(0,5,length=100)
pred.LCV <- predict(mod.LCV,data.frame(fu=new.time))
pred.LAML <- predict(mod.LAML,data.frame(fu=new.time))
par(mfrow=c(1,1))
plot(new.time,pred.LCV$haz,type="l",ylim=c(0,0.2),main="LCV vs LAML",
xlab="time since diagnosis (years)",ylab="hazard",col="black",lwd=lwd1)
lines(new.time,pred.LAML$haz,col="red",lwd=lwd1,lty=2)
legend("topright",legend=c("LCV","LAML"),col=c("black","red"),lty=c(1,2),lwd=rep(lwd1,2))
In this example, choosing either criterion barely changes the predictions, since the estimated smoothing parameters are similar.
To understand what is going on we can look at the LCV and LAML criteria as functions of the log smoothing parameter.
rho.vec <- seq(-1,15,length=50)
LCV <- rep(0,50)
LAML <- rep(0,50)
for (i in 1:50){
mod <- survPen(f1,data=datCancer,t1=fu,event=dead,lambda=exp(rho.vec[i]))
LCV[i] <- mod$LCV
LAML[i] <- mod$LAML
}
par(mfrow=c(1,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
plot(rho.vec,LCV,type="l",main="LCV vs log(lambda)",ylab="LCV",xlab="log(lambda)",lwd=lwd1)
plot(rho.vec,LAML,type="l",main="LAML vs log(lambda)",ylab="-LAML",xlab="log(lambda)",lwd=lwd1)
In this case, the two criteria to be minimized lead to very similar smoothing parameters.
Unidimensional penalized spline for time since diagnosis with 5 knots
When knots are not specified, survPen
places them using
quantiles. For example, for the term smf(x,df=df1)
, the
vector of knots will be:
quantile(unique(x),seq(0,1,length=df1))
In this case, we have
df1 <- 5
quantile(unique(datCancer$fu),seq(0,1,length=df1))
#> 0% 25% 50% 75% 100%
#> 0.001455958 0.732075578 1.425060350 2.570586727 5.000000000
You can also retrieve the knots directly from the fitted object
mod1 <- survPen(f1,data=datCancer,t1=fu,event=dead)
mod1$list.smf
#> [[1]]
#> $term
#> [1] "fu"
#>
#> $dim
#> [1] 1
#>
#> $knots
#> $knots$fu
#> 0% 25% 50% 75% 100%
#> 0.001455958 0.732075578 1.425060350 2.570586727 5.000000000
#>
#>
#> $df
#> [1] 5
#>
#> $by
#> [1] "NULL"
#>
#> $same.rho
#> [1] FALSE
#>
#> $name
#> [1] "smf(fu)"
#>
#> attr(,"class")
#> [1] "smf.smooth.spec"
Knots can also be specified by the user
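For example (an illustrative set of knots spanning the follow-up range):
f.knots <- ~smf(fu, knots=c(0,1,2,3,5))
mod.knots <- survPen(f.knots, data=datCancer, t1=fu, event=dead)
mod.knots$list.smf[[1]]$knots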
One important feature of the survPen
package is that it
allows fitting penalized excess hazard models.
Excess mortality is a very useful concept that allows estimating the mortality due to a specific disease as the excess mortality compared with the mortality that would be expected if the studied population did not have the disease. Excess mortality is estimated from all-cause deaths in the studied population and has two advantages: i) it does not require knowing the cause of death, which may be unavailable and/or unreliable, and ii) it accounts for indirect long-term side effects, such as treatment toxicities, weakening that prevents physical activity, weight gain, etc. The expected mortality from other causes, referred to as the expected mortality hP, is external data; it is usually taken as the general population all-cause mortality, assuming that the studied population has a mortality similar to that of the general population and that mortality from the disease is negligible in all-cause mortality.
The excess mortality is directly linked to the concept of net survival, which is the survival that would be observed if patients could not die from other causes. Flexible excess hazard models have already been proposed (for examples see Remontet et al. 2007, Charvat et al. 2016) but none of them deals with a penalized framework (outside a Bayesian setting, Hennerfeind et al. 2008).
The mortality (all causes) observed in the patients (hO) is actually decomposed as the sum of the expected mortality hP and the excess mortality due to the pathology (hE).
This may be written as: hO(t, x) = hE(t, x) + hP(a + t, z)
In that equation, t is the time since cancer diagnosis, a is the age at diagnosis, hP is the mortality of the general population at the time of death, i.e. at age a + t given demographical characteristics z (hP is considered known and available from national statistics), and x is a vector of variables that may have an effect on hE. Including age in the model is necessary in order to deal with the informative censoring due to other causes of death (Danieli et al. 2012).
Thus, for m covariates (x1, …, xm), if we note hE(t, x1, …, xm) the excess hazard at time t, the excess hazard model is the following:
$$ log[h_E(t,x_1,\ldots,x_m)] = \sum_{j} g_j(t,x_1,\ldots,x_m) $$
Let’s compare the predictions from a total hazard model to those of an excess hazard one:
mod.total <- survPen(f1,data=datCancer,t1=fu,event=dead,method="LAML")
mod.excess <- survPen(f1,data=datCancer,t1=fu,event=dead,expected=rate,method="LAML")
# compare the predictions of the models
new.time <- seq(0,5,length=100)
pred.total <- predict(mod.total,data.frame(fu=new.time))
pred.excess <- predict(mod.excess,data.frame(fu=new.time))
# hazard vs excess hazard
par(mfrow=c(1,2))
plot(new.time,pred.total$haz,type="l",ylim=c(0,0.2),main="hazard vs excess hazard",
xlab="time since diagnosis (years)",ylab="hazard",lwd=lwd1)
lines(new.time,pred.excess$haz,col="red",lwd=lwd1,lty=2)
legend("topright",legend=c("total","excess"),col=c("black","red"),lty=c(1,2), lwd=rep(lwd1,2))
plot(new.time,pred.total$surv,type="l",ylim=c(0,1),main="survival vs net survival",
xlab="time",ylab="survival",lwd=lwd1)
lines(new.time,pred.excess$surv,col="red",lwd=lwd1,lty=2)
legend("bottomleft",legend=c("overall survival","net survival"), col=c("black","red"), lty=c(1,2), lwd=rep(lwd1,2))
Tensor product splines represent the key functionality of the survPen package. Indeed, they allow us to jointly model non-linearity, time-dependency and interactions. Two constructors can be used:
- tensor, in which the number of associated smoothing parameters equals the number of covariates involved. This is similar to te in the mgcv package.
- tint, which leads to the very same design as tensor but decomposes the penalty terms into a main effect part and an interaction part (this is called ANOVA decomposition of smooths, see Wood 2006). This is similar to ti in the mgcv package.
The tensor approach allows specifying models like this one: log[h(t, age)] = f(t, age)
where f is a tensor product spline associated with two smoothing parameters, one for each direction. However, this construction makes the assumption that the main effect of each covariate has the same complexity as its associated effect in the interaction term.
The tint approach relaxes this assumption. Indeed, the model would become: log[h(t, age)] = f(t) + g(age) + k(t, age) where f and g are each associated with one smoothing parameter and k is associated with two smoothing parameters. In total we thus have four smoothing parameters in this case, but the design is the same as before. Of course, the tint
approach rapidly reaches its limits in terms of complexity when the
number of covariates rises. Indeed, for example, with three covariates,
while the tensor
approach is associated with three
smoothing parameters, the fully decomposed tint
approach
leads to twelve smoothing parameters to estimate.
The models presented here are a tensor product smooth and a tensor product interaction (Wood 2006) of time since diagnosis and age at diagnosis. Smoothing parameters are estimated via LAML.
f.tensor <- ~tensor(fu,age,df=c(5,5))
f.tint <- ~tint(fu,df=5)+tint(age,df=5)+tint(fu,age,df=c(5,5))
# hazard model
mod.tensor <- survPen(f.tensor,data=datCancer,t1=fu,event=dead)
summary(mod.tensor)
#> penalized hazard model
#>
#> Call:
#> survPen(formula = f.tensor, data = datCancer, t1 = fu, event = dead)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.31334 0.17612 -18.813 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Parametric coefficients (exp):
#> Estimate lower (95% CI) upper (95% CI)
#> (Intercept) 0.036 0.026 0.051
#>
#> log-likelihood = -2106.2, penalized log-likelihood = -2110
#> Number of parameters = 25, effective degrees of freedom = 11.69
#> LAML = 2121.7
#>
#> Smoothing parameter(s):
#> tensor(fu,age).1 tensor(fu,age).2
#> 0.77927 21.67000
#>
#> edf of smooth terms:
#> tensor(fu,age)
#> 10.69
#>
#> converged= TRUE
mod.tint <- survPen(f.tint,data=datCancer,t1=fu,event=dead)
summary(mod.tint)
#> penalized hazard model
#>
#> Call:
#> survPen(formula = f.tint, data = datCancer, t1 = fu, event = dead)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.23237 0.15164 -21.316 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Parametric coefficients (exp):
#> Estimate lower (95% CI) upper (95% CI)
#> (Intercept) 0.039 0.029 0.053
#>
#> log-likelihood = -2106.4, penalized log-likelihood = -2109.6
#> Number of parameters = 25, effective degrees of freedom = 10.462
#> LAML = 2122.7
#>
#> Smoothing parameter(s):
#> tint(fu) tint(age) tint(fu,age).1 tint(fu,age).2
#> 9.4054e-01 6.4517e+00 1.8307e-01 1.2800e+05
#>
#> edf of smooth terms:
#> tint(fu) tint(age) tint(fu,age)
#> 3.5664 2.5192 3.3764
#>
#> converged= TRUE
# predictions
new.age <- seq(50,90,length=50)
new.time <- seq(0,7,length=50)
Z.tensor <- outer(new.time,new.age,function(t,a) predict(mod.tensor,data.frame(fu=t,age=a))$haz)
Z.tint <- outer(new.time,new.age,function(t,a) predict(mod.tint,data.frame(fu=t,age=a))$haz)
# color settings
col.pal <- colorRampPalette(c("white", "red"))
colors <- col.pal(100)
facet <- function(z){
facet.center <- (z[-1, -1] + z[-1, -ncol(z)] + z[-nrow(z), -1] + z[-nrow(z), -ncol(z)])/4
cut(facet.center, 100)
}
theta1 = 30
zmax=1.1
# plot the hazard surfaces for both models
par(mfrow=c(1,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
persp(new.time,new.age,Z.tensor,col=colors[facet(Z.tensor)],main="tensor",theta=theta1,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,zmax))
persp(new.time,new.age,Z.tint,col=colors[facet(Z.tint)],main="tint",theta=theta1,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,zmax))
The first thing to notice is that the tensor
model is
associated with two smoothing parameters whereas the tint
model is associated with four of them. In the tint
model,
the smoothing parameter associated with age in the interaction term
(tint(fu,age).2) is much higher than the one associated with the main
effect of age (tint(age)). This behaviour is of course impossible to
obtain with the tensor
approach.
Despite this difference, the two approaches show almost identical predictions in this last example.
In practice, consider using the tensor interaction approach if you expect an interaction structure which is either simpler or more complex than the main effects.
Let’s illustrate the differences between tensor and tint. Consider the following dataset:
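The exact subset used here is an assumption made for illustration; as a sketch, take subdata to be a small random subsample of datCancer:
set.seed(18)
subdata <- datCancer[sample(nrow(datCancer), 100), ]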
Now we fit the same models as before
mod.tensor.sub <- survPen(f.tensor,data=subdata,t1=fu,event=dead)
mod.tint.sub <- survPen(f.tint,data=subdata,t1=fu,event=dead)
Here are the estimated smoothing parameters and effective degrees of freedom
# tensor
mod.tensor.sub$lambda
#> tensor(fu,age).1 tensor(fu,age).2
#> 241.85707 26.93721
summary(mod.tensor.sub)$edf.per.smooth
#> tensor(fu,age)
#> 3.204508
# tint
mod.tint.sub$lambda
#> tint(fu) tint(age) tint(fu,age).1 tint(fu,age).2
#> 5.206692e+05 2.637013e+04 7.333033e+00 1.228636e+00
summary(mod.tint.sub)$edf.per.smooth
#> tint(fu) tint(age) tint(fu,age)
#> 1.000020 1.000036 1.576766
As we can see, the tint model reduces the edf of the main effects almost to the minimum of 1 (equivalent to saying that the effects are linear). However, the interaction is a bit more complex. If we look at the smoothing parameters, we see that the main effects have been heavily penalized whereas the time effect in its interaction with the age effect has almost not been penalized.
This difference in terms of the extent of penalization between the
main effects and the interactions is not possible with the
tensor
model. Indeed, the estimated smoothing parameters in
the tensor
model concern the main effects as well as the
interactions. And here we see that both the main effects and the
interactions get heavily penalized.
Let’s look at the predictions
new.age <- seq(quantile(subdata$age,0.10),quantile(subdata$age,0.90),length=50)
new.time <- seq(0,max(subdata$fu),length=50)
Z.tensor.sub <- outer(new.time,new.age,function(t,a) predict(mod.tensor.sub,data.frame(fu=t,age=a))$haz)
Z.tint.sub <- outer(new.time,new.age,function(t,a) predict(mod.tint.sub,data.frame(fu=t,age=a))$haz)
theta1 = 30
zmax=0.7
# plot the hazard surfaces for both models
par(mfrow=c(1,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
persp(new.time,new.age,Z.tensor.sub,col=colors[facet(Z.tensor.sub)],main="tensor",theta=theta1,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,zmax))
persp(new.time,new.age,Z.tint.sub,col=colors[facet(Z.tint.sub)],main="tint",theta=theta1,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,zmax))
The predictions confirm that the interaction between time and age is much stronger according to the tint model, especially for older patients in early follow-up.
To see more precisely these differences, let’s look at the 2D plots. Thus, we predict the dynamics of the excess hazard for four different ages (50, 60, 70 and 80) for both models.
data2D <- expand.grid(fu=new.time,age=c(50,60,70,80))
data2D$haz.tensor <- predict(mod.tensor.sub,data2D)$haz
data2D$haz.tint <- predict(mod.tint.sub,data2D)$haz
par(mfrow=c(2,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
plot(new.time,data2D[data2D$age==50,]$haz.tensor,type="l",ylim=c(0,0.7),
main="age 50",xlab="time since diagnosis",ylab="excess hazard",lwd=lwd1)
lines(new.time,data2D[data2D$age==50,]$haz.tint,col="red",lty=2,lwd=lwd1)
legend("topright",c("tensor","tint"),lty=c(1,2),col=c("black","red"),lwd=rep(lwd1,2))
for (i in c(60,70,80)){
plot(new.time,data2D[data2D$age==i,]$haz.tensor,type="l",ylim=c(0,0.7),
main=paste("age", i),xlab="time since diagnosis",ylab="excess hazard",lwd=lwd1)
lines(new.time,data2D[data2D$age==i,]$haz.tint,col="red",lty=2,lwd=lwd1)
}
In order to choose between the two models, one can choose the model with minimum AIC corrected for smoothing parameter uncertainty (details in Wood et al. 2016).
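A sketch of this comparison, again assuming the corrected AIC is stored in a component named aic2 (an assumed name):
mod.tensor.sub$aic2
mod.tint.sub$aic2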
In this case, the tensor
model is to be preferred.
The model presented is a tensor product spline of time, age and year of diagnosis (yod).
f4 <- ~tensor(fu,age,yod,df=c(5,5,5))
# excess hazard model
mod6 <- survPen(f4,data=datCancer,t1=fu,event=dead,expected=rate)
summary(mod6)
#> penalized excess hazard model
#>
#> Call:
#> survPen(formula = f4, data = datCancer, t1 = fu, event = dead,
#> expected = rate)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.43068 0.19226 -17.844 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Parametric coefficients (exp):
#> Estimate lower (95% CI) upper (95% CI)
#> (Intercept) 0.032 0.022 0.047
#>
#> log-likelihood = -2035.5, penalized log-likelihood = -2040.4
#> Number of parameters = 125, effective degrees of freedom = 17.789
#> LAML = 2046.5
#>
#> Smoothing parameter(s):
#> tensor(fu,age,yod).1 tensor(fu,age,yod).2 tensor(fu,age,yod).3
#> 0.2725 14.1290 82.4000
#>
#> edf of smooth terms:
#> tensor(fu,age,yod)
#> 16.789
#>
#> converged= TRUE
# predictions of surfaces for years 1990, 1997, 2003 and 2010
new.age <- seq(50,90,length=50)
new.time <- seq(0,5,length=50)
Z_1990 <- outer(new.time,new.age,function(t,a) predict(mod6,data.frame(fu=t,yod=1990,age=a))$haz)
Z_1997 <- outer(new.time,new.age,function(t,a) predict(mod6,data.frame(fu=t,yod=1997,age=a))$haz)
Z_2003 <- outer(new.time,new.age,function(t,a) predict(mod6,data.frame(fu=t,yod=2003,age=a))$haz)
Z_2010 <- outer(new.time,new.age,function(t,a) predict(mod6,data.frame(fu=t,yod=2010,age=a))$haz)
par(mfrow=c(1,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
persp(new.time,new.age,Z_1990,col=colors[facet(Z_1990)],main="1990",theta=20,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,1))
persp(new.time,new.age,Z_1997,col=colors[facet(Z_1997)],main="1997",theta=20,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,1))
par(mfrow=c(1,2),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
persp(new.time,new.age,Z_2003,col=colors[facet(Z_2003)],main="2003",theta=20,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,1))
persp(new.time,new.age,Z_2010,col=colors[facet(Z_2010)],main="2010",theta=20,
xlab="\n time since diagnosis",ylab="\n age",zlab="\n excess hazard",
ticktype="detailed",zlim=c(0,1))
Nothing stops the user from using four-dimensional or even five-dimensional tensor product splines but in practice, using the tensor approach beyond three covariates can be extremely time- and memory-consuming. You can try with four covariates if the situation demands it and if you do not have too many degrees of freedom for each marginal basis.
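For illustration only, a four-dimensional tensor term would be written as follows (x4 is a hypothetical fourth continuous covariate, not present in datCancer):
# with df=c(5,5,5,5) this already implies 5^4 = 625 regression parameters
f4D <- ~tensor(fu, age, yod, x4, df=c(5,5,5,5))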
The smf, tensor and tint terms used to specify smooths accept an argument by that allows for building varying-coefficient models, i.e. for letting smoothers ‘interact’ with factors or parametric terms.
For continuous variables, simple linear interaction with a smooth
term may be specified through the by
argument, as in the
following model (using age as the continuous covariate):
log[h(t, age)] = f(t) + β × age + g(t) × age
where f and g are penalized splines. In survPen, this model is specified with the formula smf(t) + smf(t,by=age). Note that the main effect of age is included in the term smf(t,by=age). If you do not want to include the main effect of age, then use tint(t,by=age) instead.
This is useful if we want to fit the following model for example:
log[h(t, age)] = f(t) + f2(age) + g(t) × age
where f2 is a penalized spline. Such a model is specified via smf(t) + smf(age) + tint(t,by=age).
Technically, if a by variable is numeric, then its ith element multiplies the ith row of the model matrix corresponding to the smooth term concerned.
Factor by variables allow specifying three types of models, illustrated below: a stratified analysis, a stratified analysis with common smoothing parameters, and a penalized difference (difference smooth) approach.
The following model is an example of stratified analysis:
log[h(t, sex)] = fwomen(t) + fmen(t)
where fwomen
and fmen
are penalized splines corresponding to the baseline hazards for women
and for men, respectively. In this design, the regression parameters for
men are completely independent from the parameters for women. The
smoothing parameters for men and women (λmen
and λwomen)
are independently estimated as well. This model is therefore equivalent
to a stratified analysis. The model would be specified by using the term
sex + smf(t,by=sex)
. Be careful here, contrary to the
continuous setting, smf(t,by=sex)
is subject to centering
constraints and does not include the main effect of sex.
The stratified design with common smoothing parameters applied to the
above model would impose λmen = λwomen.
This is useful if we think that the baseline hazard for women is likely
to be as complex as the one for men. In that case, the formula becomes
sex + smf(t,by=sex,same.rho=TRUE)
and the model estimates a
unique smoothing parameter. In the stratified analysis, the formula is
actually sex + smf(t,by=sex,same.rho=FALSE)
as
same.rho=FALSE
is the default setting.
The penalized difference approach allows specifying models like this one:
log[h(t, sex)] = f(t) + fdiffmen(t)
In this model, f is common
to men and women whereas fdiffmen
represents what must be added to f in order to get the effect of time
among men. In other words, fdiffmen
is a difference smooth between men and women. Since this
smoothed difference is defined on the log-hazard scale, we can see that
fdiffmen
actually corresponds to the log-hazard ratio between men and women.
Thus, the real advantage of the difference smooth approach is to allow
defining the penalty on the log-hazard ratio scale instead of on the
classical log-hazard one. This model is obtained by the formula
sex + smf(t) + smf(t,by=sex)
where sex has been
turned into an ordered factor. The reference modality is then the first
level of the ordered factor (in this case, women).
Technically, if a by
variable is a factor then it
generates an indicator vector for each level of the factor, unless it is
an ordered factor. In the non-ordered case, the model matrix for the
smooth term is then replicated for each factor level, and each copy has
its rows multiplied by the corresponding rows of its indicator variable.
The smoothness penalties are also duplicated for each factor level. In
short, a different smoother is generated for each factor level. Ordered
by
variables are handled in the same way, except that no
smooth is generated for the first level of the ordered factor (like in
the mgcv
package). This is useful if you are interested in
differences from a reference level.
Factor by variable
In what follows we will illustrate the by functionality
with a factor sex variable. First we simulate survival times from a
Weibull distribution. The parameters of the distribution depend on the
sex of each individual (proportional effect).
n <- 10000
don <- data.frame(num=1:n)
shape_men <- 0.90 # shape for men (first weibull parameter)
shape_women <- 0.90 # shape for women
scale_men <- 0.6 # second weibull parameter
scale_women <- 0.7
prop_men <- 0.5 # proportion of men
set.seed(50)
don$sex <- factor(sample(c("men","women"),n,replace=TRUE,prob=c(prop_men,1-prop_men)))
don$sex.order <- factor(don$sex,levels=c("women","men"),ordered=TRUE)
don$shape <- ifelse(don$sex=="men",shape_men,shape_women)
don$scale <- ifelse(don$sex=="men",scale_men,scale_women)
don$fu <- rweibull(n,shape=don$shape,scale=don$scale)
don$dead <- 1 # no censoring
Now we look at the theoretical hazard and hazard ratio functions:
hazard <- function(x,shape,scale){
exp(dweibull(x,shape=shape,scale=scale,log=TRUE) - pweibull(x,shape=shape,scale=scale,log.p=TRUE,lower.tail=FALSE))
}
nt <- seq(0.01,5,by=0.1)
mar1 <- c(3,3,1.5,0.5)
mgp1 <- c(1.5,0.5,0)
par(mfrow=c(1,2),mar=mar1,mgp=mgp1)
plot(nt,hazard(nt,shape_women,scale_women),type="l",
xlab="time",ylab="hazard",lwd=lwd1,main="Theoretical hazards",
ylim=c(0,max(hazard(nt,shape_women,scale_women),hazard(nt,shape_men,scale_men))))
lines(nt,hazard(nt,shape_men,scale_men),col="red",lwd=lwd1,lty=2)
legend("bottomleft",c("women","men"),lty=c(1,2),lwd=rep(lwd1,2),col=c("black","red"))
plot(nt,hazard(nt,shape_men,scale_men)/hazard(nt,shape_women,scale_women),type="l",
xlab="time",ylab="hazard ratio",lwd=lwd1,
ylim=c(0,2),
main="Theoretical HR men / women")
We are going to compare 4 approaches: a stratified analysis (one model per sex), a by variable with same.rho=FALSE, a by variable with same.rho=TRUE, and a difference smooth via an ordered factor by variable.
# knots for time
knots.t <- quantile(don$fu,seq(0,1,length=10))
# stratified analysis via the two models
m.men <- survPen(~smf(fu,knots=knots.t),t1=fu,event=dead,data=don[don$sex=="men",])
m.women <- survPen(~smf(fu,knots=knots.t),t1=fu,event=dead,data=don[don$sex=="women",])
# by variable with same.rho = FALSE
m.FALSE <- survPen(~sex + smf(fu,by=sex,same.rho=FALSE,knots=knots.t),t1=fu,event=dead,data=don)
# by variable with same.rho = TRUE
m.TRUE <- survPen(~sex + smf(fu,by=sex,same.rho=TRUE,knots=knots.t),t1=fu,event=dead,data=don)
# difference smooth via ordered factor by variable
m.difference <- survPen(~sex.order + smf(fu,knots=knots.t) +smf(fu,by=sex.order,same.rho=FALSE,knots=knots.t),t1=fu,event=dead,data=don)
Let’s look at the predicted hazard functions
newt <- seq(0,5,by=0.1)
data.pred <- expand.grid(fu=newt,sex=c("women","men"))
data.pred$men <- ifelse(data.pred$sex=="men",1,0)
data.pred$women <- ifelse(data.pred$sex=="women",1,0)
data.pred$sex.order <- data.pred$sex # no need to reorder here as the model keeps track of the factor's structure
data.pred$haz.men <- predict(m.men,data.pred)$haz
data.pred$haz.women <- predict(m.women,data.pred)$haz
data.pred$haz.FALSE <- predict(m.FALSE,data.pred)$haz
data.pred$haz.TRUE <- predict(m.TRUE,data.pred)$haz
data.pred$haz.difference <- predict(m.difference,data.pred)$haz
# predicting hazard
ylim1 <- c(0,max(data.pred$haz.men,data.pred$haz.women,
data.pred$haz.FALSE,data.pred$haz.TRUE,data.pred$haz.difference))
par(mfrow=c(1,2),mar=mar1,mgp=mgp1)
plot(newt,data.pred[data.pred$sex=="men",]$haz.men,type="l",main="Men",lwd=lwd1,
ylim=ylim1,xlab="time since diagnosis",ylab="hazard")
lines(newt,data.pred[data.pred$sex=="men",]$haz.FALSE,col="red",lwd=lwd1,lty=2)
lines(newt,data.pred[data.pred$sex=="men",]$haz.TRUE,col="green3",lwd=lwd1,lty=4)
lines(newt,data.pred[data.pred$sex=="men",]$haz.difference,col="orange",lwd=lwd1,lty=5)
lines(nt,hazard(nt,shape_men,scale_men),col="blue3",lty=3)
legend("bottomleft",c("stratified","same.rho=FALSE","same.rho=TRUE","difference smooth","true"),lty=c(1,2,4,5,3),
col=c("black","red","green3","orange","blue3"),lwd=c(rep(lwd1,4),1))
plot(newt,data.pred[data.pred$sex=="women",]$haz.women,type="l",main="Women",lwd=lwd1,
ylim=ylim1,xlab="time since diagnosis",ylab="hazard")
lines(newt,data.pred[data.pred$sex=="women",]$haz.FALSE,col="red",lwd=lwd1,lty=2)
lines(newt,data.pred[data.pred$sex=="women",]$haz.TRUE,col="green3",lwd=lwd1,lty=4)
lines(newt,data.pred[data.pred$sex=="women",]$haz.difference,col="orange",lwd=lwd1,lty=5)
lines(nt,hazard(nt,shape_women,scale_women),col="blue3",lty=3)
As expected, the stratified and same.rho=FALSE approaches are identical.
The first three approaches give here very similar predictions. The difference approach gives smoother estimates among men and slightly more wiggly ones among women. Among men, the predictions from the difference approach are the closest to the true values.
Now let’s look at the corresponding hazard ratios men/women.
# predicting hazard ratio men / women
HR.stratified <- data.pred[data.pred$sex=="men",]$haz.men / data.pred[data.pred$sex=="women",]$haz.women
HR.FALSE <- data.pred[data.pred$sex=="men",]$haz.FALSE / data.pred[data.pred$sex=="women",]$haz.FALSE
HR.TRUE <- data.pred[data.pred$sex=="men",]$haz.TRUE / data.pred[data.pred$sex=="women",]$haz.TRUE
HR.difference <- data.pred[data.pred$sex=="men",]$haz.difference / data.pred[data.pred$sex=="women",]$haz.difference
par(mfrow=c(1,1))
plot(newt,HR.stratified,type="l",main="Hazard ratio, Men/Women",lwd=lwd1,
ylim=c(0,2),xlab="time since diagnosis",ylab="hazard ratio")
lines(newt,HR.FALSE,col="red",lwd=lwd1,lty=2)
lines(newt,HR.TRUE,col="green3",lwd=lwd1,lty=4)
lines(newt,HR.difference,col="orange",lwd=lwd1,lty=5)
abline(h=hazard(nt,shape_men,scale_men)/hazard(nt,shape_women,scale_women),lty=3,col="blue3")
legend("bottomright",c("stratified","same.rho=FALSE","same.rho=TRUE","difference smooth","true"),lty=c(1,2,4,5,3),
col=c("black","red","green3","orange","blue3"),lwd=c(rep(lwd1,4),1))
Again, the stratified and same.rho=FALSE approaches are identical. They give the same wiggly hazard ratio curve, which is quite difficult to justify and explain. The same.rho=TRUE approach gives a slightly less wiggly hazard ratio curve, whereas the difference approach gives a straight line not too far from the true constant value.
In this kind of situation, using an ordered by variable might be advantageous.
Continuous by variable
A continuous by variable allows specifying time-varying coefficient models, i.e. models in which a penalized spline is in interaction with the parametric effect of another covariate.
Do not hesitate to center continuous covariates in order to avoid convergence issues (especially when such continuous covariates are used as by variables).
Penalized cubic spline of time with linear interaction with age: log[h(t, age)] = f(t) + β × age + g(t) × age
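A sketch of model m referred to below; the construction of the centered age variable agec is an assumption (centering follows the advice given above):
datCancer$agec <- datCancer$age - mean(datCancer$age)  # assumed definition of agec
m <- survPen(~smf(fu) + smf(fu,by=agec), data=datCancer, t1=fu, event=dead)
m$ll.pen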
Another option to fit the same model
m.bis <- survPen(~smf(fu) + agec + tint(fu,by=agec,df=10),data=datCancer,t1=fu,event=dead)
m.bis$ll.pen # same penalized log-likelihood as m
#> [1] -2112.848
Penalized cubic spline of time, penalized cubic spline of age and penalized cubic spline of time with linear interaction with age: log[h(t, age)] = f(t) + g(age) + k(t) × age
m2 <- survPen(~tint(fu,df=10) + tint(agec,df=10) + tint(fu,by=agec,df=10),data=datCancer,t1=fu,event=dead)
m2$ll.pen
#> [1] -2110.94
Be careful here. In model m, the effect of age is included in the term smf(fu,by=agec). In m.bis, the term tint(fu,by=agec,df=10) is subject to centering constraints and the effect of age itself is not included; it must therefore be added as a parametric term. tint is particularly useful when several smoothers contain the same continuous by variable. Also be careful when using tint instead of smf, since the default df is not the same (5 vs 10).
survPen allows including independent Gaussian random effects, since such effects may easily be implemented through the penalization framework by using a ridge penalty (details in Wood 2017, section 5.8).
This approach allows implementing commonly used random effect
structures via the rd
constructor. For example if g is a factor then rd(g) produces a
random parameter for each level of g, the random parameters being
i.i.d. normal. If g is a
factor and x is numeric, then
rd(g, x)
produces an i.i.d. normal random slope relating the response to x for each level of g.
Thus, random effects treated as penalized splines allow specifying frailty (excess) hazard models (Charvat et al. 2016). For each individual i from cluster (usually geographical unit) j, a possible model would be:
$$ log[h(t_{ij},x_{ij1},\ldots,x_{ijm})] = \sum_{k} g_k(t_{ij},x_{ij1},\ldots,x_{ijm}) + w_j $$
where wj follows a
normal distribution with mean 0. uj = exp(wj)
is known as the frailty term. The random effect associated with the
cluster variable (random intercept) is specified with the model term
rd(cluster)
. We could also specify a random effect
depending on age (random slope) for example with the model term rd(cluster, age)
(wj would
then become wj × ageij
in the above formula). Note that only independent random effets can yet
be specified. For example, the model term rd(cluster) + rd(cluster, age)
creates a random intercept and a random slope of age but it is not
possible to estimate any covariance parameters between them.
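As a formula-only sketch of such a specification (not fitted here):
# random intercept and independent random slope of age within each cluster
f.frailty <- ~smf(fu) + rd(cluster) + rd(cluster,age)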
Technically, when using rd(cluster)
, the associated
regression parameters wj are assumed
i.i.d. normal, with unknown variance (to be estimated). This assumption
is equivalent to an identity penalty matrix (i.e. a ridge penalty) on
the regression parameters. The unknown smoothing parameter λ associated with the term
rd(cluster)
is directly linked to the unknown variance
σ2:
$$ \sigma^2 = \frac{1}{\lambda \times S.scale} $$
with S.scale the rescaling factor associated with λ (technical point: all penalty matrices used to define the penalized likelihood of the model are rescaled in order to be comparable in terms of a certain matrix norm. The associated rescaling factors are stored in the S.scale vector).
The log standard deviation of the random effect is thus estimated by: log(σ̂) = −0.5 × log(λ̂) − 0.5 × log(S.scale)
And the estimated variance of the log standard deviation is: Var[log(σ̂)] = 0.25 × Var[log(λ̂)] = 0.25 × inv.Hess.rho
To illustrate the use of the rd
constructor, let’s set
up the following simple simulation:
$$ h(t_{ij}) = h_0(t_{ij}) \exp(w_j) $$
where $w_j \sim \mathcal{N}(0, 0.1^2)$ and $h_0(t) = b^{-a} \times a \times t^{a-1}$.
The baseline hazard corresponds to a Weibull distribution with shape a = 0.9 and scale b = 2.
set.seed(1)
# Weibull parameters
shape <- 0.9
scale <- 2
# number of simulated datasets
NFile <- 50
# number of individuals per dataset
n <- 2000
# number of clusters
NCluster <- 20
# data frame
data.rd <- data.frame(cluster=seq(1:NCluster))
cluster <- sample(rep(1:NCluster,each=n/NCluster))
don <- data.frame(num=1:n, cluster=factor(cluster)) # be careful, cluster needs to be a factor !
don <- merge(don, data.rd, by="cluster")[, union(names(don), names(data.rd))]
don <- don[order(don$num),]
rownames(don) <- NULL
# theoretical standard deviation
sd1 <- 0.1
# vector of estimated log standard deviations
log.sd.vec <- rep(as.numeric(NA),NFile)
# maximum follow-up time
max.time <- 5
For each simulated dataset, we are going to fit the following model (for individual i in cluster j):
$$ log[h(t_{ij})] = spline(t_{ij}) + cluster_j $$
for (file in 1:NFile){
wj <- rnorm(NCluster,mean=0,sd=sd1)
don$wj <- wj[don$cluster]
# simulated times
u <- runif(n)
don$fu <- exp( 1/shape*(log(-log(1-u)) - don$wj) + log(scale))
# censoring
don$dead <- ifelse(don$fu <= max.time,1,0)
don$fu <- pmin(don$fu,max.time)
# fitting
mod.frailty <- survPen(~smf(fu)+rd(cluster),data=don,t1=fu,event=dead)
# estimated log standard deviation
log.sd.vec[file] <- summary(mod.frailty)$random.effects[,"Estimate"]
}
# Relative Bias in percentage for sd1
100*(mean(exp(log.sd.vec)) - sd1)/sd1
#> [1] -14.73251
As we can see from this very simple simulation, the standard deviation is estimated reasonably well (here with a relative bias of about -15%, based on only 50 simulated datasets and 20 clusters).
Let’s look at the summary of the last model
summary(mod.frailty)
#> penalized hazard model
#>
#> Call:
#> survPen(formula = ~smf(fu) + rd(cluster), data = don, t1 = fu,
#> event = dead)
#>
#> Parametric coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.759376 0.025514 -29.763 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Parametric coefficients (exp):
#> Estimate lower (95% CI) upper (95% CI)
#> (Intercept) 0.468 0.445 0.492
#>
#> Random effects (log(sd)):
#> Estimate Std. Error
#> rd(cluster) -4.643896 19.86456
#>
#> log-likelihood = -3111.3, penalized log-likelihood = -3111.6
#> Number of parameters = 30, effective degrees of freedom = 2.6505
#> LAML = 3116.3
#>
#> Smoothing parameter(s):
#> smf(fu) rd(cluster)
#> 72076.00 442.12
#>
#> edf of smooth terms:
#> smf(fu) rd(cluster)
#> 1.49460 0.15597
#>
#> converged= TRUE
Here, we have sd(wj)= exp(-4.6438963) = 0.0096201. You can retrieve this value from the model like this
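For instance, via the summary (as already used in the simulation loop above):
exp(summary(mod.frailty)$random.effects[,"Estimate"])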
or like this
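Or from the estimated smoothing parameter and its rescaling factor, using sigma^2 = 1/(lambda x S.scale); this sketch assumes the rescaling factors are stored in a component named S.scale of the fitted object, with the second entry corresponding to rd(cluster) (assumed name and ordering):
lambda.rd <- mod.frailty$lambda["rd(cluster)"]
1/sqrt(lambda.rd * mod.frailty$S.scale[2])  # assumed component name and ordering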
Predictions for specific cluster levels (Best Linear Unbiased Prediction)
# 1-year survival for a patient in cluster 6
predict(mod.frailty,data.frame(fu=1,cluster=6))$surv
#> [1] 0.5972545
# 1-year survival for a patient in cluster 10
predict(mod.frailty,data.frame(fu=1,cluster=10))$surv
#> [1] 0.5968989
Prediction by setting the random effect to zero (we still need to specify a cluster level but it is disregarded)
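A sketch, assuming the predict method accepts an exclude.random argument (check ?predict.survPen):
predict(mod.frailty, data.frame(fu=1,cluster=1), exclude.random=TRUE)$surv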
The t0
argument allows specifying entry times in case of
left-truncated data
Nota Bene: The begin
variable was
simulated for illustration purposes only and is not representative of
cancer data.
# fitting
f1 <- ~smf(fu)
mod.trunc <- survPen(f1,data=datCancer,t0=begin,t1=fu,event=dead,expected=NULL,method="LAML")
# predictions
new.time <- seq(0,5,length=100)
pred.trunc <- predict(mod.trunc,data.frame(fu=new.time))
par(mfrow=c(1,2))
plot(new.time,pred.trunc$haz,type="l",ylim=c(0,0.2),main="Hazard",
xlab="time since diagnosis (years)",ylab="hazard",lwd=lwd1)
plot(new.time,pred.trunc$surv,type="l",ylim=c(0,1),main="Survival",
xlab="time since diagnosis (years)",ylab="survival",lwd=lwd1)
Another important feature of the survPen
package is that
it allows fitting penalized relative mortality ratio models.
As we discussed above, the excess mortality setting considers that the mortality (all causes) observed in the patients (hO) is actually decomposed as the sum of the expected mortality hP and the excess mortality due to the pathology (hE).
This may be written as: hO(t, x) = hE(t, x) + hP(a + t, z)
One limitation of such a decomposition is that hE is considered positive. Sometimes this assumption is not met. For example, in prostate cancer patients with low stages at diagnosis, we observe an ‘undermortality’ due to selection effects and better overall medical care. In that case, the excess mortality is actually negative and the net survival setting fails to describe the reality of those patients. Besides, the excess hazard setting considers the studied disease as an independent cause of death (conditionally on the covariates) compared to the other causes. This point of view is not usually adopted in multiple sclerosis epidemiology, for example, where the disease is seen as a comorbidity impacting all pre-existing causes of death. In that case, the observed hazard is decomposed as the product of the population hazard and a relative mortality ratio r.
This may be written as: hO(t, x) = r(t, x) * hP(a + t, z)
This decomposition was first proposed in a modelling framework by Andersen et al. (1985). However Andersen’s model was a non-flexible semi-parametric model.
The survPen
package allows modelling the relative
mortality ratio r as a
multidimensional function of time and covariates. For m covariates (x1, …, xm),
if we note r(t, x1, …, xm)
the relative mortality ratio at time t, the model is as follows:
$$ log[r(t,x_1,\ldots,x_m)] = \sum_{j} g_j(t,x_1,\ldots,x_m) $$
where the gj functions may be penalized unidimensional splines or penalized tensor product splines. All features described for the (excess) hazard setting still apply when fitting a relative mortality ratio model. One difference lies in the predictions. With a fitted relative mortality ratio model, you can only retrieve the relative mortality ratio and cumulative relative mortality ratio predictions (with CIs), as well as ratios of relative mortality ratios (with type="HR"). No survival prediction (let alone survival difference) is directly available because its calculation depends on expected mortality rates.
Finally, one important difference between an excess hazard model and a relative mortality ratio model is data preparation. For an excess hazard model, we only need individual data with the expected mortality rate at the time of death, whereas in a relative mortality ratio model, the contribution of an individual to the likelihood requires all the expected mortality rate values encountered during the entire follow-up. Therefore, since the expected mortality rates come from national mortality tables usually available in 1-year intervals, we need to split the original dataset as many times as there are 1-year intervals during each individual’s follow-up. The function splitmult will help you obtain the split dataset from the original one.
Let’s fit an example model on the log relative mortality ratio scale
library(survPen)
data(datCancer)
data(expected.table)
#-------------------- creating split dataset for multiplicative model
splitdat <- splitmult(datCancer, cut = (1:5), end = "fu",
event = "dead")
#-------------------- merging with expected mortality table
# deriving current age and year (closest whole number)
splitdat$age_current <- floor(splitdat$age + splitdat$fu + 0.5)
splitdat$year_current <- floor(splitdat$yod + splitdat$fu + 0.5)
splitdat <- merge(splitdat, expected.table,
by.x=c("age_current","year_current"), by.y=c("Age","Year"),all.x=T)
# fitting model
f1 <- ~smf(fu)
# In terms of Gauss-Legendre quadrature, as the follow-up is split into up to 5 parts, the cumulative hazard approximation
# requires less nodes than an approximation on the whole range of definition. If not supplied, the default number of nodes
# for a relative mortality ratio model will be 10 (as opposed to 20 for an excess hazard model)
mod.ratio <- survPen(f1,data=splitdat,t1=fu,event=dead,expected=mx,method="LAML",type="mult")
# predictions of the model
new.time <- seq(0,5,length=100)
pred.ratio <- predict(mod.ratio,data.frame(fu=new.time))
lwd1 <- 2
par(mfrow=c(1,1))
plot(new.time,pred.ratio$ratio,type="l",ylim=range(c(pred.ratio$ratio.inf,pred.ratio$ratio.sup)),
main="realtive mortality ratio with CIs",
xlab="time since diagnosis (years)",ylab="relative mortality ratio",lwd=lwd1)
lines(new.time,pred.ratio$ratio.inf,lty=2)
lines(new.time,pred.ratio$ratio.sup,lty=2)
In the presence of correlated time-to-event data (for example recurrent event data), robust standard errors accounting for this correlation need to be derived. The survPen package allows deriving such robust standard errors based on sandwich estimators (often called the Huber sandwich estimator; see also Coz et al., submitted to Biostatistics, for an example in the recurrent event setting).
The user only needs to specify the cluster
variable
defining the statistical units for which repeated observations are
available. This specification is performed via the cluster
argument.
Let’s fit an example model for the marginal intensity of hospitalization events in heart failure patients
library(survPen)
data(HeartFailure)
# We are going to fit an unpenalized spline on the log-marginal intensity in order to use the robust variance with
# a proper theoretical background. Indeed, with penalized splines it is advised to use
# bootstrap for confidence intervals (Coz et al. submitted to Biostatistics).
# Number of knots for an unpenalized natural cubic spline
df.t <- 4
k.t <- c(quantile(HeartFailure$t1[HeartFailure[,"event"] == 1],seq(0,1,le=df.t)))
# We consider a non-proportional effect of treatment
mod_MI <- survPen(~ treatment + smf(t1, knots = k.t, by = treatment),
t0 = t0,
t1 = t1, data = HeartFailure, event = event, cluster=id, lambda=0)
#> Warning in model.cons(formula, lambda, data, t1, t1.name, t0, t0.name, event, :
#> lambda is of length 1. All smoothing parameters (2) are set to 0
# predictions
nt <- seq(0,3.27,le = 100)
pred.MI <- expand.grid(t1 = nt, treatment = c(0,1))
# marginal intensity (a single predict call provides the estimate and its CI bounds)
pred.marg <- predict(mod_MI, newdata = pred.MI)
pred.MI$MI <- pred.marg$haz
pred.MI$MI.inf <- pred.marg$haz.inf
pred.MI$MI.sup <- pred.marg$haz.sup
# In the absence of a terminal event, we can retrieve the expected cumulative number of events
# over time as minus the log of the predicted survival
pred.MI$Expected_Nb_event <- -log(pred.marg$surv)
pred.MI$Expected_Nb_event.inf <- -log(pred.marg$surv.sup)
pred.MI$Expected_Nb_event.sup <- -log(pred.marg$surv.inf)
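# Optional sanity check (a sketch, not part of the original example): the expected cumulative
# number of events is the integrated marginal intensity, so -log(surv) should closely match a
# trapezoidal integration of the predicted intensity over time (control arm shown here)
MI0 <- pred.MI$MI[pred.MI$treatment==0]
cumMI0 <- cumsum(c(0, diff(nt)*(head(MI0,-1)+tail(MI0,-1))/2))
max(abs(cumMI0 - pred.MI$Expected_Nb_event[pred.MI$treatment==0])) # should be close to 0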
lwd1 <- 2
vec.col <- c("red","forestgreen")
par(mfrow=c(1,2))
plot(nt,pred.MI$MI[pred.MI$treatment==0],type="l",
main="marginal intensity with CIs",ylim=c(0,1.5),
xlab="time (years)",ylab="marginal intensity",lwd=lwd1,col=vec.col[1])
lines(nt,pred.MI$MI.inf[pred.MI$treatment==0],lty=2,col=vec.col[1])
lines(nt,pred.MI$MI.sup[pred.MI$treatment==0],lty=2,col=vec.col[1])
lines(nt,pred.MI$MI[pred.MI$treatment==1],lwd=lwd1,lty=1,col=vec.col[2])
lines(nt,pred.MI$MI.inf[pred.MI$treatment==1],lty=2,col=vec.col[2])
lines(nt,pred.MI$MI.sup[pred.MI$treatment==1],lty=2,col=vec.col[2])
legend("bottomleft",c("exercise training","control"),col=vec.col,lty=1)
plot(nt,pred.MI$Expected_Nb_event[pred.MI$treatment==0],type="l",
main="Expected cumulative number of events",ylim=c(0,3),
xlab="time (years)",ylab="expected cumulative number of events",lwd=lwd1,col=vec.col[1])
lines(nt,pred.MI$Expected_Nb_event.inf[pred.MI$treatment==0],lty=2,col=vec.col[1])
lines(nt,pred.MI$Expected_Nb_event.sup[pred.MI$treatment==0],lty=2,col=vec.col[1])
lines(nt,pred.MI$Expected_Nb_event[pred.MI$treatment==1],lty=1,lwd=lwd1,col=vec.col[2])
lines(nt,pred.MI$Expected_Nb_event.inf[pred.MI$treatment==1],lty=2,col=vec.col[2])
lines(nt,pred.MI$Expected_Nb_event.sup[pred.MI$treatment==1],lty=2,col=vec.col[2])
# Comparing naïve and robust standard errors
naive <- sqrt(diag(mod_MI$Vp))
robust <- sqrt(diag(mod_MI$Vr))
print(naive)
#> (Intercept) treatment1 smf(t1):treatment0.1
#> 0.04763256 0.06318756 0.07376488
#> smf(t1):treatment0.2 smf(t1):treatment0.3 smf(t1):treatment1.1
#> 0.07998189 0.14181439 0.06481038
#> smf(t1):treatment1.2 smf(t1):treatment1.3
#> 0.07120717 0.12506416
print(robust)
#> (Intercept) treatment1 smf(t1):treatment0.1
#> 0.04924672 0.06488291 0.08513027
#> smf(t1):treatment0.2 smf(t1):treatment0.3 smf(t1):treatment1.1
#> 0.09843445 0.15330109 0.07879878
#> smf(t1):treatment1.2 smf(t1):treatment1.3
#> 0.09897790 0.14980254
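As a small illustration (not part of the original example, and assuming the estimated regression
parameters are stored in mod_MI$coefficients under the names printed above), the impact of using the
robust variance can be seen on Wald-type 95% confidence intervals for the treatment main-effect
coefficient; note that, because of the by=treatment term, this coefficient alone does not summarize
the whole treatment effect.
beta.trt <- mod_MI$coefficients["treatment1"]
ci.naive <- beta.trt + c(-1,1)*qnorm(0.975)*naive["treatment1"]
ci.robust <- beta.trt + c(-1,1)*qnorm(0.975)*robust["treatment1"]
print(rbind(ci.naive,ci.robust))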
The survPen package estimates the smoothing parameters with either LCV or LAML. However, one may wish
to compare the effect of different smoothing parameter values on the predicted hazard or survival. The
lambda argument allows the user to choose specific values for the smoothing parameters.
f.pen <- ~ smf(fu)
vec.lambda <- c(0,1000,10^6)
new.time <- seq(0,5,length=100)
par(mfrow=c(1,3),mar=c(3,3,1.5,0.5),mgp=c(1.5,0.5,0))
for (i in (1:3)){
mod.pen <- survPen(f.pen,data=datCancer,t1=fu,event=dead,lambda=vec.lambda[i])
pred.pen <- predict(mod.pen,data.frame(fu=new.time))
plot(new.time,pred.pen$haz,type="l",ylim=c(0,0.2),main=paste0("hazard vs time, lambda = ",vec.lambda[i]),
xlab="time since diagnosis (years)",ylab="hazard",col="black",lwd=lwd1)
}
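For reference (a small sketch, not part of the original example), the value selected automatically can be
retrieved from a fitted model, since the fitted object stores the selected smoothing parameter(s) in its
lambda element, and compared with the values tried above.
mod.auto <- survPen(f.pen,data=datCancer,t1=fu,event=dead)
mod.auto$lambda # smoothing parameter selected by LAML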
If you observe a convergence problem and intend to fix it, a simple option is to change the initial
values used by the algorithm. The beta.ini argument allows you to set the regression parameters to
specific initial values. This is especially useful if your excess hazard model fails to converge: in
that case, you can try to fit the corresponding total hazard model and use its estimated regression
parameters as initial values for the excess hazard model, as sketched below. The rho.ini argument
allows you to choose initial values for the log smoothing parameters. Consider also changing LAML to
LCV to see whether the convergence problem persists.
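Here is a minimal sketch of that workflow, assuming the expected population mortality rates are available
in a column of datCancer (called rate here for illustration); with a single smf term there is one smoothing
parameter, so rho.ini is a single value.
# total hazard model fitted first
mod.total <- survPen(~smf(fu),data=datCancer,t1=fu,event=dead)
# excess hazard model started from the total hazard estimates
mod.excess <- survPen(~smf(fu),data=datCancer,t1=fu,event=dead,expected=rate,
beta.ini=mod.total$coefficients,rho.ini=-1)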
If a convergence problem occurs, it is always instructive to know exactly what is going on inside the
optimization process. The detail.rho and detail.beta arguments were designed to make the user’s life
easier when dealing with convergence issues.
The example below shows the effect of specifying detail.rho=TRUE when fitting a model.
mod.pen <- survPen(f.pen,data=datCancer,t1=fu,event=dead,detail.rho=TRUE)
#> _______________________________________________________________________________________
#>
#> Beginning smoothing parameter estimation via LAML optimization
#> ______________________________________________________________________________________
#>
#> --------------------
#> Initial calculation
#> -------------------
#>
#>
#>
#> new step = 86.6
#> new step corrected = 5
#>
#>
#> Smoothing parameter selection, iteration 1
#>
#> _______________________________________________________________________________________
#>
#> iter LAML : 1
#> rho.old= -1
#> rho= 4
#> val.old= 2347.586
#> val= 2330.74
#> val-val.old= -16.84601
#> gradient= -2.2
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 2
#>
#> _______________________________________________________________________________________
#>
#> iter LAML : 2
#> rho.old= 4
#> rho= 7.7566
#> val.old= 2330.74
#> val= 2326.32
#> val-val.old= -4.4199
#> gradient= -0.22
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 3
#>
#> _______________________________________________________________________________________
#>
#> iter LAML : 3
#> rho.old= 7.7566
#> rho= 8.2241
#> val.old= 2326.32
#> val= 2326.269
#> val-val.old= -0.05042
#> gradient= 0.0064
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 4
#>
#> _______________________________________________________________________________________
#>
#> iter LAML : 4
#> rho.old= 8.2241
#> rho= 8.2113
#> val.old= 2326.269
#> val= 2326.269
#> val-val.old= -4e-05
#> gradient= 6.7e-06
#>
#> _______________________________________________________________________________________
#>
#>
#>
#> Smoothing parameter(s) selection via LAML ok, 4 iterations
#> ______________________________________________________________________________________
At each iteration, you get: the previous and new values of the log smoothing parameters (rho.old and rho), the previous and new values of the criterion being optimized (val.old and val) together with their difference, and the gradient of the criterion with respect to the log smoothing parameters.
In this example, we see that the first Newton step is huge (86.6 on the log scale). When that happens,
the algorithm caps the step at the value of the step.max argument (default is 5).
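If needed, this cap can be changed directly via the step.max argument mentioned above (illustrative call):
mod.pen.step <- survPen(f.pen,data=datCancer,t1=fu,event=dead,step.max=2)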
The example below shows the effect of specifying detail.rho=TRUE and detail.beta=TRUE when fitting a
model, and thus illustrates the two nested Newton-Raphson algorithms.
mod.pen <- survPen(f.pen,data=datCancer,t1=fu,event=dead,detail.rho=TRUE,detail.beta=TRUE)
#> _______________________________________________________________________________________
#>
#> Beginning smoothing parameter estimation via LAML optimization
#> ______________________________________________________________________________________
#>
#> --------------------
#> Initial calculation
#> -------------------
#>
#> ---------------------------------------------------------------------------------------
#> Beginning regression parameter estimation
#>
#> iter beta: 1
#> betaold= -2.3569 0 0 0 0 0 0 0 0 0
#> beta= -1.9734 -0.0489 -0.0892 -0.017 0.069 0.099 0.0861 -0.0087 -0.9255 0.2804
#> abs((beta-betaold)/betaold)= 0.16271 Inf Inf Inf Inf Inf Inf Inf Inf Inf
#> ll.pen.old= -8754.749
#> ll.pen= -4128.377
#> ll.pen-ll.pen.old= 4626.372
#>
#> iter beta: 2
#> betaold= -1.9734 -0.0489 -0.0892 -0.017 0.069 0.099 0.0861 -0.0087 -0.9255 0.2804
#> beta= -1.3122 -0.1065 -0.2063 -0.0399 0.1521 0.223 0.193 -0.0067 -1.7624 0.8343
#> abs((beta-betaold)/betaold)= 0.33508 1.18016 1.31217 1.35048 1.20398 1.25307 1.24171 0.22956 0.90434 1.97539
#> ll.pen.old= -4128.377
#> ll.pen= -2711.747
#> ll.pen-ll.pen.old= 1416.63
#>
#> iter beta: 3
#> betaold= -1.3122 -0.1065 -0.2063 -0.0399 0.1521 0.223 0.193 -0.0067 -1.7624 0.8343
#> beta= -0.5875 -0.1054 -0.2512 -0.059 0.1504 0.2501 0.2122 0.0847 -2.4514 1.6471
#> abs((beta-betaold)/betaold)= 0.55226 0.01027 0.21751 0.47772 0.01155 0.12128 0.0994 13.68239 0.39097 0.97416
#> ll.pen.old= -2711.747
#> ll.pen= -2370.601
#> ll.pen-ll.pen.old= 341.146
#>
#> iter beta: 4
#> betaold= -0.5875 -0.1054 -0.2512 -0.059 0.1504 0.2501 0.2122 0.0847 -2.4514 1.6471
#> beta= -0.1298 -0.0392 -0.2243 -0.0876 0.0603 0.1942 0.1293 0.3425 -2.916 2.4493
#> abs((beta-betaold)/betaold)= 0.77903 0.62802 0.10699 0.48566 0.59918 0.22328 0.39056 3.04525 0.18951 0.48707
#> ll.pen.old= -2370.601
#> ll.pen= -2320.979
#> ll.pen-ll.pen.old= 49.62232
#>
#> iter beta: 5
#> betaold= -0.1298 -0.0392 -0.2243 -0.0876 0.0603 0.1942 0.1293 0.3425 -2.916 2.4493
#> beta= 0.0049 0.0086 -0.2149 -0.1081 0.0045 0.185 0.055 0.5494 -3.1121 2.8758
#> abs((beta-betaold)/betaold)= 1.03773 1.21987 0.04189 0.23381 0.92484 0.04745 0.57452 0.60426 0.06725 0.17413
#> ll.pen.old= -2320.979
#> ll.pen= -2318.206
#> ll.pen-ll.pen.old= 2.7728
#>
#> iter beta: 6
#> betaold= 0.0049 0.0086 -0.2149 -0.1081 0.0045 0.185 0.055 0.5494 -3.1121 2.8758
#> beta= 0.0153 0.0149 -0.2154 -0.1095 -0.0034 0.1894 0.04 0.5853 -3.1407 2.9464
#> abs((beta-betaold)/betaold)= 2.13347 0.7224 0.00235 0.01291 1.7404 0.02398 0.27293 0.06521 0.0092 0.02458
#> ll.pen.old= -2318.206
#> ll.pen= -2318.182
#> ll.pen-ll.pen.old= 0.02392
#>
#> iter beta: 7
#> betaold= 0.0153 0.0149 -0.2154 -0.1095 -0.0034 0.1894 0.04 0.5853 -3.1407 2.9464
#> beta= 0.0154 0.0149 -0.2154 -0.1094 -0.0035 0.1896 0.0397 0.5859 -3.1412 2.9478
#> abs((beta-betaold)/betaold)= 0.00431 0.00383 0.00012 0.00022 0.03289 0.00078 0.00725 0.00107 0.00016 0.00045
#> ll.pen.old= -2318.182
#> ll.pen= -2318.182
#> ll.pen-ll.pen.old= 0
#>
#> iter beta: 8
#> betaold= 0.0154 0.0149 -0.2154 -0.1094 -0.0035 0.1896 0.0397 0.5859 -3.1412 2.9478
#> beta= 0.0154 0.0149 -0.2154 -0.1094 -0.0035 0.1896 0.0397 0.5859 -3.1412 2.9478
#> abs((beta-betaold)/betaold)= 0 0 0 0 1e-05 0 0 0 0 0
#> ll.pen.old= -2318.182
#> ll.pen= -2318.182
#> ll.pen-ll.pen.old= 0
#>
#>
#> Beta optimization ok, 8 iterations
#> --------------------------------------------------------------------------------------
#>
#>
#> new step = 86.6
#> new step corrected = 5
#>
#>
#> Smoothing parameter selection, iteration 1
#>
#> ---------------------------------------------------------------------------------------
#> Beginning regression parameter estimation
#>
#> iter beta: 1
#> betaold= 0.0154 0.0149 -0.2154 -0.1094 -0.0035 0.1896 0.0397 0.5859 -3.1412 2.9478
#> beta= 0.006 0.0091 -0.1068 -0.0735 0.0136 0.1955 0.0783 0.5774 -3.1212 2.8726
#> abs((beta-betaold)/betaold)= 0.60769 0.39263 0.50423 0.3285 4.92589 0.03097 0.97074 0.01443 0.00637 0.02551
#> ll.pen.old= -2319.891
#> ll.pen= -2319.091
#> ll.pen-ll.pen.old= 0.80024
#>
#> iter beta: 2
#> betaold= 0.006 0.0091 -0.1068 -0.0735 0.0136 0.1955 0.0783 0.5774 -3.1212 2.8726
#> beta= 0.006 0.0092 -0.1068 -0.0734 0.0138 0.1955 0.0779 0.5777 -3.1216 2.8715
#> abs((beta-betaold)/betaold)= 0.00907 0.01159 0.00016 0.00076 0.01201 0.00021 0.00471 4e-04 0.00011 0.00036
#> ll.pen.old= -2319.091
#> ll.pen= -2319.091
#> ll.pen-ll.pen.old= 0.00014
#>
#> iter beta: 3
#> betaold= 0.006 0.0092 -0.1068 -0.0734 0.0138 0.1955 0.0779 0.5777 -3.1216 2.8715
#> beta= 0.006 0.0092 -0.1068 -0.0734 0.0138 0.1955 0.0779 0.5777 -3.1216 2.8715
#> abs((beta-betaold)/betaold)= 0 0 0 0 1e-05 0 0 0 0 0
#> ll.pen.old= -2319.091
#> ll.pen= -2319.091
#> ll.pen-ll.pen.old= 0
#>
#>
#> Beta optimization ok, 3 iterations
#> --------------------------------------------------------------------------------------
#> _______________________________________________________________________________________
#>
#> iter LAML : 1
#> rho.old= -1
#> rho= 4
#> val.old= 2347.586
#> val= 2330.74
#> val-val.old= -16.84601
#> gradient= -2.2
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 2
#>
#> ---------------------------------------------------------------------------------------
#> Beginning regression parameter estimation
#>
#> iter beta: 1
#> betaold= 0.006 0.0092 -0.1068 -0.0734 0.0138 0.1955 0.0779 0.5777 -3.1216 2.8715
#> beta= 2e-04 3e-04 -0.0053 -0.0077 0.0048 0.0648 0.0893 0.4628 -3.0454 2.6652
#> abs((beta-betaold)/betaold)= 0.96815 0.97017 0.95065 0.89517 0.65432 0.66869 0.14592 0.1989 0.02439 0.07184
#> ll.pen.old= -2340.432
#> ll.pen= -2321.439
#> ll.pen-ll.pen.old= 18.99299
#>
#> iter beta: 2
#> betaold= 2e-04 3e-04 -0.0053 -0.0077 0.0048 0.0648 0.0893 0.4628 -3.0454 2.6652
#> beta= 2e-04 3e-04 -0.0052 -0.0076 0.0048 0.0643 0.0871 0.4643 -3.0476 2.6661
#> abs((beta-betaold)/betaold)= 0.07155 0.01716 0.00858 0.0096 0.00375 0.00724 0.02393 0.00338 0.00071 0.00032
#> ll.pen.old= -2321.439
#> ll.pen= -2321.438
#> ll.pen-ll.pen.old= 0.00126
#>
#> iter beta: 3
#> betaold= 2e-04 3e-04 -0.0052 -0.0076 0.0048 0.0643 0.0871 0.4643 -3.0476 2.6661
#> beta= 2e-04 3e-04 -0.0052 -0.0076 0.0048 0.0643 0.0871 0.4643 -3.0476 2.6661
#> abs((beta-betaold)/betaold)= 0 0 0 0 0 1e-05 2e-05 0 0 0
#> ll.pen.old= -2321.438
#> ll.pen= -2321.438
#> ll.pen-ll.pen.old= 0
#>
#>
#> Beta optimization ok, 3 iterations
#> --------------------------------------------------------------------------------------
#> _______________________________________________________________________________________
#>
#> iter LAML : 2
#> rho.old= 4
#> rho= 7.7566
#> val.old= 2330.74
#> val= 2326.32
#> val-val.old= -4.4199
#> gradient= -0.22
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 3
#>
#> ---------------------------------------------------------------------------------------
#> Beginning regression parameter estimation
#>
#> iter beta: 1
#> betaold= 2e-04 3e-04 -0.0052 -0.0076 0.0048 0.0643 0.0871 0.4643 -3.0476 2.6661
#> beta= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0762 0.4359 -3.0353 2.6368
#> abs((beta-betaold)/betaold)= 0.42488 0.50152 0.35583 0.32587 0.37734 0.31442 0.12602 0.06122 0.00405 0.011
#> ll.pen.old= -2321.898
#> ll.pen= -2321.813
#> ll.pen-ll.pen.old= 0.08487
#>
#> iter beta: 2
#> betaold= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0762 0.4359 -3.0353 2.6368
#> beta= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0761 0.436 -3.0353 2.6368
#> abs((beta-betaold)/betaold)= 0.00078 0.00116 0.00018 0.00043 0.00113 0.00018 4e-05 0.00026 2e-05 1e-05
#> ll.pen.old= -2321.813
#> ll.pen= -2321.813
#> ll.pen-ll.pen.old= 0
#>
#> iter beta: 3
#> betaold= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0761 0.436 -3.0353 2.6368
#> beta= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0761 0.436 -3.0353 2.6368
#> abs((beta-betaold)/betaold)= 0 0 0 0 0 0 0 0 0 0
#> ll.pen.old= -2321.813
#> ll.pen= -2321.813
#> ll.pen-ll.pen.old= 0
#>
#>
#> Beta optimization ok, 3 iterations
#> --------------------------------------------------------------------------------------
#> _______________________________________________________________________________________
#>
#> iter LAML : 3
#> rho.old= 7.7566
#> rho= 8.2241
#> val.old= 2326.32
#> val= 2326.269
#> val-val.old= -0.05042
#> gradient= 0.0064
#>
#> _______________________________________________________________________________________
#>
#>
#>
#>
#> Smoothing parameter selection, iteration 4
#>
#> ---------------------------------------------------------------------------------------
#> Beginning regression parameter estimation
#>
#> iter beta: 1
#> betaold= 1e-04 1e-04 -0.0034 -0.0051 0.003 0.0441 0.0761 0.436 -3.0353 2.6368
#> beta= 1e-04 1e-04 -0.0034 -0.0052 0.003 0.0446 0.0765 0.4369 -3.0357 2.6376
#> abs((beta-betaold)/betaold)= 0.01538 0.02071 0.01216 0.01101 0.01354 0.0111 0.00429 0.00191 0.00011 0.00031
#> ll.pen.old= -2321.802
#> ll.pen= -2321.802
#> ll.pen-ll.pen.old= 5e-05
#>
#> iter beta: 2
#> betaold= 1e-04 1e-04 -0.0034 -0.0052 0.003 0.0446 0.0765 0.4369 -3.0357 2.6376
#> beta= 1e-04 1e-04 -0.0034 -0.0052 0.003 0.0446 0.0765 0.4369 -3.0357 2.6376
#> abs((beta-betaold)/betaold)= 0 0 0 0 0 0 0 0 0 0
#> ll.pen.old= -2321.802
#> ll.pen= -2321.802
#> ll.pen-ll.pen.old= 0
#>
#>
#> Beta optimization ok, 2 iterations
#> --------------------------------------------------------------------------------------
#> _______________________________________________________________________________________
#>
#> iter LAML : 4
#> rho.old= 8.2241
#> rho= 8.2113
#> val.old= 2326.269
#> val= 2326.269
#> val-val.old= -4e-05
#> gradient= 6.7e-06
#>
#> _______________________________________________________________________________________
#>
#>
#>
#> Smoothing parameter(s) selection via LAML ok, 4 iterations
#> ______________________________________________________________________________________
Here, within each iteration over the log smoothing parameters, for each iteration over the regression parameters, you get: the previous and new values of the regression parameters (betaold and beta), the relative change in each parameter, and the previous and new values of the penalized log-likelihood (ll.pen.old and ll.pen) together with their difference.
When using detail.rho or detail.beta, you might also occasionally see a warning message indicating that
the Hessian (of LCV, LAML or the penalized likelihood) has been perturbed. As long as this Hessian
perturbation does not occur at the very last iteration, you do not need to worry about this warning: the
algorithm is just making sure that the step it is about to take is a descent direction. However, if the
Hessian perturbation occurs at convergence (whether for the smoothing or regression parameters), it might
indicate a convergence issue (see the Hess.beta.modif and Hess.rho.modif values returned by the model).
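A quick way to check this on a fitted model (a sketch based on the two values mentioned above):
mod.pen$Hess.beta.modif # indicates whether the Hessian of the penalized likelihood was perturbed at convergence
mod.pen$Hess.rho.modif # indicates whether the Hessian of LCV/LAML was perturbed at convergence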