An R package implementing the SIMEX technique for classical additive measurement errors in generalized linear models.
The SIMEX (simulation-extrapolation) technique accounts for additive measurement errors in explanatory variables when fitting a statistical model. This technique was initially developed by Cook and Stefanski (1994) and Stefanski and Cook (1995). Carroll et al. (2006) also devote a complete chapter to this technique.
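To give an idea of what the technique does, here is a minimal base-R sketch of the two SIMEX steps (simulation, then extrapolation) for a logistic regression with a single error-prone covariate. It does not use the SIMEXGLM package and is not its implementation: the simulated data, the `simexSketch` function, the grid of lambda values, the number of replicates and the quadratic extrapolant are all assumptions chosen for illustration.

```r
set.seed(42)

# Hypothetical data: x is the true covariate, w is x observed with additive error
n <- 1000
x <- rnorm(n)
sigmaU <- 0.5                                   # assumed known error standard deviation
w <- x + rnorm(n, sd = sigmaU)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))

simexSketch <- function(y, w, sigmaU, lambdas = c(0.5, 1, 1.5, 2), B = 40) {
  # Naive fit ignoring the measurement error (lambda = 0)
  naive <- coef(glm(y ~ w, family = binomial()))
  # Simulation step: add extra error with variance lambda * sigmaU^2,
  # refit the model B times and average the estimates for each lambda
  simulated <- t(sapply(lambdas, function(lambda) {
    rowMeans(replicate(B, {
      wStar <- w + sqrt(lambda) * sigmaU * rnorm(length(w))
      coef(glm(y ~ wStar, family = binomial()))
    }))
  }))
  allLambdas <- c(0, lambdas)
  estimates <- rbind(naive, simulated)
  colnames(estimates) <- names(naive)
  # Extrapolation step: fit a quadratic in lambda to each coefficient
  # and evaluate it at lambda = -1 (no measurement error)
  corrected <- apply(estimates, 2, function(est) {
    fit <- lm(est ~ allLambdas + I(allLambdas^2))
    sum(coef(fit) * c(1, -1, 1))
  })
  list(naive = naive, simex = corrected)
}

result <- simexSketch(y, w, sigmaU)
print(result)
```

In this sketch the naive slope is attenuated toward zero, and the extrapolated slope moves back toward the value used in the simulation. The quadratic extrapolant is only approximate, so some bias usually remains.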
The SIMEXGLM package implements the SIMEX technique with the following statistical models:
- the logistic regression
- the negative binomial regression
Compared to other existing packages implementing the SIMEX technique, the SIMEXGLM package allows the specification of multiple terms involving the linear term plus some transformation of the variables with measurement errors. For instance, if W is the variable with measurement error, the package can fit the model E[Y] = g(W, W^2), with g being the link function. An example of code is shown below.
The source code of the SIMEXGLM package is freely available at https://github.com/CWFC-CCFB/SIMEXGLM . The SIMEXGLM package is licensed under the GNU Lesser General Public License v3.0 (LGPL-3).
The backend of this R package is implemented in Java in two libraries:
- repicea, licensed under LGPL-3 (https://github.com/CWFC-CCFB/repicea)
- repicea-mathstats, licensed under LGPL-3 (https://github.com/CWFC-CCFB/repicea-mathstats)
Tickets can be created at https://github.com/CWFC-CCFB/SIMEXGLM/issues .
Mathieu Fortin e-mail: [email protected]
The SIMEXGLM package depends on J4R for accessing the Java library. Java version 8 or later must be installed on your computer. Please see https://github.com/CWFC-CCFB/J4R/wiki#requirements for more information.
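A quick way to check from R whether a Java executable is visible is sketched below. It uses base R only and looks on the system PATH; this is an assumption for illustration, as J4R may locate Java differently, so a negative result here does not necessarily mean the package will fail.

```r
# Look for a java executable on the PATH (empty string if none is found)
javaPath <- unname(Sys.which("java"))
javaFound <- nzchar(javaPath)

if (javaFound) {
  # "java -version" writes to stderr, so both streams are captured
  versionInfo <- system2(javaPath, "-version", stdout = TRUE, stderr = TRUE)
  message(paste(versionInfo, collapse = "\n"))
} else {
  message("No java executable was found on the PATH.")
}
```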
Once Java is installed on your computer, just copy and paste these lines of code into your R console:
install.packages("https://sourceforge.net/projects/repiceasource/files/latest/download", repos = NULL, type="source") ### To install J4R (dependency)
library(remotes)
install_github("CWFC-CCFB/SIMEXGLM") ### install SIMEXGLM directly from GitHub
Here is an example with the logistic regression with the complementary log-log link function:
require(SIMEXGLM)
data("simexExample")
mySIMEX <- SIMEXGLM("y ~ distanceToConspecific + sqr(distanceToConspecific)", # the formula
"Bernoulli", # the distribution
"CLogLog", # the link function
simexExample, # the data
"distanceToConspecific", # variable with measurement error
"variance", # variance of the measurement error
nbThreads = 3) # number of threads
summary(mySIMEX)
plot(mySIMEX)
shutdownClient()
The sqr function embedded in the formula means that the square of the distanceToConspecific variable is specified in the model. The regular summary and plot functions can be used to inspect and visualize the results. In its current version, the SIMEXGLM package supports the following transformations in the model formula:
- sqr(): the square of the argument
- log(): the natural logarithm of the argument
- exp(): the exponential of the argument (beta)
Convergence is rather difficult to obtain with the log transformation because inflating the measurement error variance may produce negative values, for which the logarithm is undefined. So far, the sqr function has proven much more stable.
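The issue with the log transformation can be illustrated with a few lines of base R. This is only a sketch with made-up values (the covariate `w`, the error variance and the inflation factor `lambda` are assumptions), and it does not reproduce what the package does internally.

```r
set.seed(1)

# Hypothetical strictly positive covariate observed with error
w <- runif(1000, min = 0.5, max = 5)
errorVariance <- 1        # assumed variance of the measurement error
lambda <- 2               # assumed variance inflation factor

# Add simulated error with variance lambda * errorVariance
wStar <- w + sqrt(lambda * errorVariance) * rnorm(length(w))

nNegative <- sum(wStar < 0)                 # values pushed below zero
logWStar <- suppressWarnings(log(wStar))    # NaN wherever wStar is negative
sqrWStar <- wStar^2                         # the square stays defined everywhere

cat("Negative values:", nNegative, "\n")
cat("Undefined logs: ", sum(is.nan(logWStar)), "\n")
cat("Undefined squares:", sum(is.nan(sqrWStar)), "\n")
```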
Here is an example of negative binomial regression with a log link function:
data("simexExampleNegBinomial")
mySIMEX <- SIMEXGLM("y ~ TotalPrcp + G_F + G_R + occIndex10km + timeSince1970", # the formula
"NegativeBinomial", # the distribution
"Log", # the link function
simexExampleNegBinomial, # the data
"occIndex10km", # variable with measurement error
"occIndex10kmVar", # variance of the measurement error
nbThreads = 3) # number of threads
summary(mySIMEX)
plot(mySIMEX)
shutdownClient()
The call to the shutdownClient function shuts down both the client and the Java server, which avoids leaving an idle process running.
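If a fit stops with an error, a shutdown call placed at the end of a script is never reached. One possible way to guard against this is to register the shutdown with on.exit, as in the sketch below. The `runThenShutdown` helper and the two stand-in functions are hypothetical and only there so the example runs without the package; in practice the fit would be a call to SIMEXGLM and the shutdown function would be shutdownClient.

```r
# Hypothetical helper: runs a fit and always calls the shutdown function afterwards
runThenShutdown <- function(fitFunction, shutdownFunction) {
  on.exit(shutdownFunction(), add = TRUE)   # executed even if the fit fails
  fitFunction()
}

# Stand-ins used for illustration only
state <- new.env()
state$serverRunning <- TRUE
fakeShutdown <- function() state$serverRunning <- FALSE
fakeFit <- function() stop("convergence not achieved")

outcome <- tryCatch(runThenShutdown(fakeFit, fakeShutdown),
                    error = function(e) conditionMessage(e))

cat("Outcome:", outcome, "\n")
cat("Server still running:", state$serverRunning, "\n")
```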