Generate predicted data for plotting results of regression modelsSource:
This is an alternate interface to the underlying tools that
effect_plot() as well as
interactions::cat_plot() from the
make_predictions() creates the data to be plotted and adds information
to the original data to make it more amenable for plotting with the
make_predictions(model, ...) # S3 method for default make_predictions( model, pred, pred.values = NULL, at = NULL, data = NULL, center = TRUE, interval = TRUE, int.type = c("confidence", "prediction"), int.width = 0.95, outcome.scale = "response", robust = FALSE, cluster = NULL, vcov = NULL, set.offset = NULL, new_data = NULL, return.orig.data = FALSE, partial.residuals = FALSE, ... )
The model (e.g.,
The name of the focal predictor as a string. This is the variable for which, if you are plotting, you'd likely have along the x-axis (with the dependent variable as the y-axis).
The values of
predyou want to include. Default is NULL, which means a sequence of equi-spaced values over the range of a numeric predictor or each level of a non-numeric predictor.
If you want to manually set the values of other variables in the model, do so by providing a named list where the names are the variables and the list values are vectors of the values. This can be useful especially when you are exploring interactions or other conditional predictions.
Optional, default is NULL. You may provide the data used to fit the model. This can be a better way to get mean values for centering and can be crucial for models with variable transformations in the formula (e.g.,
log(x)) or polynomial terms (e.g.,
poly(x, 2)). You will see a warning if the function detects problems that would likely be solved by providing the data with this argument and the function will attempt to retrieve the original data from the global environment.
Set numeric covariates to their mean? Default is TRUE. You may also just provide a vector of names (as strings) of covariates to center. Note that for
svyglmmodels, the survey-weighted means are used. For models with weights, these are weighted means.
TRUE, plots confidence/prediction intervals around the line using
Type of interval to plot. Options are "confidence" or "prediction". Default is confidence interval.
How large should the interval be, relative to the standard error? The default, .95, corresponds to roughly 1.96 standard errors and a .05 alpha level for values outside the range. In other words, for a confidence interval, .95 is analogous to a 95% confidence interval.
For nonlinear models (i.e., GLMs), should the outcome variable be plotted on the link scale (e.g., log odds for logit models) or the original scale (e.g., predicted probabilities for logit models)? The default is
"response", which is the original scale. For the link scale, which will show straight lines rather than curves, use
Should robust standard errors be used to find confidence intervals for supported models? Default is FALSE, but you should specify the type of sandwich standard errors if you'd like to use them (i.e.,
"HC1", and so on). If
TRUE, defaults to
For clustered standard errors, provide the column name of the cluster variable in the input data frame (as a string). Alternately, provide a vector of clusters.
Optional. You may supply the variance-covariance matrix of the coefficients yourself. This is useful if you are using some method for robust standard error calculation not supported by the sandwich package.
For models with an offset (e.g., Poisson models), sets an offset for the predicted values. All predicted values will have the same offset. By default, this is set to 1, which makes the predicted values a proportion. See details for more about offset support.
If you would prefer to generate your own hypothetical (or not hypothetical) data rather than have the function make a call to
make_new_data(), you can provide it.
Instead of returning a just the predicted data frame, should the original data be returned as well? If so, then a list will be return with both the predicted data (as the first element) and the original data (as the second element). Default is FALSE.
return.orig.datais TRUE, should the observed dependent variable be replaced with the partial residual? This makes a call to
partialize(), where you can find more details.