A straightforward way to quickly create data for model predictions.

create_prediction_data(
  model_data,
  conditional_data = NULL,
  num = mean,
  cat = "most_common",
  ...
)

Arguments

model_data

The original data. Ideally this would come from a model object.

conditional_data

A data frame of values at which to fix particular variables, constructed with something like base::expand.grid.

num

A function like mean or median to be applied to numeric data. Should return a single value. Default is mean.

cat

Set categorical variables to the reference level ('ref') or the most frequently occurring category ('most_common', the default).

...

Additional arguments to num, e.g. na.rm = TRUE.

Value

A data frame suitable for the newdata argument for predict functions.

Details

Given data that was used in a model, create data that can be used for predictions at key values, especially as a prelude to visualization. Functions that do this exist in other packages, but they are specific to certain models or don't quite provide the flexibility I want. Specifically, this allows an arbitrary function to be applied to numeric variables. For categorical(ish) variables, one can choose the most common category (ties go to the first category) or, for a factor, the reference level. For now, class Date is treated as categorical.
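The logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the tidyext implementation: the helper name typical_values is hypothetical, and anything non-numeric is simply treated as categorical.

```r
# Hypothetical helper illustrating the idea; NOT the tidyext source code.
typical_values <- function(data, num = mean, cat = "most_common", ...) {
  out <- lapply(data, function(x) {
    if (is.numeric(x)) {
      # Summarise numeric columns with the user-supplied function
      unname(num(x, ...))
    } else if (cat == "ref" && is.factor(x)) {
      # Reference level: the first level of the factor
      factor(levels(x)[1], levels = levels(x))
    } else {
      # Most common value; which.max() returns the first maximum,
      # so ties go to the first category in table order
      tab <- table(x)
      top <- names(tab)[which.max(tab)]
      if (is.factor(x)) factor(top, levels = levels(x)) else top
    }
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

# One row of "typical" values for iris
res_mean <- typical_values(iris)
res_ref  <- typical_values(iris, num = median, cat = "ref")
res_mean
```

Because every species in iris occurs equally often, the most-common rule falls back to the first category, which here is also the reference level.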

In addition, one may supply their own data to set certain variables at any particular values via the conditional_data argument, for example, using expand.grid or tidyr::crossing. Variables not supplied as columns in the conditional_data are treated as above.
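To make the role of conditional_data concrete, here is a rough base R sketch of the same idea: fix some variables on a grid and fill in the rest with a typical value. It is an assumption about the general approach, not the package's code, and the object names (grid, remaining, newdata) are purely illustrative.

```r
# Values the user wants to fix, as would be passed to conditional_data
grid <- expand.grid(
  Sepal.Width = c(0, 3, 5),
  Species     = c("setosa", "virginica")
)

# Columns of the original data that the grid does not supply
remaining <- setdiff(names(iris), names(grid))

# Fill each remaining column with a single typical value
# (all remaining iris columns are numeric, so median suffices here);
# the scalar is recycled across every row of the grid
newdata <- grid
for (v in remaining) {
  newdata[[v]] <- median(iris[[v]])
}

newdata
```

The result has one row per grid combination, with the supplied columns first and the filled-in columns after them.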

Examples

library(tidyext)

create_prediction_data(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1     5.843333    3.057333        3.758    1.199333  setosa

create_prediction_data(
  iris,
  num = median,
  expand.grid(
    Sepal.Width = c(0, 3, 5),
    Species = c('setosa', 'virginica')
  )
)
#>   Sepal.Width   Species Sepal.Length Petal.Length Petal.Width
#> 1           0    setosa          5.8         4.35         1.3
#> 2           3    setosa          5.8         4.35         1.3
#> 3           5    setosa          5.8         4.35         1.3
#> 4           0 virginica          5.8         4.35         1.3
#> 5           3 virginica          5.8         4.35         1.3
#> 6           5 virginica          5.8         4.35         1.3

create_prediction_data(iris, num = function(x) quantile(x, probs = .75))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          6.4         3.3          5.1         1.8  setosa

test_mod = lm(mpg ~ wt + cyl, mtcars)
nd = create_prediction_data(test_mod$model)
predict(test_mod, newdata = nd)
#>        1
#> 20.09062