A straightforward way to quickly create data for model predictions.
create_prediction_data( model_data, conditional_data = NULL, num = mean, cat = "most_common", ... )
Argument | Description |
---|---|
model_data | The original data. Ideally this would come from a model object. |
conditional_data | A data.frame constructed from something like |
num | A function like mean or median to be applied to numeric data. Should return a single value. Default is mean. |
cat | Set categorical variables to the reference level ('ref') or the most frequently occurring category ('most_common', the default). |
... | Additional arguments to num, e.g. |
A data frame suitable for the newdata argument of predict functions.
Given data that was used in a model, create data that can be used for predictions at key values, especially as a prelude to visualization. Some packages provide functions for this, but they are specific to certain models or don't quite provide the flexibility I want. Specifically, this function applies an arbitrary function to numeric variables; for categorical(ish) variables, one may choose the most common category (ties go to the first category) or, for factors, the reference level. For now, class Date is treated as categorical.
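The rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the helper name typical_values is hypothetical, it assumes the reference level is the first factor level, and it does not try to reproduce the handling of Date columns.

```r
# Hypothetical helper (not the package's code): reduce each column of a
# data frame to a single "typical" value.
typical_values <- function(data, num = mean, cat = "most_common", ...) {
  summarise_column <- function(x) {
    if (is.numeric(x)) {
      # Numeric columns: apply the supplied function. Extra arguments
      # (for instance na.rm = TRUE) are forwarded through `...`.
      return(num(x, ...))
    }
    if (cat == "ref" && is.factor(x)) {
      # Assumed here: the reference level is the first factor level.
      return(factor(levels(x)[1], levels = levels(x)))
    }
    # Most common category. which.max() returns the first maximum,
    # so ties go to the first category in table order.
    counts <- table(x)
    most_common <- names(counts)[which.max(counts)]
    if (is.factor(x)) factor(most_common, levels = levels(x)) else most_common
  }
  as.data.frame(lapply(data, summarise_column), stringsAsFactors = FALSE)
}

typical_values(iris)
typical_values(iris, num = median, cat = "ref")
```

Because every Species level in iris occurs equally often, the tie rule picks the first level in both calls.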
In addition, one may supply their own data to set certain variables at any particular values via the conditional_data argument, for example, using expand.grid or tidyr::crossing. Variables not supplied as columns in the conditional_data are treated as above.
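To make the idea concrete, here is a rough base R sketch of what combining user-supplied values with summarized values might look like. It does not call the function itself and is not its implementation; the object names grid, other_vars, and prediction_data are illustrative, and the median is used for the remaining (all numeric) iris columns.

```r
# Values of interest for the variables we want to vary.
grid <- expand.grid(
  Sepal.Width = c(0, 3, 5),
  Species     = c("setosa", "virginica")
)

# Columns not supplied in the grid are held at a single value.
other_vars <- setdiff(names(iris), names(grid))

# Assigning a length-one value recycles it across every row of the grid.
prediction_data <- grid
for (v in other_vars) {
  prediction_data[[v]] <- median(iris[[v]])
}

prediction_data
```

Each row pairs one combination from the grid with the same fixed values for the remaining columns.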
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1     5.843333    3.057333        3.758    1.199333  setosa

create_prediction_data(
  iris,
  num = median,
  expand.grid(
    Sepal.Width = c(0, 3, 5),
    Species = c('setosa', 'virginica')
  )
)
#>   Sepal.Width   Species Sepal.Length Petal.Length Petal.Width
#> 1           0    setosa          5.8         4.35         1.3
#> 2           3    setosa          5.8         4.35         1.3
#> 3           5    setosa          5.8         4.35         1.3
#> 4           0 virginica          5.8         4.35         1.3
#> 5           3 virginica          5.8         4.35         1.3
#> 6           5 virginica          5.8         4.35         1.3

#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          6.4         3.3          5.1         1.8  setosa

test_mod = lm(mpg ~ wt + cyl, mtcars)
nd = create_prediction_data(test_mod$model)
predict(test_mod, newdata = nd)
#>        1 
#> 20.09062
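
As a prelude to visualization, one usually wants predictions over a range of one predictor while the others stay fixed. The following base R sketch builds such data by hand for the same lm model; it is an illustration only and does not use create_prediction_data. The column name predicted_mpg and the choice of five grid points are arbitrary.

```r
test_mod <- lm(mpg ~ wt + cyl, data = mtcars)

# Vary wt over its observed range while holding cyl at its mean.
nd <- data.frame(
  wt  = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5),
  cyl = mean(mtcars$cyl)
)

# One prediction per row of the new data.
nd$predicted_mpg <- predict(test_mod, newdata = nd)
nd
```

The resulting data frame can be passed directly to a plotting function, with wt on the x-axis and predicted_mpg on the y-axis.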