Add indicators for all desired variables in a data set.
onehot( data, var = NULL, nas = "na.pass", sparse = FALSE, keep.original = FALSE )
Argument | Description |
---|---|
data | A data frame |
var | A character string/vector of names to be encoded. If NULL, the default, all character and factor variables will be encoded. |
nas | What to do with missing values. With na.omit or na.exclude, any observations with missing data are removed from the result. With na.pass, the default, the result retains the missing values. With na.fail, an error is thrown. |
sparse | Logical (default FALSE). If TRUE, returns only the encoded variables as a sparse matrix. |
keep.original | Logical (default FALSE). Keep the original variables? Not an option if sparse is TRUE. |
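
The `nas` options share their names with the base R missing-value handlers in the stats package. The sketch below assumes `nas` follows the same semantics as those handlers; how `onehot()` dispatches on the string internally is not documented here, and the toy data frame `df` is purely illustrative.

```r
# Toy data with one missing value in each column (hypothetical example data).
df <- data.frame(x = c(1, NA, 3), g = c("a", "b", NA))

# na.pass: keep every row, missing values included.
kept <- na.pass(df)

# na.omit: drop any row that contains a missing value.
dropped <- na.omit(df)

# na.fail: signal an error as soon as any value is missing.
failed <- tryCatch(na.fail(df), error = function(e) "error")

nrow(kept)
nrow(dropped)
failed
```

Here `na.pass` leaves all three rows in place, `na.omit` keeps only the single complete row, and `na.fail` stops with an error that the `tryCatch()` turns into a string.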
A data.frame with the encoded variables, or a sparse matrix of only the encoded variables.
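
For orientation, a sparse indicator matrix of the kind described above can be built directly with the Matrix package. This is a minimal sketch rather than the function's own code: the column naming is an assumption, and it presumes the factor has no missing values.

```r
# Sketch only: build a sparse one-hot matrix for iris$Species by hand.
f <- iris$Species

sp <- Matrix::sparseMatrix(
  i = seq_along(f),          # one non-zero entry per observation
  j = as.integer(f),         # column index = level code of the factor
  x = 1,                     # indicator value
  dims = c(length(f), nlevels(f)),
  dimnames = list(NULL, paste0("Species_", levels(f)))  # assumed naming scheme
)

class(sp)
dim(sp)
```

Only the 150 non-zero entries are stored, which is why the sparse form pays off when a variable has many levels.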
This function is a simple one-hot encoder with a couple of commonly desired options. It takes the applicable variables and creates a binary indicator column for each unique value. If supplied with variables that are not factors or characters, it coerces them to character and proceeds accordingly. It can handle missingness, return a sparse matrix, or keep the original variable(s), as desired.
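
The encoding step described above can be approximated in base R. The helper `onehot_sketch()` below is hypothetical and is not the package's implementation; it handles a single variable, coerces it to character, and leaves missing values as NA in the indicator columns.

```r
# Hypothetical helper: one-hot encode one column using base R only.
onehot_sketch <- function(data, var) {
  x <- as.character(data[[var]])        # coerce to character first
  lvls <- sort(unique(x[!is.na(x)]))    # one indicator per unique non-missing value
  ind <- outer(x, lvls, "==") * 1       # logical matrix turned into 0/1 (NA stays NA)
  colnames(ind) <- paste(var, lvls, sep = "_")
  # Drop the original column and append the indicator columns.
  cbind(data[setdiff(names(data), var)], as.data.frame(ind))
}

enc <- onehot_sketch(iris, "Species")
str(enc)
```

Calling it on a numeric column such as `mtcars$cyl` works the same way, because the values are turned into the strings "4", "6", and "8" before the indicators are built.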
#> 'data.frame': 150 obs. of 8 variables:
#> $ Sepal.Length : num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length : num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Species_setosa : num 1 1 1 1 1 1 1 1 1 1 ...
#> $ Species_versicolor: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ Species_virginica : num 0 0 0 0 0 0 0 0 0 0 ...

#> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> ..@ i : int [1:150] 0 1 2 3 4 5 6 7 8 9 ...
#> ..@ p : int [1:4] 0 50 100 150
#> ..@ Dim : int [1:2] 150 3
#> ..@ Dimnames:List of 2
#> .. ..$ : chr [1:150] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:3] "Species_xsetosa" "Species_xversicolor" "Species_xvirginica"
#> ..@ x : num [1:150] 1 1 1 1 1 1 1 1 1 1 ...
#> ..@ factors : list()

#> 'data.frame': 32 obs. of 14 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ disp : num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec : num 16.5 17 18.6 19.4 17 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
#> $ cyl_4: num 0 0 1 0 0 0 0 1 1 0 ...
#> $ cyl_6: num 1 1 0 1 0 1 0 0 0 1 ...
#> $ cyl_8: num 0 0 0 0 1 0 1 0 0 0 ...
#> $ vs_0 : num 1 1 0 0 1 0 1 0 0 0 ...
#> $ vs_1 : num 0 0 1 1 0 1 0 1 1 1 ...

#> 'data.frame': 150 obs. of 7 variables:
#> $ Sepal.Length : num 5.1 4.9 NA NA 5 NA 4.6 NA 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 NA NA 3.6 NA 3.4 NA 2.9 3.1 ...
#> $ Petal.Length : num 1.4 1.4 NA NA 1.4 NA 1.4 NA 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 NA NA 0.2 NA 0.3 NA 0.2 0.1 ...
#> $ Species_setosa : num 1 1 NA NA 1 NA 1 NA 1 1 ...
#> $ Species_versicolor: num 0 0 NA NA 0 NA 0 NA 0 0 ...
#> $ Species_virginica : num 0 0 NA NA 0 NA 0 NA 0 0 ...

#> 'data.frame': 125 obs. of 7 variables:
#> $ Sepal.Length : num 5.1 4.9 5 4.6 4.4 4.9 4.8 4.8 4.3 5.8 ...
#> $ Sepal.Width : num 3.5 3 3.6 3.4 2.9 3.1 3.4 3 3 4 ...
#> $ Petal.Length : num 1.4 1.4 1.4 1.4 1.4 1.5 1.6 1.4 1.1 1.2 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.3 0.2 0.1 0.2 0.1 0.1 0.2 ...
#> $ Species_setosa : num 1 1 1 1 1 1 1 1 1 1 ...
#> $ Species_versicolor: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ Species_virginica : num 0 0 0 0 0 0 0 0 0 0 ...