Basic Usage

Two-class


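library(confusionMatrix)

# Simulate a binary outcome from a logistic model
N = 2500
x = rnorm(N)
lp = 0 + .5*x                                # linear predictor (log-odds)
y = rbinom(N, size = 1, prob = plogis(lp))

model = glm(y ~ x, family = binomial)
predict_class = predict(model) > 0           # log-odds > 0 means probability > .5
confusion_matrix(predict_class, y)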
# A tibble: 1 x 5
  Accuracy `Accuracy LL` `Accuracy UL` `Accuracy Guessing` `Accuracy P-value`
     <dbl>         <dbl>         <dbl>               <dbl>              <dbl>
1    0.591         0.572         0.611               0.513           2.88e-15

# A tibble: 1 x 19
  Positive     N `N Positive` `N Negative` `Sensitivity/Re… `Specificity/TN…
  <chr>    <int>        <int>        <int>            <dbl>            <dbl>
1 0         2500         1283         1217            0.634            0.546
# … with 13 more variables: `PPV/Precision` <dbl>, NPV <dbl>, `F1/Dice` <dbl>,
#   Prevalence <dbl>, `Detection Rate` <dbl>, `Detection Prevalence` <dbl>,
#   `Balanced Accuracy` <dbl>, FDR <dbl>, FOR <dbl>, `FPR/Fallout` <dbl>,
#   FNR <dbl>, `D Prime` <dbl>, AUC <dbl>

$`Association and Agreement`
# A tibble: 1 x 6
  Kappa `Adjusted Rand`  Yule   Phi Peirce Jaccard
  <dbl>           <dbl> <dbl> <dbl>  <dbl>   <dbl>
1 0.180          0.0329 0.351 0.181  0.180   0.394
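The `predict(model) > 0` step above relies on `predict()` returning values on the link (log-odds) scale by default for a `glm`, so a cut at 0 corresponds to a predicted probability of 0.5. The following is a minimal base R sketch of that equivalence. The seed and the smaller sample size are arbitrary choices for illustration, not part of the original example.

```r
set.seed(123)  # arbitrary seed, only so this sketch is reproducible

N <- 500
x <- rnorm(N)
y <- rbinom(N, size = 1, prob = plogis(0.5 * x))

model <- glm(y ~ x, family = binomial)

# Default predictions are on the link (log-odds) scale
class_from_link <- predict(model) > 0

# Requesting probabilities and cutting at 0.5 gives the same classes
class_from_prob <- predict(model, type = "response") > 0.5

all(class_from_link == class_from_prob)

# Raw cross-tabulation underlying any confusion matrix statistics
table(Predicted = class_from_link, Observed = y)
```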


The function also works in the multi-class setting, although not all statistics are applicable or available there. Note that the ‘Other’ statistics are reported for each class, as well as averaged across classes.

p_multi = sample(letters[1:4], 250, replace = TRUE, prob = 1:4)
o_multi = sample(letters[1:4], 250, replace = TRUE, prob = 1:4)

confusion_matrix(p_multi, o_multi)
Warning in calc_agreement(conf_mat): Some association metrics may not be
calculated due to lack of 2x2 table
# A tibble: 1 x 5
  Accuracy `Accuracy LL` `Accuracy UL` `Accuracy Guessing` `Accuracy P-value`
     <dbl>         <dbl>         <dbl>               <dbl>              <dbl>
1    0.344         0.285         0.406               0.388              0.933

# A tibble: 5 x 17
  Class     N `Sensitivity/Re… `Specificity/TN… `PPV/Precision`   NPV `F1/Dice`
  <chr> <dbl>            <dbl>            <dbl>           <dbl> <dbl>     <dbl>
1 a      24             0.0833            0.907          0.0870 0.903    0.0851
2 b      54             0.278             0.806          0.283  0.802    0.280 
3 c      75             0.333             0.714          0.333  0.714    0.333 
4 d      97             0.454             0.641          0.444  0.649    0.449 
5 Aver…  62.5           0.287             0.767          0.287  0.767    0.287 
# … with 10 more variables: Prevalence <dbl>, `Detection Rate` <dbl>,
#   `Detection Prevalence` <dbl>, `Balanced Accuracy` <dbl>, FDR <dbl>,
#   FOR <dbl>, `FPR/Fallout` <dbl>, FNR <dbl>, `D Prime` <dbl>, AUC <dbl>

$`Association and Agreement`
# A tibble: 1 x 6
   Kappa `Adjusted Rand` Yule  Phi   Peirce Jaccard
   <dbl>           <dbl> <lgl> <lgl> <lgl>  <lgl>  
1 0.0652       0.0000361 NA    NA    NA     NA     
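The per-class rows in the multi-class output can be read as one-vs-rest comparisons, and the averaged row looks like a simple unweighted mean (for example, the average N of 62.5 is the mean of 24, 54, 75, and 97). The sketch below reproduces that idea in base R on made-up data. It is an illustration of the general technique, not the package's internal code.

```r
# Hypothetical three-class data, invented for this sketch
observed  <- factor(c("a", "a", "b", "b", "b", "c", "c", "c", "c", "c"),
                    levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "c", "c", "c", "a", "b"),
                    levels = c("a", "b", "c"))

tab <- table(Predicted = predicted, Observed = observed)

# Treat each class in turn as 'positive' and everything else as 'negative'
per_class <- t(sapply(levels(observed), function(cl) {
  tp <- tab[cl, cl]
  fn <- sum(tab[, cl]) - tp   # observed cl, predicted something else
  fp <- sum(tab[cl, ]) - tp   # predicted cl, observed something else
  tn <- sum(tab) - tp - fn - fp
  c(N           = tp + fn,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}))

# Unweighted (macro) average across classes
result <- rbind(per_class, Average = colMeans(per_class))
result
```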

Available Statistics

Accuracy

  • Accuracy
  • Lower and Upper Limits
  • Guessing rate
  • Test of accuracy vs. the guessing rate
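
As a rough illustration of the accuracy statistics listed above, here is a base R sketch on made-up counts. In the printed output, the guessing rate matches the proportion of the most frequent observed class (0.513 = 1283/2500), and the sketch uses that definition. Using `binom.test()` for the limits and the test is an assumption about one reasonable approach, not a statement of what the package does internally.

```r
# Hypothetical data: 60 observed "yes", 40 observed "no"
observed  <- factor(c(rep("yes", 60), rep("no", 40)), levels = c("yes", "no"))
predicted <- factor(c(rep("yes", 45), rep("no", 15),   # predictions for observed "yes"
                      rep("yes", 10), rep("no", 30)),  # predictions for observed "no"
                    levels = c("yes", "no"))

tab <- table(Predicted = predicted, Observed = observed)

n_correct <- sum(diag(tab))
n_total   <- sum(tab)
accuracy  <- n_correct / n_total

# Guessing rate: always predict the most frequent observed class
guess_rate <- max(table(observed)) / n_total

# Two-sided exact binomial interval for the accuracy
accuracy_ci <- binom.test(n_correct, n_total)$conf.int

# One-sided test: is accuracy greater than the guessing rate?
accuracy_test <- binom.test(n_correct, n_total, p = guess_rate,
                            alternative = "greater")

c(Accuracy = accuracy, LL = accuracy_ci[1], UL = accuracy_ci[2],
  Guessing = guess_rate, P = accuracy_test$p.value)
```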

Additional Statistics

  • Relative class sizes
  • Sensitivity/Recall/True Positive Rate
  • Specificity/True Negative Rate
  • Positive Predictive Value/Precision
  • Negative Predictive Value
  • F1 score/Dice coefficient
  • Prevalence
  • Detection Rate
  • Detection Prevalence
  • Balanced Accuracy
  • False Discovery Rate
  • False Omission Rate
  • False Positive Rate/Fallout
  • False Negative Rate
  • D′ (d prime)
  • Area Under the ROC Curve (AUC)
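
Most of the statistics in this list follow directly from the four cells of a 2x2 table. The sketch below computes them in base R from hypothetical counts. `class_stats()` is a helper name invented here, not a package function. The d prime and AUC lines assume the equal-variance normal formulas, which happen to reproduce the values printed earlier (sensitivity 0.634 and fallout 0.454 give d prime 0.458 and AUC 0.627). Whether the package computes them this way is an assumption.

```r
# Hypothetical helper: class-level statistics from 2x2 cell counts
class_stats <- function(tp, fn, fp, tn) {
  n           <- tp + fn + fp + tn
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  ppv         <- tp / (tp + fp)
  npv         <- tn / (tn + fn)
  fpr         <- 1 - specificity
  d_prime     <- qnorm(sensitivity) - qnorm(fpr)  # assumes equal-variance normal model

  c(sensitivity          = sensitivity,
    specificity          = specificity,
    ppv                  = ppv,
    npv                  = npv,
    f1                   = 2 * ppv * sensitivity / (ppv + sensitivity),
    prevalence           = (tp + fn) / n,
    detection_rate       = tp / n,
    detection_prevalence = (tp + fp) / n,
    balanced_accuracy    = (sensitivity + specificity) / 2,
    fdr                  = 1 - ppv,
    false_omission_rate  = 1 - npv,
    fpr                  = fpr,
    fnr                  = 1 - sensitivity,
    d_prime              = d_prime,
    auc                  = pnorm(d_prime / sqrt(2)))  # assumed relationship
}

# Made-up counts for illustration
stats <- class_stats(tp = 45, fn = 15, fp = 10, tn = 30)
round(stats, 3)
```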

Association and Agreement

  • Kappa
  • Adjusted Rand
  • Yule
  • Phi
  • Peirce/Youden J
  • Jaccard
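
Several of the association measures above also have simple closed forms for a 2x2 table. The following base R sketch, on hypothetical counts, covers Kappa, Phi, Yule's Q, and Peirce. These formulas are consistent with the two-class output shown earlier (for example, sensitivity 0.634 plus specificity 0.546 minus 1 gives Peirce 0.180). Adjusted Rand and Jaccard are left out because the exact definitions the package uses are not clear from the output alone.

```r
# Hypothetical 2x2 cell counts
tp <- 45; fn <- 15; fp <- 10; tn <- 30
n  <- tp + fn + fp + tn

# Cohen's kappa: observed agreement corrected for chance agreement
p_observed <- (tp + tn) / n
p_expected <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
kappa      <- (p_observed - p_expected) / (1 - p_expected)

# Phi coefficient
phi <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))

# Yule's Q, a rescaled odds ratio
yule_q <- (tp * tn - fp * fn) / (tp * tn + fp * fn)

# Peirce skill score, equivalent to Youden's J
peirce <- tp / (tp + fn) + tn / (tn + fp) - 1

round(c(Kappa = kappa, Phi = phi, Yule = yule_q, Peirce = peirce), 3)
```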

Long vs. Wide

In the default wide format, any particular statistic is easy to pluck by name.

cm = confusion_matrix(predict_class, y)

cm[['Other']][['N']]
[1] 2500
cm[['Association and Agreement']][['Yule']]
[1] 0.3514883

However, if you want to present many statistics at once, the longer format may be preferable.

confusion_matrix(predict_class, y, longer = TRUE)
# A tibble: 5 x 2
  Statistic            Value
  <chr>                <dbl>
1 Accuracy          5.91e- 1
2 Accuracy LL       5.72e- 1
3 Accuracy UL       6.11e- 1
4 Accuracy Guessing 5.13e- 1
5 Accuracy P-value  2.88e-15

# A tibble: 18 x 3
   Positive Statistic                 Value
   <chr>    <chr>                     <dbl>
 1 0        N                      2500    
 2 0        N Positive             1283    
 3 0        N Negative             1217    
 4 0        Sensitivity/Recall/TPR    0.634
 5 0        Specificity/TNR           0.546
 6 0        PPV/Precision             0.595
 7 0        NPV                       0.586
 8 0        F1/Dice                   0.614
 9 0        Prevalence                0.513
10 0        Detection Rate            0.326
11 0        Detection Prevalence      0.547
12 0        Balanced Accuracy         0.590
13 0        FDR                       0.405
14 0        FOR                       0.414
15 0        FPR/Fallout               0.454
16 0        FNR                       0.366
17 0        D Prime                   0.458
18 0        AUC                       0.627

$`Association and Agreement`
# A tibble: 6 x 2
  Statistic      Value
  <chr>          <dbl>
1 Kappa         0.180 
2 Adjusted Rand 0.0329
3 Yule          0.351 
4 Phi           0.181 
5 Peirce        0.180 
6 Jaccard       0.394 

Either way, the tibble output will work nicely with various packages for formatting tables.
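
As one possible way to format such output, the sketch below passes a long-format table to `knitr::kable()` when the knitr package is installed and falls back to plain printing otherwise. The data frame is typed in by hand from the long ‘Association and Agreement’ output above, so the sketch runs without the package itself. The caption text is an arbitrary choice.

```r
# Values copied from the long-format 'Association and Agreement' output above
agreement_long <- data.frame(
  Statistic = c("Kappa", "Adjusted Rand", "Yule", "Phi", "Peirce", "Jaccard"),
  Value     = c(0.180, 0.0329, 0.351, 0.181, 0.180, 0.394)
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # kable() produces a table suitable for R Markdown documents
  print(knitr::kable(agreement_long, digits = 3,
                     caption = "Association and agreement"))
} else {
  # Fallback when knitr is not installed
  print(format(agreement_long, digits = 3), row.names = FALSE)
}
```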