When working with linear regression and kmeans clustering in R, it’s nice to:

  • have a one-stop shop summary data frame on model statistics
  • have a data frame with prediction, residual or cluster assignment with respect to each observation

The sptidy package provides tools that make these tasks swift and easy.

Overview

This vignette is the package-wide documentation for R package sptidy. This helps those who want to use this package better understand what sptidy does. A full usage demonstration of all functions in this package is included in this vignette.

Available functions

sptidy provides functions for model evaluation and analysis. These functions work with 2 types of models: lm() and kmeans().

  • linear regression
    • tidy_lr outputs summary on lm()
    • augment_lr augments a data frame with predictions and residuals
  • kmeans clustering
    • tidy_kmeans outputs summary on kmeans()
    • augment_kmeans augments a data frame with cluster assignments

Data

First, let us load sptidy and 2 data set: longley, iris for function usage demonstration.

library(sptidy)
data(longley)
data(iris)

Function Usage Demonstration

tidy_lr(): clean summary output for lm()

sptidy::tidy_lr() provides a tidy data frame that summarizes a fitted linear regression object lm(). The argument in the function needs to be a fitted lm() object. The output data frame has 4 columns, describing coefficient estimates, standard error, t-statistics and p-values.

my_lr <- lm(Employed~., data = longley)
sptidy::tidy_lr(my_lr)
#>                    coef  std_err t_stats  p_val
#> (Intercept)  -3482.2586 890.4204 -3.9108 0.0036
#> GNP.deflator     0.0151   0.0849  0.1774 0.8631
#> GNP             -0.0358   0.0335 -1.0695 0.3127
#> Unemployed      -0.0202   0.0049 -4.1364 0.0025
#> Armed.Forces    -0.0103   0.0021 -4.8220 0.0009
#> Population      -0.0511   0.2261 -0.2261 0.8262
#> Year             1.8292   0.4555  4.0159 0.0030

augment_lr(): augment data frame from lm()

sptidy::augment_lr() augments the data frame with predictions and residuals from a fitted linear regression object lm(). The first argument is the fitted lm() object. The second and third argument refer to the feature data frame and target data frame that are fitted to the lm() object. The output data frame has additional 2 columns, describing predictions and residuals with respect to each observation.

my_lr <- lm(Employed~., data = longley)
sptidy::augment_lr(my_lr, longley[1:6], as.data.frame(longley$Employed))
#>      GNP.deflator     GNP Unemployed Armed.Forces Population Year
#> 1947         83.0 234.289      235.6        159.0    107.608 1947
#> 1948         88.5 259.426      232.5        145.6    108.632 1948
#> 1949         88.2 258.054      368.2        161.6    109.773 1949
#> 1950         89.5 284.599      335.1        165.0    110.929 1950
#> 1951         96.2 328.975      209.9        309.9    112.075 1951
#> 1952         98.1 346.999      193.2        359.4    113.270 1952
#> 1953         99.0 365.385      187.0        354.7    115.094 1953
#> 1954        100.0 363.112      357.8        335.0    116.219 1954
#> 1955        101.2 397.469      290.4        304.8    117.388 1955
#> 1956        104.6 419.180      282.2        285.7    118.734 1956
#> 1957        108.4 442.769      293.6        279.8    120.445 1957
#> 1958        110.8 444.546      468.1        263.7    121.950 1958
#> 1959        112.6 482.704      381.3        255.2    123.366 1959
#> 1960        114.2 502.601      393.1        251.4    125.368 1960
#> 1961        115.7 518.173      480.6        257.2    127.852 1961
#> 1962        116.9 554.894      400.7        282.7    130.081 1962
#>      longley$Employed predictions   residuals
#> 1947           60.323    60.05566  0.26734003
#> 1948           61.122    61.21601 -0.09401394
#> 1949           60.171    60.12471  0.04628717
#> 1950           61.187    61.59711 -0.41011462
#> 1951           63.221    62.91129  0.30971459
#> 1952           63.639    63.88831 -0.24931122
#> 1953           64.989    65.15305 -0.16404896
#> 1954           63.761    63.77418 -0.01318036
#> 1955           66.019    66.00470  0.01430477
#> 1956           67.857    67.40161  0.45539409
#> 1957           68.169    68.18627 -0.01726893
#> 1958           66.513    66.55206 -0.03905504
#> 1959           68.655    68.81055 -0.15554997
#> 1960           69.564    69.64967 -0.08567131
#> 1961           69.331    68.98907  0.34193151
#> 1962           70.551    70.75776 -0.20675783

tidy_kmeans(): clean summary output for kmeans()

sptidy::tidy_kmeans() provides a tidy data frame that summarizes a fitted kmeans clustering object kmenas(). The first argument is the fitted kmeans() object. The second argument refers to the data that was fitted to the kmeans() object. The output data frame has 3 columns, describing cluster number, cluster center and number of points within each cluster.

data <- iris[,1:3]
kclust <- kmeans(data, centers = 3)
tidy_kmeans(kclust, data)
#> # A tibble: 3 x 3
#>   cluster_number cluster_center n_points
#>            <int> <list>            <int>
#> 1              1 <dbl [3]>            96
#> 2              2 <dbl [3]>            32
#> 3              3 <dbl [3]>            22

augment_kmeans(): augment data frame from kmeans()

sptidy::augment_kmeans() augments the data frame with cluster assignment from a fitted kmeans clustering object kmeans(). The first argument is the fitted kmeans() object. The second argument refers to the data that was fitted to the kmeans() object. The output data frame has additional 1 column, describing cluster assignment with respect to each observation.

data <- iris[,1:3]
kclust <- kmeans(data, centers = 3)
augment_kmeans(kclust, data)
#>     Sepal.Length Sepal.Width Petal.Length cluster
#> 1            5.1         3.5          1.4       2
#> 2            4.9         3.0          1.4       2
#> 3            4.7         3.2          1.3       2
#> 4            4.6         3.1          1.5       2
#> 5            5.0         3.6          1.4       2
#> 6            5.4         3.9          1.7       2
#> 7            4.6         3.4          1.4       2
#> 8            5.0         3.4          1.5       2
#> 9            4.4         2.9          1.4       2
#> 10           4.9         3.1          1.5       2
#> 11           5.4         3.7          1.5       2
#> 12           4.8         3.4          1.6       2
#> 13           4.8         3.0          1.4       2
#> 14           4.3         3.0          1.1       2
#> 15           5.8         4.0          1.2       2
#> 16           5.7         4.4          1.5       2
#> 17           5.4         3.9          1.3       2
#> 18           5.1         3.5          1.4       2
#> 19           5.7         3.8          1.7       2
#> 20           5.1         3.8          1.5       2
#> 21           5.4         3.4          1.7       2
#> 22           5.1         3.7          1.5       2
#> 23           4.6         3.6          1.0       2
#> 24           5.1         3.3          1.7       2
#> 25           4.8         3.4          1.9       2
#> 26           5.0         3.0          1.6       2
#> 27           5.0         3.4          1.6       2
#> 28           5.2         3.5          1.5       2
#> 29           5.2         3.4          1.4       2
#> 30           4.7         3.2          1.6       2
#> 31           4.8         3.1          1.6       2
#> 32           5.4         3.4          1.5       2
#> 33           5.2         4.1          1.5       2
#> 34           5.5         4.2          1.4       2
#> 35           4.9         3.1          1.5       2
#> 36           5.0         3.2          1.2       2
#> 37           5.5         3.5          1.3       2
#> 38           4.9         3.6          1.4       2
#> 39           4.4         3.0          1.3       2
#> 40           5.1         3.4          1.5       2
#> 41           5.0         3.5          1.3       2
#> 42           4.5         2.3          1.3       2
#> 43           4.4         3.2          1.3       2
#> 44           5.0         3.5          1.6       2
#> 45           5.1         3.8          1.9       2
#> 46           4.8         3.0          1.4       2
#> 47           5.1         3.8          1.6       2
#> 48           4.6         3.2          1.4       2
#> 49           5.3         3.7          1.5       2
#> 50           5.0         3.3          1.4       2
#> 51           7.0         3.2          4.7       1
#> 52           6.4         3.2          4.5       3
#> 53           6.9         3.1          4.9       1
#> 54           5.5         2.3          4.0       3
#> 55           6.5         2.8          4.6       3
#> 56           5.7         2.8          4.5       3
#> 57           6.3         3.3          4.7       3
#> 58           4.9         2.4          3.3       3
#> 59           6.6         2.9          4.6       3
#> 60           5.2         2.7          3.9       3
#> 61           5.0         2.0          3.5       3
#> 62           5.9         3.0          4.2       3
#> 63           6.0         2.2          4.0       3
#> 64           6.1         2.9          4.7       3
#> 65           5.6         2.9          3.6       3
#> 66           6.7         3.1          4.4       3
#> 67           5.6         3.0          4.5       3
#> 68           5.8         2.7          4.1       3
#> 69           6.2         2.2          4.5       3
#> 70           5.6         2.5          3.9       3
#> 71           5.9         3.2          4.8       3
#> 72           6.1         2.8          4.0       3
#> 73           6.3         2.5          4.9       3
#> 74           6.1         2.8          4.7       3
#> 75           6.4         2.9          4.3       3
#> 76           6.6         3.0          4.4       3
#> 77           6.8         2.8          4.8       1
#> 78           6.7         3.0          5.0       1
#> 79           6.0         2.9          4.5       3
#> 80           5.7         2.6          3.5       3
#> 81           5.5         2.4          3.8       3
#> 82           5.5         2.4          3.7       3
#> 83           5.8         2.7          3.9       3
#> 84           6.0         2.7          5.1       3
#> 85           5.4         3.0          4.5       3
#> 86           6.0         3.4          4.5       3
#> 87           6.7         3.1          4.7       1
#> 88           6.3         2.3          4.4       3
#> 89           5.6         3.0          4.1       3
#> 90           5.5         2.5          4.0       3
#> 91           5.5         2.6          4.4       3
#> 92           6.1         3.0          4.6       3
#> 93           5.8         2.6          4.0       3
#> 94           5.0         2.3          3.3       3
#> 95           5.6         2.7          4.2       3
#> 96           5.7         3.0          4.2       3
#> 97           5.7         2.9          4.2       3
#> 98           6.2         2.9          4.3       3
#> 99           5.1         2.5          3.0       3
#> 100          5.7         2.8          4.1       3
#> 101          6.3         3.3          6.0       1
#> 102          5.8         2.7          5.1       3
#> 103          7.1         3.0          5.9       1
#> 104          6.3         2.9          5.6       1
#> 105          6.5         3.0          5.8       1
#> 106          7.6         3.0          6.6       1
#> 107          4.9         2.5          4.5       3
#> 108          7.3         2.9          6.3       1
#> 109          6.7         2.5          5.8       1
#> 110          7.2         3.6          6.1       1
#> 111          6.5         3.2          5.1       1
#> 112          6.4         2.7          5.3       1
#> 113          6.8         3.0          5.5       1
#> 114          5.7         2.5          5.0       3
#> 115          5.8         2.8          5.1       3
#> 116          6.4         3.2          5.3       1
#> 117          6.5         3.0          5.5       1
#> 118          7.7         3.8          6.7       1
#> 119          7.7         2.6          6.9       1
#> 120          6.0         2.2          5.0       3
#> 121          6.9         3.2          5.7       1
#> 122          5.6         2.8          4.9       3
#> 123          7.7         2.8          6.7       1
#> 124          6.3         2.7          4.9       3
#> 125          6.7         3.3          5.7       1
#> 126          7.2         3.2          6.0       1
#> 127          6.2         2.8          4.8       3
#> 128          6.1         3.0          4.9       3
#> 129          6.4         2.8          5.6       1
#> 130          7.2         3.0          5.8       1
#> 131          7.4         2.8          6.1       1
#> 132          7.9         3.8          6.4       1
#> 133          6.4         2.8          5.6       1
#> 134          6.3         2.8          5.1       1
#> 135          6.1         2.6          5.6       1
#> 136          7.7         3.0          6.1       1
#> 137          6.3         3.4          5.6       1
#> 138          6.4         3.1          5.5       1
#> 139          6.0         3.0          4.8       3
#> 140          6.9         3.1          5.4       1
#> 141          6.7         3.1          5.6       1
#> 142          6.9         3.1          5.1       1
#> 143          5.8         2.7          5.1       3
#> 144          6.8         3.2          5.9       1
#> 145          6.7         3.3          5.7       1
#> 146          6.7         3.0          5.2       1
#> 147          6.3         2.5          5.0       3
#> 148          6.5         3.0          5.2       1
#> 149          6.2         3.4          5.4       1
#> 150          5.9         3.0          5.1       3