covizr-vignette.Rmd
The covizr package provides access to Covid-19 data from Our World in Data, along with functions that generate common Covid-19 charts and summaries. Our goal is to enable anyone with basic R programming knowledge to access and visualize Covid-19 data and draw their own informed conclusions.
The visualization functions are kept simple so that users can answer questions about the Covid-19 pandemic as quickly as possible.
This document introduces you to the basics of using covizr and shows you how to apply its functions.
To use the package, first install it following the steps in the README, then load the library with the following code:
library(covizr)
You do not need any extra data to explore covizr. Start by running the following function:
df <- get_data()
df
#> # A tibble: 1,505 × 67
#> iso_code continent location date total_cases new_cases new_cases_smoot…
#> <chr> <chr> <chr> <date> <dbl> <dbl> <dbl>
#> 1 AFG Asia Afghani… 2022-01-29 161290 233 249.
#> 2 AFG Asia Afghani… 2022-01-30 162111 821 352.
#> 3 AFG Asia Afghani… 2022-01-31 162926 815 433.
#> 4 AFG Asia Afghani… 2022-02-01 163555 629 472.
#> 5 AFG Asia Afghani… 2022-02-02 164190 635 500.
#> 6 AFG Asia Afghani… 2022-02-03 164727 537 532.
#> 7 AFG Asia Afghani… 2022-02-04 165358 631 614.
#> 8 ALB Europe Albania 2022-01-29 254126 0 1102
#> 9 ALB Europe Albania 2022-01-30 255741 1615 1096.
#> 10 ALB Europe Albania 2022-01-31 258543 2802 1496.
#> # … with 1,495 more rows, and 60 more variables: total_deaths <dbl>,
#> # new_deaths <dbl>, new_deaths_smoothed <dbl>, total_cases_per_million <dbl>,
#> # new_cases_per_million <dbl>, new_cases_smoothed_per_million <dbl>,
#> # total_deaths_per_million <dbl>, new_deaths_per_million <dbl>,
#> # new_deaths_smoothed_per_million <dbl>, reproduction_rate <dbl>,
#> # icu_patients <dbl>, icu_patients_per_million <dbl>, hosp_patients <dbl>,
#> # hosp_patients_per_million <dbl>, weekly_icu_admissions <dbl>, …
By default, the function returns the last 7 days of data for all locations. You can pass arguments such as date_from, date_to, and location to select more specific data.
For example, to retrieve the data for Canada and the United Kingdom between 2021-05-01 and 2021-07-01:
loc <- c('Canada', 'United Kingdom')
df <- get_data(date_from = "2021-05-01", date_to = "2021-07-01", location = loc)
df
#> # A tibble: 124 × 67
#> iso_code continent location date total_cases new_cases new_cases_smoot…
#> <chr> <chr> <chr> <date> <dbl> <dbl> <dbl>
#> 1 CAN North Am… Canada 2021-05-01 1227807 7429 7881.
#> 2 CAN North Am… Canada 2021-05-02 1234733 6926 7891.
#> 3 CAN North Am… Canada 2021-05-03 1243845 9112 7905.
#> 4 CAN North Am… Canada 2021-05-04 1250657 6812 7842.
#> 5 CAN North Am… Canada 2021-05-05 1258014 7357 7790.
#> 6 CAN North Am… Canada 2021-05-06 1266137 8123 7747.
#> 7 CAN North Am… Canada 2021-05-07 1274073 7936 7671.
#> 8 CAN North Am… Canada 2021-05-08 1280731 6658 7561.
#> 9 CAN North Am… Canada 2021-05-09 1287175 6444 7492.
#> 10 CAN North Am… Canada 2021-05-10 1294712 7537 7267.
#> # … with 114 more rows, and 60 more variables: total_deaths <dbl>,
#> # new_deaths <dbl>, new_deaths_smoothed <dbl>, total_cases_per_million <dbl>,
#> # new_cases_per_million <dbl>, new_cases_smoothed_per_million <dbl>,
#> # total_deaths_per_million <dbl>, new_deaths_per_million <dbl>,
#> # new_deaths_smoothed_per_million <dbl>, reproduction_rate <dbl>,
#> # icu_patients <dbl>, icu_patients_per_million <dbl>, hosp_patients <dbl>,
#> # hosp_patients_per_million <dbl>, weekly_icu_admissions <dbl>, …
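The selection above can be pictured as a row filter on date and location. The 124 rows in the output are consistent with an inclusive date window (62 days for each of the two locations). The following is a minimal base R sketch of that idea on made-up data; `toy` and `filter_toy` are hypothetical names used only for illustration and are not part of covizr, and the sketch does not claim to show how get_data() is implemented.

```r
# Toy data standing in for the tibble returned by get_data(); values are made up.
toy <- data.frame(
  location = rep(c("Canada", "United Kingdom", "Albania"), each = 3),
  date = rep(as.Date(c("2021-04-30", "2021-05-01", "2021-07-02")), times = 3),
  new_cases = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# Hypothetical helper (not part of covizr): keep rows that fall inside an
# inclusive date window and belong to one of the requested locations.
filter_toy <- function(df, date_from, date_to, location) {
  keep <- df$date >= as.Date(date_from) &
    df$date <= as.Date(date_to) &
    df$location %in% location
  df[keep, , drop = FALSE]
}

subset_df <- filter_toy(
  toy,
  date_from = "2021-05-01",
  date_to = "2021-07-01",
  location = c("Canada", "United Kingdom")
)
```

Only the 2021-05-01 rows for Canada and the United Kingdom remain in `subset_df`; the rows dated just outside the window and the Albania rows are dropped.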
plot_summary
Use plot_summary when you want to find the top values of a variable (var) ranked by an aggregated metric (val). As with all plotting functions in this package, you need to call get_data() first and pass the resulting data frame into the plotting function.
For example, we can see the top 10 countries with the highest number of new Covid-19 cases in the first 14 days of 2022 using the following code:
plot_df <- get_data(date_from = "2022-01-01", date_to = "2022-02-01")
plot_summary(plot_df, var = "location", val = "new_cases", date_from = "2022-01-01", date_to = "2022-01-14", top_n = 10)
var: Use a categorical variable such as location or continent.
val: Use a numeric variable such as new_cases, new_vaccinations, or icu_patients.
By default, we use sum to aggregate your value of interest. You can use another aggregation function, such as mean, by specifying the function name as a string in the fun argument (e.g. fun = "mean").
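To make the aggregation step concrete, here is a minimal base R sketch of ranking a categorical variable by an aggregated numeric value on made-up data. The helper `top_by` is hypothetical and not part of covizr; it only mirrors the var, val, fun, and top_n arguments described above, and it assumes the chosen aggregation function accepts an na.rm argument.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  location = c("A", "A", "B", "B", "C", "C"),
  new_cases = c(5, 7, 20, 1, 3, 4)
)

# Hypothetical helper (not part of covizr): aggregate `val` within each level
# of `var`, then return the `top_n` largest aggregated values.
top_by <- function(df, var, val, fun = "sum", top_n = 10) {
  agg_fun <- match.fun(fun)  # look up the function from its name, e.g. "mean"
  groups <- split(df[[val]], df[[var]])
  totals <- vapply(groups, agg_fun, numeric(1), na.rm = TRUE)
  head(sort(totals, decreasing = TRUE), top_n)
}

top_sum <- top_by(toy, var = "location", val = "new_cases", fun = "sum", top_n = 2)
top_mean <- top_by(toy, var = "location", val = "new_cases", fun = "mean", top_n = 2)
```

With this toy data, the sums are A = 12, B = 21, and C = 7, so `top_sum` lists B and then A. Switching to fun = "mean" changes the aggregated values (B = 10.5, A = 6) but, in this case, not the ranking.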
plot_spec
We can also look at specific countries of interest. By default, plot_spec plots the trend of new cases over the last 7 days for Canada. You can pass arguments such as location, val, date_from, date_to, and title to change this.
For example, the plot_spec call below draws the trend of new deaths in Canada and the United Kingdom over the first two weeks of January 2022.
df <- get_data(date_from="2022-01-01", date_to="2022-01-15", location = loc)
plot_spec(df, location = c('Canada', 'United Kingdom'), val="new_deaths", date_from="2022-01-01",
date_to="2022-01-15", title="Death cases in Canada and United Kingdom")
plot_metric
After looking at the trend of COVID-related cases for specific countries, we can dive deeper into a particular country and compare the trend of new COVID-19 cases with another metric. Examples of metrics that can be used include positive_rate, total_vaccinations, and total_deaths. The function takes arguments such as loc_val, metric, date_from, and date_to.
The plot_metric call below charts the trend of daily new COVID-19 cases against the positive rate in Canada for the first two weeks of January 2022.
df <- get_data(date_from="2022-01-01", date_to="2022-01-15", location = c("Canada"))
plot_metric(df, loc_val = c("Canada"), metric = "positive_rate", date_from = "2022-01-01", date_to = "2022-01-15")
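Metric columns may contain missing values for some dates or locations, so it can help to check how much of the chosen metric is available before charting it. The following is a minimal base R sketch on made-up data; `metric_coverage` is a hypothetical helper, not part of covizr, and the assumption that a given metric has gaps should be checked against the data you actually retrieve.

```r
# Toy data standing in for the tibble returned by get_data(); values are made up.
toy <- data.frame(
  location = c("Canada", "Canada", "Canada", "Albania", "Albania"),
  date = as.Date(c("2022-01-01", "2022-01-02", "2022-01-03",
                   "2022-01-01", "2022-01-02")),
  new_cases = c(100, 120, 90, 10, 12),
  positive_rate = c(0.21, NA, 0.25, NA, NA)
)

# Hypothetical helper (not part of covizr): share of rows for the requested
# location(s) in which the metric is not missing.
metric_coverage <- function(df, loc_val, metric) {
  rows <- df[df$location %in% loc_val, , drop = FALSE]
  mean(!is.na(rows[[metric]]))
}

canada_cov <- metric_coverage(toy, "Canada", "positive_rate")
albania_cov <- metric_coverage(toy, "Albania", "positive_rate")
```

In this toy data, two of the three Canada rows have a positive_rate, while none of the Albania rows do, so a chart of that metric for Albania would have nothing to show.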