Using preprocess

preprocess reads data in several formats (such as txt, json, and csv) and returns it as a data frame. To use preprocess in a project:

library(EDAhelperR)
library(knitr)

Read CSV data from a URL

file_path = "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
df = preprocess(file_path)
kable(df[3:6, 3:6])

Pclass	Name	Sex	Age
3	Heikkinen, Miss. Laina	female	26
1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35
3	Allen, Mr. William Henry	male	35
3	Moran, Mr. James	male	NA

Read local data

file_path = readr::readr_example("mtcars.csv")
df = preprocess(file_path)
kable(df[3:6, 3:6])

disp	hp	drat	wt
108	93	3.85	2.320
258	110	3.08	3.215
360	175	3.15	3.440
225	105	2.76	3.460

Read data with different methods for dealing with missing values

file_path = readr::readr_example("mtcars.csv")
df = preprocess(file_path, method = "mean")
kable(head(df))

mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225	105	2.76	3.460	20.22	1	0	3	1
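
The bundled mtcars.csv appears to contain no missing values, so the table above matches the raw file. To show what mean imputation does on data with gaps, here is a minimal base R sketch. It assumes that method = "mean" replaces each missing numeric value with its column mean. The helper impute_mean and the toy data frame are hypothetical illustrations, not part of EDAhelperR, and the package's actual behavior may differ.

```r
# Hypothetical helper (not part of EDAhelperR): replace NA values in
# numeric columns with the mean of the observed values in that column.
impute_mean <- function(data) {
  for (col in names(data)) {
    values <- data[[col]]
    if (is.numeric(values) && anyNA(values)) {
      values[is.na(values)] <- mean(values, na.rm = TRUE)
      data[[col]] <- values
    }
  }
  data
}

# Toy data with one missing age; non-numeric columns are left untouched.
toy <- data.frame(
  name = c("a", "b", "c"),
  age = c(20, NA, 40)
)

imputed <- impute_mean(toy)
print(imputed)
```

In this toy example the missing age is filled with 30, the mean of 20 and 40.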

Read data with extra `readr` settings

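file_path = readr::readr_example("mtcars.csv")
# skip = 6 also skips the header line, so reuse the column names of the df read in the previous example
df = preprocess(file_path, method = "mean", skip = 6, col_names = colnames(df))
kable(head(df))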

mpg	cyl	disp	hp	drat	wt	qsec	vs	gear	carb
18.1	6	225.0	105	2.76	3.46	20.22	1	3	1
14.3	8	360.0	245	3.21	3.57	15.84	0	3	4
24.4	4	146.7	62	3.69	3.19	20.00	1	4	2
22.8	4	140.8	95	3.92	3.15	22.90	1	4	2
19.2	6	167.6	123	3.92	3.44	18.30	1	4	4
17.8	6	167.6	123	3.92	3.44	18.90	1	4	4
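
Because skip = 6 skips the header line along with the first five data rows, the column names have to be supplied explicitly. As a sketch of the same idea using readr alone (assuming preprocess forwards skip and col_names to readr::read_csv, which is not spelled out here), the header can be read separately instead of relying on a data frame from an earlier call. The names header and tail_rows are illustrative only.

```r
library(readr)

file_path <- readr_example("mtcars.csv")

# Read a single row just to recover the column names from the header line.
header <- names(read_csv(file_path, n_max = 1, show_col_types = FALSE))

# Skip the header plus the first five data rows, then reapply the names.
tail_rows <- read_csv(
  file_path,
  skip = 6,
  col_names = header,
  show_col_types = FALSE
)

print(head(tail_rows))
```

Reading the header on its own keeps the example self-contained, so it does not depend on the order in which earlier examples were run.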
