Cut outliers based on minimum and maximum limits of ConnectionHours and ConnectionStartDateTime variables
Source:R/preprocessing.R
cut_sessions.Rd
Cut outliers based on minimum and maximum limits of ConnectionHours and ConnectionStartDateTime variables
Usage
cut_sessions(
sessions,
connection_hours_min = NA,
connection_hours_max = NA,
connection_start_min = NA,
connection_start_max = NA,
log = FALSE,
start = getOption("evprof.start.hour")
)
Arguments
- sessions
tibble, sessions data set in evprof standard format.
- connection_hours_min
numeric, minimum of connection hours (duration). If NA the minimum value is considered.
- connection_hours_max
numeric, maximum of connection hours (duration). If NA the maximum value is considered.
- connection_start_min
numeric, minimum hour of connection start (hour as numeric). If NA the minimum value is considered.
- connection_start_max
numeric, maximum hour of connection start (hour as numeric). If NA the maximum value is considered.
- log
logical, whether to transform
ConnectionStartDateTime
andConnectionHours
variables to natural logarithmic scale (base =exp(1)
).- start
integer, start hour in the x axis of the plot.
Examples
library(dplyr)
# Localize the outlying sessions above a certain threshold
california_ev_sessions %>%
sample_frac(0.05) %>%
plot_points(start = 3)
# For example sessions that start before 5 AM or that are
# longer than 20 hours are considered outliers
sessions_clean <- california_ev_sessions %>%
sample_frac(0.05) %>%
cut_sessions(
start = 3,
connection_hours_max = 20,
connection_start_min = 5
)
plot_points(sessions_clean, start = 3)