Family of helper functions to identify columns to drop from an ADSL data set
Usage
adsl_identify(
  data,
  dict = NULL,
  type = c("dttm", "constant", "combined", "flag", "factor", "redundant"),
  dict_label = "label",
  dict_param = "param",
  id = "SUBJID",
  trt = "TRT01A",
  black_list = c("RANDNO", "SITEID", "SITENAM", "INVID", "INVNAM")
)

adsl_identify_dttm(data)

adsl_identify_constant(data)

adsl_identify_combined(
  data,
  dict = NULL,
  dict_label = "label",
  dict_param = "param"
)

adsl_identify_redundant(data, id, trt, clmn_flag)

adsl_identify_flag(data, dict, dict_param = "param", dict_label = "label")

adsl_identify_factor(
  data,
  id,
  clmn_flag = NULL,
  dict,
  dict_param = "param",
  dict_label = "label"
)

adsl_identify_factor_data(data)
Arguments
- data
adsl-like data set in which to identify particular columns of interest
- dict, dict_param, dict_label
dict is a tibble as created by adsl_dict(), where dict_param and dict_label indicate the columns in dict containing the parameter names (column names of data) and the labels, respectively.
- type
character vector determining the categories of column types to identify. Defaults to all possible categories: dttm, constant, combined, flag, factor, redundant.
- id, trt
user-selected column names in data for the ID and treatment columns, defaulting to SUBJID and TRT01A, respectively, in adsl_identify().
- black_list
character vector of columns that should be dropped for most analyses; see Details.
- clmn_flag
(adsl_identify_factor() and adsl_identify_redundant() only) character vector of column names identified as flags
Value
A list with two top-level entries: to_remove, a list of
column names from data that were identified as candidates for a given category,
and lev_list, a list required to set factor level orders.
Details
Columns meeting the following criteria are returned:
- adsl_identify_dttm(): methods::is(.x, "Date") is TRUE;
the label contains one of the strings 'year', 'month', 'day', 'date' or 'time'
(not case sensitive); the class is one of 'difftime',
'hms', 'Period', 'POSIXct', 'POSIXt', 'Date'
- adsl_identify_constant(): identified via
janitor::remove_empty(which = 'cols') and janitor::remove_constant(na.rm = TRUE)
- adsl_identify_redundant(): columns that are redundant to the selected trt and id columns
- adsl_identify_combined(): the label (from dict) contains '/'
and all of its parts are column names themselves
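The date-time, constant and combined criteria above can be approximated in base R. The following is only a sketch under stated assumptions, not the package's implementation: the helper names looks_like_dttm(), is_constant() and is_combined_label() are hypothetical, the date-time criteria are assumed to be combined with OR, column labels are assumed to live in a "label" attribute, and the constant check is a base R stand-in for the janitor calls.

```r
# Hypothetical helper: TRUE if a column looks like a date/time column.
# Assumption: a matching class OR a matching label is sufficient.
looks_like_dttm <- function(x) {
  dttm_classes <- c("difftime", "hms", "Period", "POSIXct", "POSIXt", "Date")
  label <- attr(x, "label")  # assumption: labels are stored as an attribute
  has_dttm_label <- is.character(label) && length(label) == 1 &&
    grepl("year|month|day|date|time", label, ignore.case = TRUE)
  inherits(x, dttm_classes) || has_dttm_label
}

# Hypothetical helper: TRUE if a column is empty or has a single distinct
# non-missing value (base R stand-in for the janitor-based check).
is_constant <- function(x) {
  length(unique(x[!is.na(x)])) <= 1
}

# Hypothetical helper: TRUE if a label contains '/' and every part of it
# is itself a column name.
is_combined_label <- function(label, col_names) {
  if (!grepl("/", label, fixed = TRUE)) return(FALSE)
  parts <- trimws(strsplit(label, "/", fixed = TRUE)[[1]])
  length(parts) >= 2 && all(parts %in% col_names)
}

# Toy ADSL-like data, invented for illustration only.
adsl <- data.frame(
  SUBJID  = c("001", "002", "003"),
  TRT01A  = c("A", "B", "A"),
  STUDYID = c("S1", "S1", "S1"),
  RANDDT  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  TRTDUR  = c(10, 12, 9),
  AGE     = c(34, 51, 47),
  SEX     = c("F", "M", "F"),
  AGESEX  = c("34/F", "51/M", "47/F"),
  stringsAsFactors = FALSE
)
attr(adsl$TRTDUR, "label") <- "Treatment duration (days)"

# Toy dictionary with the default column names "param" and "label".
dict <- data.frame(
  param = c("AGE", "SEX", "AGESEX"),
  label = c("Age", "Sex", "AGE/SEX"),
  stringsAsFactors = FALSE
)

dttm_cols     <- names(adsl)[vapply(adsl, looks_like_dttm, logical(1))]
constant_cols <- names(adsl)[vapply(adsl, is_constant, logical(1))]
combined_cols <- dict$param[
  vapply(dict$label, is_combined_label, logical(1), col_names = names(adsl))
]
```

In this toy data, RANDDT qualifies through its class, TRTDUR through its label, STUDYID is constant, and AGESEX is flagged as combined because both parts of its label are column names. The redundancy check against id and trt is not sketched here because the source does not spell out its rule.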
By default, black_list contains RANDNO,
SITEID, SITENAM, INVID, INVNAM.