Identify sets of variables in the input data that might require extra data-preparation steps, e.g. skewed variables to be log transformed, count variables, or factors with a rare level to be lumped.
Usage
prepare_ml_vars(
  data,
  thres_count = NULL,
  thres_log = NULL,
  thres_lump = NULL,
  remove = c(".id", ".out", ".status", ".time")
)
Arguments
- data
the data set to be searched for feature sets with specific characteristics relevant for further data preparation
- thres_count
used to detect integer columns with at most thres_count distinct values (such columns might be excluded from further processing, e.g. log transformation and normalization)
- thres_log
threshold used to flag skewed variables as candidates for log transformation
- thres_lump
proportion threshold for factor lumping; used to detect factors with exactly one level whose relative frequency is below thres_lump
- remove
columns to be excluded from all identified sets; defaults to c(".id", ".out", ".status", ".time")
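The three detection criteria described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the implementation behind prepare_ml_vars(): the helper sketch_var_sets() is hypothetical, thres_log is assumed to be compared against the absolute (moment-based) sample skewness of numeric columns, and a NULL threshold is assumed to disable the corresponding check.

```r
# Hypothetical helper; NOT the package's implementation of prepare_ml_vars().
sketch_var_sets <- function(data, thres_count = NULL, thres_log = NULL,
                            thres_lump = NULL,
                            remove = c(".id", ".out", ".status", ".time")) {
  # Columns listed in `remove` are excluded from every set
  cols <- setdiff(names(data), remove)
  out <- list()

  # Count set: integer columns with at most thres_count distinct non-NA values
  if (!is.null(thres_count)) {
    out$count <- Filter(function(v) {
      x <- data[[v]]
      is.integer(x) && length(unique(x[!is.na(x)])) <= thres_count
    }, cols)
  }

  # Log set (assumption): numeric columns whose absolute sample skewness
  # exceeds thres_log
  if (!is.null(thres_log)) {
    skew <- function(x) {
      x <- x[!is.na(x)]
      if (length(x) < 3) return(0)
      m <- mean(x)
      s <- sqrt(mean((x - m)^2))
      if (s == 0) return(0)  # constant column: treat as unskewed
      mean((x - m)^3) / s^3
    }
    out$log <- Filter(function(v) {
      x <- data[[v]]
      is.numeric(x) && abs(skew(x)) > thres_log
    }, cols)
  }

  # Lump set: factors with exactly one level whose relative frequency
  # is below thres_lump
  if (!is.null(thres_lump)) {
    out$lump <- Filter(function(v) {
      x <- data[[v]]
      is.factor(x) && sum(prop.table(table(x)) < thres_lump) == 1
    }, cols)
  }
  out
}

# Usage on a small illustrative data set
df <- data.frame(
  .id      = 1:8,
  n_visits = c(0L, 1L, 1L, 2L, 0L, 1L, 2L, 0L),
  income   = c(1, 1, 1, 1, 1, 2, 3, 100),
  group    = factor(c("a", "a", "a", "a", "b", "b", "b", "c"))
)
res <- sketch_var_sets(df, thres_count = 5, thres_log = 1, thres_lump = 0.2)
str(res)
```

Here .id is dropped by the default remove vector, n_visits falls into the count set, the single outlier in income makes it strongly right-skewed, and level "c" of group is the only level below 20 % relative frequency.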