Build the feature matrix from various sources according to a specification object
Source: R/build.R
The build() function builds a machine learning data set from a specification object such as the one created by adam_spec() (with or without data already attached).
Usage
build(spec, join = dplyr::inner_join, rm = FALSE)
Arguments
- spec
a specification object as provided by adam_spec() (either spec or path has to be provided)
- join
either a function used to join the data sets (e.g. dplyr::full_join()) or a character (vector) giving the names of the data sets containing the .ids to keep (e.g. join = c('adxb', 'adlb')). Defaults to dplyr::inner_join.
- rm
logical, defaults to FALSE. If TRUE, a repeated-measurement feature matrix with an additional .rmtime column is prepared (experimental).
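How the choice of join function affects which subjects end up in the feature matrix can be pictured with a small base R sketch. It assumes, without confirmation from this page, that per-domain feature tables keyed by .id are combined pairwise with the supplied function. join_features() is a hypothetical helper, and merge() stands in for dplyr::inner_join() and dplyr::full_join() so the example runs without extra packages. The character-vector form of join is not covered.

```r
# Hypothetical helper, NOT the package's implementation: combine per-domain
# feature tables (one row per subject, keyed by .id) with a pairwise join function.
join_features <- function(datasets, join) {
  Reduce(function(x, y) join(x, y), datasets)
}

# Base R stand-ins for dplyr::inner_join() and dplyr::full_join()
inner <- function(x, y) merge(x, y, by = ".id", all = FALSE)
full  <- function(x, y) merge(x, y, by = ".id", all = TRUE)

# Toy feature tables with made-up values
adsl <- data.frame(.id = c("01", "02", "03"), AGE = c(54, 61, 47))
adlb <- data.frame(.id = c("01", "03", "04"), ALT = c(22, 35, 18))

inner_res <- join_features(list(adsl, adlb), inner)  # subjects present in both
full_res  <- join_features(list(adsl, adlb), full)   # subjects present in either

inner_res$.id  # "01" "03"
full_res$.id   # "01" "02" "03" "04"; ALT is NA for "02", AGE is NA for "04"
```

With an inner join only subjects found in every data set are kept, whereas a full join keeps all subjects and leaves NAs where a domain has no record.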
Value
build() returns a wide data set with one row per subject and standardized column names for the subject id (.id) and the treatment variable (.trt), if it is provided in the spec object. Additional information on the data is stored in the attributes of the returned object:
dict
a data dictionary with the following fields:
- param: original parameter name in the source data
- column: column name of the variable in the returned data. column is derived from param by transforming it into a valid file name and possibly adding a time extension, if multiple time points are considered for a particular parameter.
- label: parameter label
- source: source id provided by the specification object. If created with adam_spec(), this is the name of the domain.
- type: ADaM data type of the source data (adsl, bds or occds)
- unit: parameter unit (if applicable)
- time: measurement time point (if applicable)
- spec_id: name of the corresponding spec entry (if applicable)
source
file paths and md5 checksums of the source data sets
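A sketch of reading the dict attribute with base R's attr(). Because the package is not loaded here, res is a toy stand-in for a build() result with hypothetical values; only the attribute name dict and its field names come from this page, and the lookup() helper is hypothetical, not part of the package.

```r
# Toy stand-in for a build() result (hypothetical values): a wide data frame
# with standardized .id and .trt columns plus a "dict" attribute.
res <- data.frame(.id = c("01", "02"), .trt = c("A", "B"),
                  AGE = c(54, 61), ALT = c(22, 31))
attr(res, "dict") <- data.frame(
  param   = c("AGE", "ALT"),
  column  = c("AGE", "ALT"),
  label   = c("Age", "Alanine Aminotransferase"),
  source  = c("adsl", "adlb"),
  type    = c("adsl", "bds"),
  unit    = c("years", "U/L"),
  time    = c(NA_character_, NA_character_),
  spec_id = c(NA_character_, NA_character_)
)

# Hypothetical helper: map a returned column name back to its parameter metadata
lookup <- function(dict, col) {
  dict[dict$column == col, c("param", "label", "unit")]
}

dict <- attr(res, "dict")
lookup(dict, "ALT")
```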
Details
Missing values in variables from occurrence data sets (occds) are interpreted as 'absence of event', whereas NAs in adsl and bds data are considered true missing values. Missing values that arise in occds variables after joining with other data sets are replaced by 0 for numeric variables; for factors, an additional level 'none' is introduced.
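The occds missing-value rule can be sketched in base R. fill_occds_na() is a hypothetical helper, not the package's implementation, and the column names are made up; the sketch assumes the caller already knows which columns came from occds sources (for example from the type field of dict).

```r
# Hypothetical helper illustrating the documented rule for occds variables:
# numeric NA -> 0, factor NA -> new level "none"; other column types are left unchanged.
fill_occds_na <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    if (is.numeric(x)) {
      x[is.na(x)] <- 0
    } else if (is.factor(x)) {
      # add "none" as an extra level (only once), then use it for missing entries
      x <- factor(x, levels = union(levels(x), "none"))
      x[is.na(x)] <- "none"
    }
    df[[col]] <- x
  }
  df
}

feat <- data.frame(
  .id      = c("01", "02", "03"),
  AE_COUNT = c(2, NA, 1),
  AE_SEV   = factor(c("MILD", NA, "SEVERE"))
)
filled <- fill_occds_na(feat, c("AE_COUNT", "AE_SEV"))

filled$AE_COUNT                # 2 0 1
as.character(filled$AE_SEV)    # "MILD" "none" "SEVERE"
```

Subject "02" had no recorded event, so under the 'absence of event' interpretation the count becomes 0 and the severity becomes 'none' rather than a true missing value.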
See also
build_adsl(), build_bds(), build_occds()