Create specification object for ADaM data sets of type occds
Source: R/adam_spec_occds.R
Given a file containing an occds data set (e.g. admh or adcm),
adam_spec_occds() will create a specification object for use in build_occds()
to prepare the data for use in machine learning. The main task is to collect
the key columns for reshaping the data into wide format and to prepare the
data filter.
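To illustrate what reshaping into wide format with the default Y/N coding (value = NULL) means, here is a minimal base-R sketch. It is not the implementation of build_occds(). The helper to_wide_yn() and the example data are hypothetical, and the column names USUBJID, AEDECOD and AESEV only follow common ADaM conventions.

```r
# Hypothetical long-format occds data: one row per subject and occurrence
adae <- data.frame(
  USUBJID = c("01", "01", "02", "03", "03"),
  AEDECOD = c("Headache", "Nausea", "Headache", "Rash", "Rash"),
  AESEV   = c("MILD", "SEVERE", "MODERATE", "MILD", "MODERATE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per subject, one Y/N column per occurrence label
to_wide_yn <- function(data, id = "USUBJID", label) {
  ids  <- sort(unique(data[[id]]))
  labs <- sort(unique(data[[label]]))
  wide <- data.frame(ids, stringsAsFactors = FALSE)
  names(wide) <- id
  for (lab in labs) {
    # TRUE for subjects with at least one record carrying this label
    has_event <- ids %in% data[[id]][data[[label]] == lab]
    wide[[lab]] <- ifelse(has_event, "Y", "N")
  }
  wide
}

wide <- to_wide_yn(adae, id = "USUBJID", label = "AEDECOD")
wide
```

Subject 03 has two Rash records but still receives a single "Y", and subjects without a given occurrence receive "N".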
Usage
adam_spec_occds(
file = NULL,
data = NULL,
id = "USUBJID",
label = NULL,
value = NULL,
valuen = NULL,
filter = NULL,
count = TRUE,
attach_data = FALSE
)
Arguments
- file
the path of the SAS (sas7bdat) or rds file to process; ignored if
data is provided
- data
tibble with the data in occds format for which the specification is created
- id
name of the ID column to be kept and used for merging data sets
- label
name of the column that identifies the occurrence labels. Defaults to NULL, in which case it will be guessed (see Details).
- value
optional value column (e.g. AE severity). Defaults to
NULL, which leads to a Y/N coding of the event.
- valuen
optional numeric coding column for
value. Defaults to NULL; ignored if value is NULL.
- filter
character vector of filters to be applied to the occds data set. An individual filter is only kept if the resulting data set has a positive number of rows. Defaults to
NULL.
- count
boolean, defaults to
TRUE.
boolean; if TRUE, the imported raw data is attached in the
data slot of the output object. Defaults to FALSE.
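The filter rule (keep an individual filter only if it yields a non-empty result) can be sketched as below. This assumes each filter is a string holding an R expression that is evaluated against the data's columns. keep_valid_filters() is a hypothetical stand-in, not the package's check_filter().

```r
# Hypothetical stand-in for the per-filter check described above
keep_valid_filters <- function(data, filters) {
  if (is.null(filters)) return(NULL)
  is_valid <- vapply(filters, function(f) {
    tryCatch({
      keep <- eval(str2lang(f), envir = data)
      # must be a logical vector with one entry per row that selects >= 1 row
      is.logical(keep) && length(keep) == nrow(data) &&
        sum(keep, na.rm = TRUE) > 0
    }, error = function(e) FALSE)  # unparsable filter or unknown column
  }, logical(1))
  unname(filters[is_valid])
}

adae <- data.frame(
  USUBJID = c("01", "02", "03"),
  SAFFL   = c("Y", "Y", "N"),
  AESEV   = c("MILD", "SEVERE", "MILD"),
  stringsAsFactors = FALSE
)

valid <- keep_valid_filters(adae, c(
  "SAFFL == 'Y'",                # selects 2 rows: kept
  "AESEV == 'LIFE THREATENING'", # selects 0 rows: dropped
  "NOSUCHCOL == 1"               # unknown column: dropped
))
valid
```

Each filter is checked on its own, so an empty or failing filter does not invalidate the others.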
Value
A list containing the following elements:
- file, md5
the name and MD5 checksum, respectively, of the file the generated spec is based upon
- data
the raw data set if attach_data is TRUE, NULL otherwise
- data_info
a list containing the number of subjects nsubj and the number of columns ncol in the data after applying filter
- type
character string "occds", generally giving the type of ADaM data set processed (adsl/bds/occds)
- filter
the subset of filter that yields a valid and non-empty result when applied individually (using check_filter())
- id
the id argument, passed through unchanged
- label, value, valuen
names of the key columns to be used in build_occds() for reshaping
- spec_id
character string, generally the name of the domain
- dict
a tibble with the unique combinations of the param and label columns (if present in the data set), to be used as a data dictionary
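The file, md5 and data_info entries can be reproduced with base R as sketched below. tools::md5sum() computes a file's checksum, and the subject count is the number of distinct values in the id column. The helper describe_occds(), the example data and the temporary .rds file are hypothetical; the package may compute these values differently (for example, after applying filter).

```r
# Hypothetical occds data saved to a temporary rds file
adcm <- data.frame(
  USUBJID = c("01", "01", "02"),
  CMDECOD = c("Paracetamol", "Ibuprofen", "Paracetamol"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".rds")
saveRDS(adcm, path)

# Hypothetical helper mirroring the file/md5 and data_info entries
describe_occds <- function(file, data, id = "USUBJID") {
  list(
    file = basename(file),
    md5  = unname(tools::md5sum(file)),     # 32-character hex checksum
    data_info = list(
      nsubj = length(unique(data[[id]])),   # number of distinct subjects
      ncol  = ncol(data)
    )
  )
}

info <- describe_occds(path, readRDS(path))
str(info)
```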
Details
For the file names 'adae.sas7bdat', 'adcm.sas7bdat' and 'admh.sas7bdat',
the value of the label argument will be guessed if it is not provided.
Please refer to adam_guess() for details on the guessing procedure.
The function will stop if label is neither provided nor can be guessed.
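The decision logic for label (use it if given, otherwise try to guess it for the supported file names, otherwise stop) might look like the following sketch. resolve_label() and toy_guess() are hypothetical; the actual guessing rules live in adam_guess(), so here the guesser is passed in as a function and no real column names are assumed.

```r
# Hypothetical sketch of resolving the label argument
resolve_label <- function(label = NULL, file = NULL,
                          guess = function(domain) NULL) {
  # an explicitly provided label always wins
  if (!is.null(label)) return(label)
  supported <- c("adae.sas7bdat", "adcm.sas7bdat", "admh.sas7bdat")
  if (!is.null(file) && basename(file) %in% supported) {
    domain  <- tools::file_path_sans_ext(basename(file))  # e.g. "adcm"
    guessed <- guess(domain)
    if (!is.null(guessed)) return(guessed)
  }
  # neither provided nor guessable: exit with an error
  stop("`label` was not provided and could not be guessed.", call. = FALSE)
}

# A toy guesser, purely for illustration
toy_guess <- function(domain) paste0(toupper(domain), "_LABEL")

resolve_label(label = "AEDECOD")                               # "AEDECOD"
resolve_label(file = "data/adcm.sas7bdat", guess = toy_guess)  # "ADCM_LABEL"
```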
Note that the original values in the label column will end up as
the parameter labels,
not as the parameter names in the ML feature matrix. These values may be
modified later in prepare_ml(), e.g. using make.names() or a similar function.
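For reference, base R's make.names() turns arbitrary label values into syntactic names by replacing invalid characters with dots and prefixing "X" where needed; with unique = TRUE, duplicates are disambiguated. Whether prepare_ml() uses exactly these arguments is not stated here; the label values below are made up for illustration.

```r
# Hypothetical occurrence labels as they might appear in an occds label column
labels <- c("Headache", "Myocardial infarction", "2nd-degree burn", "Headache")

syntactic <- make.names(labels)
# "Headache" "Myocardial.infarction" "X2nd.degree.burn" "Headache"

syntactic_unique <- make.names(labels, unique = TRUE)
# "Headache" "Myocardial.infarction" "X2nd.degree.burn" "Headache.1"
```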