Changelog
Source: NEWS.md
martini 0.7.0
breaking changes
- default for id column changed to USUBJID according to the ADaM implementation guide
- build(join = ) default changed from dplyr::inner_join() to dplyr::full_join(), which is equivalent to “adsl” (if included) according to CDISC standards
- default correlation method for prepare_ml()’s prep_step_corr is now spearman instead of pearson, for consistency with our major use case with random forests. Can be controlled via prepare_ml(corr_method).
- the prepare_ml() output object structure has changed
  - like the data, which is provided both in raw form and prepped for ML, the recipe is now provided in raw and prepped versions to allow for proper tuning workflows
  - removed corr_keep
  - recipe log step now called log_skewness. Info on log transformed values to be extracted from new output slot
- the default recipe itself has been modified, making use of new custom steps that are adaptations of existing recipes functions
  - step_corr_keep() offers the functionality of recipes::step_corr() with the option to nominate preferred representatives of correlated groups.
  - step_log_skewness() transforms variables exceeding a skewness threshold before imputation by means of step_impute_knn(), for a more robust imputation. Optionally, variables can be transformed back to their original scale right after imputation using step_log_skewness_undo().
  - step_other2() is nearly identical to recipes::step_other() except for its behavior when a single class does not meet the required minimum size defined by threshold: while the original version renames the class and silently adds novel levels to it, our version leaves the single class unmodified and raises an error if a new data set comes with novel levels for that variable. This behavior was previously handled manually, so results do not change.
- argument prep_recipe is deprecated in favor of custom_recipe for consistency
- example data sets in the package are updated according to new output structure described above
- vars_count deprecated in favor of vars_no_trafo: count variables are no longer guessed and automatically excluded from transformation, but need to be specified by the user.
- vars_ordinalscore deprecated.
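Two of the default changes in this list can be illustrated with generic dplyr and base R calls. This is a minimal sketch on made-up toy data (the tables, their values and the WEIGHT column are assumptions for illustration only); it does not call build() or prepare_ml().

```r
library(dplyr)

# Hypothetical toy data: adsl holds every subject, advs only a subset
adsl <- tibble(USUBJID = c("01", "02", "03"), AGE = c(54, 61, 47))
advs <- tibble(USUBJID = c("02", "03"), WEIGHT = c(80.5, 72.0))

# Old default: only subjects present in both tables survive
inner <- inner_join(adsl, advs, by = "USUBJID")

# New default: all subjects are kept, missing measurements become NA
full <- full_join(adsl, advs, by = "USUBJID")

nrow(inner) # 2
nrow(full)  # 3, the same subjects as in adsl

# Pearson vs. Spearman on a monotonic but non-linear relationship
x <- 1:10
y <- exp(x)
pearson  <- cor(x, y, method = "pearson")   # clearly below 1
spearman <- cor(x, y, method = "spearman")  # 1, up to floating point
```

Because adsl contains every subject, the full join returns the same set of subjects as adsl. The rank-based Spearman coefficient is unaffected by monotonic transformations, which is one plausible reason it fits tree-based models such as random forests.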
major changes
- the output of prepare_ml() now comes with its own class martini_ml and corresponding print method
- the feature matrix check functionality has been extended. check_freq() is now part of a new function check_feature() that can e.g. also check for outliers in numeric variables. It is run by default in prepare_ml(), but recommended to run prior to calling prepare_ml().
minor changes
- adam_spec()’s option to include data sets in ADaM format that are not part of the limited internal library has been extended to data sets following the occds data structure: adam_spec(add_occds = ..., add_bds = ...).
documentation
- added examples e.g. to adam_spec(), build(), and prepare_ml()
martini 0.6.4
major changes
- to accommodate handling of adsl data sets in ADaM 2.0 format, the structure of the fct_levels entry was changed to name value pairs
- added function adjust_spec_filter(), removing the append argument from adjust_spec()
- adjust_spec() now provides extensive checks of user defined modifications to a spec object. Some of them rely on the data being available, so we strongly recommend attaching data in adam_spec() if possible; accordingly, the default for attach_data in adam_spec() is now TRUE
martini 0.6.3
- preparations for bay-open: added code of conduct, updated license and added code owner and contributing information
martini 0.6.1
- updated output of correlation handling (removals$cols structure and addition of alternative labels in dictionary)
martini 0.6.0
- in rare cases, samples could end up being removed from the training data set due to incomplete imputation from recipes::step_impute_knn(). For this edge case, additional imputation steps were added to ensure full data set usage for training (recipes::step_impute_median() for numerics and recipes::step_impute_mode() for factors).
- updated example objects, since the prepare_ml() output object
  - has new entry high_corr for more transparency on feature dropping for correlation (recipes::step_corr())
  - no longer contains redundant slot for split object
- variables that are defined in prepare_ml()’s vars_keep_corr but not present in the data set are ignored (previously an error was thrown)
- improved handling of independent data sets for prediction
- improved test suite
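The fallback imputation described in the first item of this release can be pictured in plain base R. This is a conceptual sketch only: impute_fallback() and the toy data are hypothetical and not part of martini or recipes; the package itself relies on the recipes steps named in that item.

```r
# Hypothetical helper: fill whatever is still missing after a first
# imputation pass, using the median for numerics and the mode for factors
impute_fallback <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (!any(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    } else if (is.factor(df[[col]])) {
      counts <- table(df[[col]])  # NA is excluded by default
      df[[col]][missing] <- names(counts)[which.max(counts)]
    }
  }
  df
}

# Made-up data with one missing numeric and one missing factor value
df <- data.frame(
  AGE = c(54, NA, 47, 61),
  SEX = factor(c("F", "M", NA, "F"))
)

imputed <- impute_fallback(df)
anyNA(imputed) # FALSE: no row has to be dropped from training
```

Placing such a fallback after the k-nearest-neighbour step means it only touches values the first step could not fill.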
martini 0.5.1
- Added experimental parameter rm in build() to allow for the preparation of a wide data set suitable for repeated measurement outcomes (one row is a subject at a specific time point)
martini 0.5.0
- fixed package data sets. An update of the {recipes} package introduced NAs in the prepared ML data; these have now been removed.
- some refactoring
- speed improvements in prepare_ml() (affects imputation seed)
- added some tests
martini 0.4.3
- update of example data sets
- minor bug fixes related to incomplete bds data sets (e.g. missing units)
martini 0.4.2
- improved messaging (wip)
- updated documentation
- switched to testthat 3e and updated tests
- some refactoring
martini 0.4.0
- build_bds() converts character values (e.g. AVALC) to either numerics or factors, based on observed values per parameter (e.g. PARAMCD)
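The idea of a per-parameter conversion can be sketched in base R as follows. The rule used here (numeric if every non-missing value parses as a number, factor otherwise) is an assumption for illustration; convert_avalc() and the toy data are hypothetical and do not reproduce the actual build_bds() logic.

```r
# Hypothetical rule: numeric if all non-missing values parse as numbers,
# otherwise a factor
convert_avalc <- function(x) {
  # as.numeric() warns when coercion produces NA, so silence that here
  num <- suppressWarnings(as.numeric(x))
  if (all(is.na(x) | !is.na(num))) num else factor(x)
}

# Made-up bds-like data with one numeric and one categorical parameter
bds <- data.frame(
  PARAMCD = c("HR", "HR", "SEX", "SEX"),
  AVALC   = c("72", "80", "F", "M")
)

# Decide the target type separately for each parameter
converted <- lapply(split(bds$AVALC, bds$PARAMCD), convert_avalc)

class(converted$HR)  # "numeric"
class(converted$SEX) # "factor"
```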
martini 0.3.4
- Added a NEWS.md file to track changes to the package.
- Dictionary (if available) is updated if variables are added/dropped with adjust_adsl().
- If data is attached to the spec object, usage of adjust_adsl() and adjust_spec() will update the data_info and filter check attributes shown by the print method