MDD in Regulatory Context

library(drcHelper)
#> Loading required package: drc
#> Loading required package: MASS
#> Loading required package: drcData
#> 
#> 'drc' has been loaded.
#> Please cite R and 'drc' if used for a publication,
#> for references type 'citation()' and 'citation('drc')'.
#> 
#> Attaching package: 'drc'
#> The following objects are masked from 'package:stats':
#> 
#>     gaussian, getInitial
source("../knitr-setup.R")

In the regulatory context, MDD% stands for the minimum percent change (relative to the control mean) that would be deemed statistically significant given the test’s critical value, the residual variability, and the replication. There are two common ways to define it; the often used way in ecotoxicology is ofent the significance-threshold version (no power term).

Key definitions

Absolute minimum detectable difference (MDD, on the response scale):
- For a control vs. dose comparison with pooled residual variance MSE and per-group sample sizes n_c (control) and n_t (treatment), the standard error of a difference is:
  - \[SE_{diff} = \sqrt{MSE \cdot \left(\frac{1}{n_c} + \frac{1}{n_t}\right)}\]
- Given a one-sided critical value (e.g., Williams’ Tcrit), the absolute MDD is:
  - \[MDD = T_{crit} \cdot SE_{diff}\]
Percent MDD relative to control mean:
- If the control mean is μ_c, then
  - \[MDD\% = 100 \cdot \frac{MDD}{\mu_c} = 100 \cdot \frac{T_{crit} \cdot SE_{diff}}{\mu_c}\]

Notes

Significance-threshold vs. power-based:
- The above uses only Tcrit (significance threshold at alpha). Some literature calls this MSD (Minimum Significant Difference).
- A power-based MDD (to achieve power 1−β) adds a second quantile:
  - \[MDD_{power} = \left(t_{1-\alpha,\ df} + t_{1-\beta,\ df}\right) \cdot SE_{diff}\]
  - and \[MDD\%_{power} = 100 \cdot \frac{MDD_{power}}{\mu_c}\]
- Your Williams expected includes “Tcrit” per dose; this aligns with the significance-threshold version (no power term).
Which Tcrit to use:
- For Williams: use the Tcrit reported per dose/step from broom_williams(method = “Williams_PMCMRplus”), which you have available.
- For Dunnett: a true Dunnett critical value depends on the number of groups (multivariate t). If you don’t have a reported Tcrit, computing an exact Dunnett Tcrit is more involved; for Williams, you already have it, so we can compute MDD% reliably.

What the function does

Accepts a formula x of the form Response ~ Dose (Dose can be numeric or factor).
Ensures the control (reference) group is the first factor level (by default the lowest dose).
Computes the per-contrast standard error SE_diff using the appropriate method:
- Williams/Dunnett: pooled ANOVA MSE (or directly from glht sigma for Dunnett).
- Student’s t: pooled-variance SE.
- Welch: group-specific-variance SE and Welch–Satterthwaite df.
Determines the critical value Tcrit:
- Williams: from broom_williams t’-crit per contrast.
- Dunnett: from multcomp summary.glht test$qfunction at your alpha.
- Student’s t/Welch: qt with per-contrast df.
Computes MDD = Tcrit × SE_diff and MDD% = 100 × MDD / control_mean.
Returns a tidy tibble with dose, n_c, n_t, df (when applicable), SE_diff, Tcrit, MDD, MDD_pct, method, alpha, alternative, and control_mean.

Code: compute_mdd_generic

compute_mdd_generic <- function(x, data,
                                test = c("Williams_PMCMRplus", "Williams_JG", "Dunnett", "t", "Welch"),
                                alpha = 0.05,
                                alternative = c("smaller", "greater", "two-sided")) {
  test <- match.arg(test)
  alternative <- match.arg(alternative)

  # Parse formula
  if (!inherits(x, "formula")) stop("x must be a formula, e.g., Response ~ Dose")
  terms_obj <- terms(x)
  resp_name <- rownames(attr(terms_obj, "factors"))[1]
  factor_name <- attr(terms_obj, "term.labels")[1]

  df <- data
  if (!is.data.frame(df)) stop("data must be a data.frame")

  # Prepare dose factor with control first (lowest dose)
  dose_raw <- df[[factor_name]]
  dose_num <- suppressWarnings(as.numeric(gsub(",", ".", as.character(dose_raw))))
  # If numeric conversion fails everywhere, treat as factor and try to sort levels by numeric content
  if (all(is.na(dose_num))) {
    # Attempt to parse numeric part from strings
    dose_num <- suppressWarnings(as.numeric(gsub(",", ".", gsub("[^0-9.,-]", "", as.character(dose_raw)))))
  }
  # Build factor levels: sort unique numeric doses; fallback to existing factor levels
  if (!all(is.na(dose_num))) {
    levels_sorted <- sort(unique(dose_num))
    df$Dose_factor <- factor(dose_num, levels = levels_sorted)
  } else {
    df$Dose_factor <- if (is.factor(dose_raw)) factor(dose_raw) else factor(dose_raw)
    warning("Dose values not numeric; using existing factor levels. Ensure control is the first level.")
  }

  # Basic stats per group
  resp <- df[[resp_name]]
  ctrl_level <- levels(df$Dose_factor)[1]
  ctrl_idx <- df$Dose_factor == ctrl_level
  mu_c <- mean(resp[ctrl_idx], na.rm = TRUE)
  n_c <- sum(ctrl_idx)

  # One-way ANOVA for pooled MSE (used by Williams/Dunnett/t pooled)
  aov_fit <- stats::aov(resp ~ df$Dose_factor)
  mse <- tryCatch(summary(aov_fit)[[1]]["Residuals", "Mean Sq"], error = function(e) NA_real_)
  df_resid <- tryCatch(summary(aov_fit)[[1]]["Residuals", "Df"], error = function(e) NA_real_)

  # Per-dose counts and SDs
  group_n <- aggregate(resp ~ df$Dose_factor, FUN = function(z) sum(!is.na(z)))
  group_sd <- aggregate(resp ~ df$Dose_factor, FUN = stats::sd)
  names(group_n) <- c("Dose_factor", "n_t")
  names(group_sd) <- c("Dose_factor", "sd_t")

  # Helper to compute one-/two-sided quantile tail
  qtail <- function(df, alt, alpha) {
    if (alt == "two-sided") stats::qt(1 - alpha / 2, df = df) else stats::qt(1 - alpha, df = df)
  }

  # Initialize result container
  res_list <- list()

  if (test %in% c("Williams_PMCMRplus", "Williams_JG")) {
    # Direction for Williams
    direction <- if (alternative == "smaller") "decreasing" else "increasing"
    bw <- try(drcHelper::broom_williams(x, data = df, method = if (test == "Williams_PMCMRplus") "Williams_PMCMRplus" else "Williams_JG", direction = direction), silent = TRUE)
    if (inherits(bw, "try-error") || nrow(bw) == 0) stop("broom_williams failed to produce results.")

    bw <- as.data.frame(bw)
    # Map Dose from comparison ("a - 0 <= 0" or "a - 0 >= 0")
    comp_clean <- gsub("\\s*(<=|>=)\\s*0\\s*$", "", as.character(bw$comparison))
    dose_contrast <- suppressWarnings(as.numeric(gsub(",", ".", sub(" - 0$", "", comp_clean))))
    if (all(is.na(dose_contrast)) && test == "Williams_JG") {
      # Fallback: assign ascending treatment doses excluding control
      trt_levels <- levels(df$Dose_factor)[-1]
      dose_contrast <- suppressWarnings(as.numeric(trt_levels))
      dose_contrast <- dose_contrast[seq_len(min(length(dose_contrast), nrow(bw)))]
      # If lengths mismatch, pad with NA
      if (length(dose_contrast) < nrow(bw)) {
        dose_contrast <- c(dose_contrast, rep(NA_real_, nrow(bw) - length(dose_contrast)))
      }
    }
    names(bw)[names(bw) == "`t'-crit"] <- "Tcrit"
    # Merge per-dose counts
    trt_n <- merge(data.frame(Dose_factor = factor(dose_contrast, levels = levels(df$Dose_factor))),
                   group_n, by = "Dose_factor", all.x = TRUE)
    n_t <- trt_n$n_t
    # SE_diff via pooled MSE
    if (is.na(mse)) stop("Cannot compute pooled MSE from ANOVA; MDD cannot be computed.")
    SE_diff <- sqrt(mse * (1 / n_c + 1 / n_t))  # $$SE_{diff} = \sqrt{MSE \cdot (1/n_c + 1/n_t)}$$
    MDD <- bw$Tcrit * SE_diff                   # $$MDD = T_{crit} \cdot SE_{diff}$$
    MDD_pct <- if (!is.na(mu_c) && mu_c != 0) 100 * MDD / mu_c else NA_real_  # $$MDD\% = 100 \cdot MDD / \mu_c$$
    res_list <- list(
      dose = dose_contrast,
      n_c = rep(n_c, length(n_t)),
      n_t = n_t,
      df = rep(df_resid, length(n_t)),
      SE_diff = SE_diff,
      Tcrit = bw$Tcrit,
      MDD = MDD,
      MDD_pct = MDD_pct,
      method = test
    )
  } else if (test == "Dunnett") {
    # Need multcomp
    if (!requireNamespace("multcomp", quietly = TRUE)) stop("Package 'multcomp' is required for Dunnett MDD.")
    # Build Dunnett contrasts and glht
    contrasts_call <- as.call(list(as.name("mcp")))
    contrasts_call[[2]] <- "Dunnett"
    names(contrasts_call)[2] <- factor_name
    dunnett_contrasts <- eval(contrasts_call)
    # Construct aov from formula x
    aov_model <- stats::aov(x, data = df)
    gl <- multcomp::glht(aov_model, linfct = dunnett_contrasts)
    sm <- summary(gl)
    # Dose from rownames of coefficients like "0.0448 - 0"
    comp <- names(sm$test$coefficients)
    dose_contrast <- suppressWarnings(as.numeric(gsub(",", ".", sub(" - 0$", "", comp))))
    # Standard errors per contrast come from sigma in summary (vector)
    SE_diff <- as.numeric(sm$test$sigma)
    # Critical value from qfunction
    qfun <- sm$test$qfunction
    Tcrit <- tryCatch(as.numeric(qfun(conf.level = 1 - alpha, adjusted = TRUE)), error = function(e) NA_real_)
    if (is.na(Tcrit)) {
      warning("Could not obtain Dunnett adjusted critical value; using unadjusted qt with residual df.")
      Tcrit <- qtail(df_resid, alternative, alpha)
    }
    MDD <- Tcrit * SE_diff
    MDD_pct <- if (!is.na(mu_c) && mu_c != 0) 100 * MDD / mu_c else NA_real_
    # n_t per contrast
    trt_n <- merge(data.frame(Dose_factor = factor(dose_contrast, levels = levels(df$Dose_factor))),
                   group_n, by = "Dose_factor", all.x = TRUE)
    n_t <- trt_n$n_t
    res_list <- list(
      dose = dose_contrast,
      n_c = rep(n_c, length(n_t)),
      n_t = n_t,
      df = rep(df_resid, length(SE_diff)),
      SE_diff = SE_diff,
      Tcrit = rep(Tcrit, length(SE_diff)),
      MDD = MDD,
      MDD_pct = MDD_pct,
      method = "Dunnett_multcomp"
    )
  } else if (test == "t") {
    # Pooled-variance per contrast (control vs each treatment)
    # Compute control SD
    sd_c <- stats::sd(resp[ctrl_idx])
    trt_levels <- levels(df$Dose_factor)[-1]
    out_rows <- lapply(trt_levels, function(lv) {
      trt_idx <- df$Dose_factor == lv
      n_t <- sum(trt_idx)
      sd_t <- stats::sd(resp[trt_idx])
      # pooled variance
      sp2 <- ((n_c - 1) * sd_c^2 + (n_t - 1) * sd_t^2) / (n_c + n_t - 2)  # $$s_p^2 = ((n_c-1)s_c^2 + (n_t-1)s_t^2) / (n_c+n_t-2)$$
      SE_diff <- sqrt(sp2 * (1 / n_c + 1 / n_t))                           # $$SE_{diff} = \sqrt{s_p^2 (1/n_c + 1/n_t)}$$
      df_ct <- n_c + n_t - 2
      Tcrit <- qtail(df_ct, alternative, alpha)
      MDD <- Tcrit * SE_diff
      MDD_pct <- if (!is.na(mu_c) && mu_c != 0) 100 * MDD / mu_c else NA_real_
      data.frame(dose = suppressWarnings(as.numeric(lv)), n_c = n_c, n_t = n_t,
                 df = df_ct, SE_diff = SE_diff, Tcrit = Tcrit, MDD = MDD, MDD_pct = MDD_pct,
                 method = "Student_t")
    })
    res_df <- do.call(rbind, out_rows)
    return(res_df)
  } else if (test == "Welch") {
    # Welch SE and df per contrast
    sd_c <- stats::sd(resp[ctrl_idx])
    trt_levels <- levels(df$Dose_factor)[-1]
    out_rows <- lapply(trt_levels, function(lv) {
      trt_idx <- df$Dose_factor == lv
      n_t <- sum(trt_idx)
      sd_t <- stats::sd(resp[trt_idx])
      SE_diff <- sqrt(sd_c^2 / n_c + sd_t^2 / n_t)                         # $$SE_{diff} = \sqrt{s_c^2/n_c + s_t^2/n_t}$$
      df_w <- (sd_c^2 / n_c + sd_t^2 / n_t)^2 /
        ((sd_c^2 / n_c)^2 / (n_c - 1) + (sd_t^2 / n_t)^2 / (n_t - 1))     # $$df_{Welch} = \frac{(s_c^2/n_c + s_t^2/n_t)^2}{(s_c^2/n_c)^2/(n_c-1) + (s_t^2/n_t)^2/(n_t-1)}$$
      Tcrit <- qtail(df_w, alternative, alpha)
      MDD <- Tcrit * SE_diff
      MDD_pct <- if (!is.na(mu_c) && mu_c != 0) 100 * MDD / mu_c else NA_real_
      data.frame(dose = suppressWarnings(as.numeric(lv)), n_c = n_c, n_t = n_t,
                 df = df_w, SE_diff = SE_diff, Tcrit = Tcrit, MDD = MDD, MDD_pct = MDD_pct,
                 method = "Welch_t")
    })
    res_df <- do.call(rbind, out_rows)
    return(res_df)
  }

  # Assemble and return tibble for Williams/Dunnett branches
  res_df <- data.frame(
    Dose = res_list$dose,
    n_c = res_list$n_c,
    n_t = res_list$n_t,
    df = res_list$df,
    SE_diff = res_list$SE_diff,
    Tcrit = res_list$Tcrit,
    MDD = res_list$MDD,
    MDD_pct = res_list$MDD_pct,
    method = res_list$method,
    alpha = alpha,
    alternative = alternative,
    control_mean = mu_c
  )
  # Order by Dose ascending
  res_df <- res_df[order(res_df$Dose), ]
  res_df
}

Usage examples

Williams PMCMRplus (your MOCK0065 Growth Rate):

# Helper to convert dose strings to numeric, handling various formats
convert_dose <- function(x) {
  if (length(x) == 0) return(numeric(0))
  xc <- as.character(x)
  xc <- trimws(xc)
  xc[xc %in% c("", "n/a", "NA")] <- NA_character_
  # Normalize decimal commas and keep scientific notation (e.g., "4,48E-2" -> "4.48E-2")
  xc <- gsub(",", ".", xc, fixed = TRUE)
  out <- suppressWarnings(as.numeric(xc))
  return(out)
}

convert_numeric <- function(x) {
  if (length(x) == 0) return(numeric(0))
  xc <- as.character(x)
  xc <- trimws(xc)
  xc[xc %in% c("", "n/a", "NA")] <- NA_character_
  xc <- gsub(",", ".", xc, fixed = TRUE)
  suppressWarnings(as.numeric(xc))
}

dose_from_comparison <- function(comp_vec) {
  if (length(comp_vec) == 0) return(numeric(0))
  vapply(comp_vec, function(s) {
    if (is.na(s)) return(NA_real_)
    parts <- strsplit(s, " - ", fixed = TRUE)[[1]]
    convert_dose(parts[1])
  }, FUN.VALUE = numeric(1))
}
w_ep <- test_cases_data %>% dplyr::filter(`Study ID` == "MOCK0065", Endpoint == "Growth Rate")
w_ep$Dose_numeric <- convert_dose(w_ep$Dose)
w_ep$Dose_factor <- factor(w_ep$Dose_numeric, levels = sort(unique(w_ep$Dose_numeric)))
res_william <- broom_williams(Response ~ Dose_factor, data = w_ep,
               test = "Williams_PMCMRplus", alpha = 0.05, alternative = "less")
mdd_w_pm <- compute_mdd_generic(Response ~ Dose_factor, data = w_ep,
                                test = "Williams_PMCMRplus", alpha = 0.05, alternative = "smaller")
print(mdd_w_pm)

Dunnett:

mdd_dunn <- compute_mdd_generic(Response ~ Dose_factor, data = w_ep,
                                test = "Dunnett", alpha = 0.05, alternative = "smaller")
print(mdd_dunn)

- Student’s t (pooled, control vs each dose):

mdd_t <- compute_mdd_generic(Response ~ Dose_factor, data = w_ep,
                             test = "t", alpha = 0.05, alternative = "smaller")
print(mdd_t)

Welch:

mdd_welch <- compute_mdd_generic(Response ~ Dose_factor, data = w_ep,
                                 test = "Welch", alpha = 0.05, alternative = "two-sided")
print(mdd_welch)

Notes and options - Alternative handling: MDD uses a critical value magnitude; the choice of one-sided vs two-sided changes the quantile. If your expected MDD% was computed with two-sided thresholds, set alternative = “two-sided”. - For Dunnett, the adjusted Tcrit is taken from summary(glht)$test$qfunction. If unavailable, we fallback to unadjusted qt with residual df. - For Williams_JG, comparison strings might not carry doses; we assign doses in ascending order excluding control. If you need exact mapping, we can inspect the internal williamsTest_JG output via drcHelper to align rows to doses. - You can extend the test argument to include other tests (e.g., “Dunn”) once you define the appropriate Tcrit. Dunn typically uses z critical values; for MDD you would use zcrit × SE_diff on a rank-based scale, which is a different construct; unless your expected MDD% is defined explicitly for Dunn, I recommend sticking to parametric tests above. - If you’d like me to integrate this into your validation report, I can add an MDD% metric branch that uses compute_mdd_generic under the hood and compares to expected “MDD%” rows for Williams.

Implementation in `drcHelper`

Reusing existing test objects and building a modular system around them is far more efficient and robust.

library(drcHelper)
Results <- broom_williams(Response ~ factor(Dose), data = dat_medium,
               test = "Williams_JG", alpha = 0.05, alternative = "less")

compute_mdd_williams(Results,data=dat_medium,formula = Response ~ factor(Dose))
#> # A tibble: 6 × 2
#>    Dose MDD_pct
#>   <dbl>   <dbl>
#> 1  0.94    8.24
#> 2  1.88    8.57
#> 3  3.75    8.68
#> 4  7.5     8.72
#> 5 15       8.76
#> 6 30       8.76

Implementation in the Validation for Williams

Here’s a self-contained helper to compute per-dose MDD% for Williams and return a tidy table. It uses your existing helpers and broom_williams output.

How to integrate into the validation workflow

In the Williams run_actual handler, compute the prelim summary as before (for Mean, SD, %Inhibition, CV), and also compute MDD%:

prelim <- compute_prelim_summary(endpoint_data)
mdd_tbl <- compute_mdd_williams(endpoint_data, alternative = alternative)

# Return both; the generic runner can join MDD% when expected rows exist
list(actual_df = actual_df, group_means = prelim, mdd = mdd_tbl)

In your expected parsing (parse_expected_metrics for Williams), you already capture “MDD%”:

out$mdd <- pick("\\bMDD%\\b"); 
names(out$mdd)[2] <- "Expected_MDDpct"

In the generic runner’s join stage for Williams, add:

if (!is.null(exp_splits$mdd) && nrow(exp_splits$mdd)) {
  mdd_join <- exp_splits$mdd %>%
    dplyr::left_join(actual$mdd[, c("Dose", "Actual_MDDpct")], by = "Dose") %>%
    dplyr::mutate(
      Endpoint = endpoint,
      Diff = abs(Actual_MDDpct - Expected_MDDpct),
      Status = dplyr::case_when(
        is.na(Expected_MDDpct) | is.na(Actual_MDDpct) ~ "MISSING",
        Diff <= tolerance ~ "PASS", TRUE ~ "FAIL")
    ) %>%
    dplyr::transmute(Endpoint, Dose, metric = "MDD%", Actual = Actual_MDDpct, Expected = Expected_MDDpct, Status)
  pieces <- c(pieces, list(mdd_join))
}

Handling “greater” vs “smaller”

The formula for MDD% doesn’t change with direction; it’s the minimum absolute difference required for significance. You typically compare magnitudes (positive values). If you want to encode a “directional” MDD% (signed), you can attach sign = ifelse(alternative==“greater”, +1, -1) and report MDD%*sign, but most reports use a positive threshold.

Unequal sample sizes and heteroscedasticity

The SE_diff above already handles unequal n via 1/n_c + 1/n_t.
If variances are heterogeneous and a Welch-type approach is used, you would estimate SE_diff via group-specific variances. Williams is based on a pooled-variance ANOVA, so the pooled MSE is appropriate here.

Optional: power-based MDD%

If we later need to compute a power-based MDD% (target power 1−β), use:
- \[MDD_{power} = \left(t_{1-\alpha,\ df} + t_{1-\beta,\ df}\right) \cdot SE_{diff}\]
- Then \[MDD\%_{power} = 100 \cdot MDD_{power} / \mu_c\]
df can be taken from the residual df of the ANOVA model. You can parameterize β (e.g., 0.2 for 80% power) and compute the quantiles using stats::qt.

Notes

We probably need to adjust if the expected MDD% uses a different SE (e.g., contrast-specific SE in Williams instead of the simple (1/n_c + 1/n_t) form).

Key definitions

Code: compute_mdd_generic

Implementation in drcHelper

Implementation in the Validation for Williams

How to integrate into the validation workflow

Handling “greater” vs “smaller”

Unequal sample sizes and heteroscedasticity

Optional: power-based MDD%

Notes

Implementation in `drcHelper`