Biomarker Data Science Pipeline

Cardiovascular · PCSK9 Inhibitor (evolocumab-like) · HeFH

Executive Summary

This report analyses simulated Phase II trial data for a PCSK9 inhibitor (evolocumab-like) in heterozygous familial hypercholesterolaemia (HeFH). PCSK9 inhibition upregulates hepatic LDL receptors, markedly reducing LDL-C — the primary causal risk factor for atherosclerotic cardiovascular disease (ASCVD). Three analytical streams are applied:

Multi-omics — hepatic gene expression (LDLR, PCSK9, HMGCR pathway) and Olink cardiovascular proteomics (primary readout: LDL-C NPX proxy)
Longitudinal & survival — lme4 LDL-C trajectory modelling, Emax PK/PD, Kaplan–Meier time-to-LDL-target, Cox PH for MACE risk
ML pipeline — patient phenotyping, UMAP of lipid/inflammatory profiles, elastic-net + random forest LDL-C response prediction

1 Background & Objectives

1.1 Scientific Rationale

PCSK9 (proprotein convertase subtilisin/kexin type 9) is a serine protease secreted by hepatocytes that binds the LDL receptor (LDLR) and directs it towards lysosomal degradation. This reduces LDLR recycling to the cell surface, impairing LDL-C clearance. Anti-PCSK9 monoclonal antibodies (evolocumab, alirocumab) prevent PCSK9–LDLR binding, increasing LDLR density and dramatically lowering circulating LDL-C.

Lipid biomarker hierarchy in ASCVD risk:

Biomarker	Role	PCSK9i effect
LDL-C (primary)	Causal ASCVD driver	−50 to −70% from baseline
ApoB	LDL particle number (more predictive)	−40 to −55%
Lp(a)	Genetic residual risk; modest PCSK9i effect	−20 to −30%
hsCRP	Inflammatory risk (statin-responsive)	Minimal direct effect
Non-HDL-C	Includes all atherogenic particles	−50 to −60%

HeFH clinical context:

Prevalence: ~1:250 globally; severely elevated LDL-C from birth
Statin intolerance / inadequate LDL-C control common in HeFH
PCSK9 inhibitors achieve guideline LDL-C targets (<1.4 mmol/L) in >70% of HeFH patients
FOURIER (evolocumab) and ODYSSEY OUTCOMES (alirocumab) demonstrated significant MACE reduction
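
The target arithmetic can be made concrete with a short sketch. It applies the −50 to −70% LDL-C range from the biomarker table to a baseline value and compares the result with the <1.4 mmol/L target. The baseline of 4.0 mmol/L is an illustrative assumption, not a value from this trial.

```r
# Hypothetical on-statin baseline LDL-C (mmol/L); an assumption for illustration only.
baseline_ldl <- 4.0
reduction    <- c(0.50, 0.60, 0.70)   # fractional reductions spanning the quoted range
target       <- 1.4                   # guideline LDL-C target (mmol/L)

on_treatment <- baseline_ldl * (1 - reduction)
target_tbl <- data.frame(
  reduction_pct = reduction * 100,
  ldl_mmol_l    = round(on_treatment, 2),
  at_target     = on_treatment < target
)
target_tbl
```

Under this assumed baseline only the upper end of the reduction range reaches the target, which is one reason baseline LDL-C matters for time-to-target analyses.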

1.2 Study Objectives

Characterise the transcriptomic signature of LDL-C super-responders vs partial responders at week 12
Quantify LDL-C, ApoB, and Lp(a) NPX trajectories using an Emax PK/PD framework
Assess time-to-LDL-target (< guideline threshold) and MACE-surrogate endpoint differences between responder strata
Develop a baseline multi-omics classifier for LDL-C response prediction to guide patient selection

Document Status

Field	Detail
Protocol	CV-PCSK9-003
Therapeutic area	Cardiovascular
Mechanism	PCSK9 inhibitor (evolocumab-like; cf. Repatha)
Data cut	Simulated (seed = 456)
Pipeline version	1.0.0
Classification	Confidential — Internal Use Only

2 Data Simulation

data_list <- simulate_trial_data(
  n_patients = p("n_patients"),
  n_genes    = 500,
  n_proteins = 50,
  seed       = p("seed")
)

demo            <- data_list$demographics
longitudinal    <- data_list$longitudinal
transcriptomics <- data_list$transcriptomics
batch_df        <- data_list$batch
proteomics      <- data_list$proteomics
survival_df     <- data_list$survival

2.1 Cohort Overview

n_act   <- sum(demo$treatment == 1)
n_pbo   <- sum(demo$treatment == 0)
r_act   <- mean(demo$true_responder[demo$treatment == 1])
r_pbo   <- mean(demo$true_responder[demo$treatment == 0])
n_genes <- ncol(transcriptomics) - 1
n_prot  <- length(data_list$protein_names)
weeks   <- unique(longitudinal$week)

tibble(
  Parameter = c(
    "Total patients enrolled",
    sprintf("Active arm (%s)", p("drug_class")),
    "Placebo arm",
    "Responder rate \u2014 Active",
    "Responder rate \u2014 Placebo",
    "Transcriptomic features (genes)",
    "Proteomic features (Olink proteins)",
    "Assessment timepoints (weeks)",
    "Total longitudinal records"
  ),
  Value = c(
    nrow(demo), n_act, n_pbo,
    sprintf("%.1f%%", r_act * 100),
    sprintf("%.1f%%", r_pbo * 100),
    n_genes, n_prot,
    paste(sort(weeks), collapse = ", "),
    nrow(longitudinal)
  )
) |>
  kbl(booktabs = TRUE, align = c("l", "r")) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)

Table 1: Simulated trial cohort — demographic summary

Parameter	Value
Total patients enrolled	200
Active arm (Anti-PCSK9 mAb)	109
Placebo arm	91
Responder rate — Active	74.3%
Responder rate — Placebo	17.6%
Transcriptomic features (genes)	500
Proteomic features (Olink proteins)	50
Assessment timepoints (weeks)	0, 4, 8, 12, 24
Total longitudinal records	1000

Simulation Parameters

All data are fully synthetic (seed = 456). The simulation encodes three pieces of biological structure: batch effects in the transcriptomics, Emax-shaped trajectories for the primary biomarker, and responder status linked to gene expression. All three are calibrated to the heterozygous familial hypercholesterolaemia (HeFH) setting.
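
As a rough illustration of the Emax-shaped trajectories, the sketch below generates a small longitudinal data set in which responders have a larger maximal effect. It is not the body of `simulate_trial_data()`, which is not shown in this report. All column names and parameter values (`e0`, `et50`, the 80/40 effect sizes) are assumptions chosen for illustration.

```r
set.seed(456)  # same seed value as the report; the generator below is a hypothetical stand-in

n_patients <- 20
weeks      <- c(0, 4, 8, 12, 24)

# Hypothetical per-patient attributes (names are illustrative).
patients <- data.frame(
  patient_id = sprintf("P%03d", seq_len(n_patients)),
  responder  = rbinom(n_patients, 1, 0.5),
  e0         = rnorm(n_patients, mean = 100, sd = 5)   # baseline biomarker level
)

# Emax-shaped reduction over time; et50 is the week at which half of emax is reached.
emax_effect <- function(t, emax, et50) emax * t / (et50 + t)

# merge() with no shared columns gives the Cartesian product: one row per patient-visit.
sim <- merge(patients, data.frame(week = weeks))
sim$emax  <- ifelse(sim$responder == 1, 80, 40)        # assumed effect sizes
sim$value <- sim$e0 - emax_effect(sim$week, sim$emax, et50 = 4) +
  rnorm(nrow(sim), sd = 3)

head(sim[order(sim$patient_id, sim$week), ])
```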

3 Step 1 — Multi-Omics Analysis

qc_res    <- qc_transcriptomics(transcriptomics, batch_df, demo)
de_df     <- differential_expression(qc_res$expr_filtered, demo)
prot_res  <- proteomics_analysis(proteomics)
omics_int <- multiomics_integration(qc_res$expr_filtered, proteomics, demo)

3.1 Transcriptomics Quality Control

pca_df <- if (inherits(qc_res$pca_res, "prcomp")) {
  as.data.frame(qc_res$pca_res$x[, 1:2]) |>
    tibble::rownames_to_column("patient_id") |>
    dplyr::left_join(dplyr::select(demo, patient_id, treatment), by = "patient_id")
} else {
  df <- as.data.frame(qc_res$pca_res)
  if (!"PC1" %in% names(df)) names(df)[1:2] <- c("PC1", "PC2")
  if (!"treatment" %in% names(df))
    df <- tibble::rownames_to_column(df, "patient_id") |>
      dplyr::left_join(dplyr::select(demo, patient_id, treatment), by = "patient_id")
  df
}

ggplot(pca_df, aes(PC1, PC2, colour = factor(treatment))) +
  geom_point(alpha = 0.75, size = 2.5) +
  stat_ellipse(level = 0.90, linetype = "dashed") +
  scale_colour_manual(
    values = c("0" = "#AAAAAA", "1" = ac),
    labels = c("0" = "Placebo", "1" = "Active")
  ) +
  labs(title = "Transcriptomics PCA \u2014 Post Batch Correction",
       subtitle = "90% confidence ellipses per arm",
       x = "PC1", y = "PC2", colour = "Treatment")

Figure 1: PCA of batch-corrected transcriptomics, coloured by treatment arm.
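
The correction method used inside `qc_transcriptomics()` is not shown here. As a minimal sketch of the idea, the example below removes a batch shift by centring each gene within batch (a simple stand-in; `limma::removeBatchEffect()` is a common alternative) and then checks how much of PC1 is explained by batch before and after. The toy data, batch labels and shift sizes are assumptions.

```r
set.seed(1)
n <- 60; g <- 50
batch <- rep(c("A", "B", "C"), each = n / 3)           # hypothetical batch labels

# Samples in rows, genes in columns; add a batch-specific shift to every gene.
expr  <- matrix(rnorm(n * g), nrow = n, ncol = g)
shift <- c(A = 0, B = 2, C = -2)
expr  <- expr + shift[batch]

# Subtract each batch's per-gene mean, then restore the overall per-gene mean.
grand_mean <- colMeans(expr)
corrected  <- expr
for (b in unique(batch)) {
  idx <- batch == b
  corrected[idx, ] <- sweep(expr[idx, , drop = FALSE], 2,
                            colMeans(expr[idx, , drop = FALSE]))
}
corrected <- sweep(corrected, 2, grand_mean, "+")

pca_raw  <- prcomp(expr, center = TRUE, scale. = FALSE)
pca_corr <- prcomp(corrected, center = TRUE, scale. = FALSE)

# Proportion of PC1 variance explained by batch, before and after correction.
r2 <- function(pc, f) summary(lm(pc ~ f))$r.squared
batch_r2 <- c(before = r2(pca_raw$x[, 1], batch),
              after  = r2(pca_corr$x[, 1], batch))
batch_r2
```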

3.2 Differential Expression (Welch t-test, BH-FDR)

ggplot(de_norm, aes(fc, -log10(pval_raw), colour = sig)) +
  geom_point(alpha = 0.55, size = 1.6) +
  geom_hline(yintercept = -log10(0.05), linetype = "dashed", colour = "grey40") +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed", colour = "grey40") +
  scale_colour_manual(values = c("Up in Active" = "#C0392B",
                                  "Down in Active" = "#2980B9", "NS" = "grey70")) +
  labs(title = "Differential Expression: Active vs Placebo (Baseline)",
       subtitle = sprintf("%d genes at FDR < 5%% (Welch t-test, BH)", n_de),
       x = "log\u2082 Fold Change", y = "-log\u2081\u2080(p-value)", colour = NULL) +
  annotate("text", x = Inf, y = Inf, label = sprintf("n DE = %d", n_de),
           hjust = 1.1, vjust = 1.3, size = 4, colour = ac, fontface = "bold")

Figure 2: Volcano plot of baseline differential expression (Active vs Placebo).
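
The volcano code above expects a table with `fc`, `pval_raw`, `fdr` and `sig` columns, but the step that builds it is not shown. The sketch below is one way such a table could be derived with a per-gene Welch t-test and Benjamini-Hochberg adjustment. The toy expression matrix and effect size are assumptions, and this is not necessarily what `differential_expression()` does internally.

```r
set.seed(2)
n_genes <- 200; n_per_arm <- 30
treatment <- rep(c(0, 1), each = n_per_arm)

# Samples in rows, genes in columns; the first 10 genes get a true shift in the active arm.
expr <- matrix(rnorm(2 * n_per_arm * n_genes), ncol = n_genes,
               dimnames = list(NULL, sprintf("GENE%04d", seq_len(n_genes))))
expr[treatment == 1, 1:10] <- expr[treatment == 1, 1:10] + 1.5

de <- do.call(rbind, lapply(colnames(expr), function(gene) {
  # t.test() uses var.equal = FALSE by default, i.e. the Welch test.
  tt <- t.test(expr[treatment == 1, gene], expr[treatment == 0, gene])
  data.frame(gene_id  = gene,
             fc       = unname(tt$estimate[1] - tt$estimate[2]),  # mean difference (log2 scale assumed)
             t_stat   = unname(tt$statistic),
             pval_raw = tt$p.value)
}))
de$fdr <- p.adjust(de$pval_raw, method = "BH")
de$sig <- ifelse(de$fdr >= 0.05, "NS",
                 ifelse(de$fc > 0, "Up in Active", "Down in Active"))

n_de <- sum(de$fdr < 0.05)
head(de[order(de$fdr), ], 5)
```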

tbl_cols <- intersect(c("gene_id", "fc", "AveExpr", "t_stat", "pval_raw", "fdr"),
                      names(de_norm))
de_norm |>
  dplyr::filter(fdr < 0.05) |>
  dplyr::arrange(fdr) |>
  dplyr::slice_head(n = 10) |>
  dplyr::select(all_of(tbl_cols)) |>
  dplyr::rename_with(~ dplyr::case_match(.x,
    "gene_id" ~ "Gene", "fc" ~ "log\u2082FC", "AveExpr" ~ "Ave. Expr",
    "t_stat" ~ "t", "pval_raw" ~ "p-value", "fdr" ~ "FDR", .default = .x)) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE) |>
  column_spec(length(tbl_cols), bold = TRUE) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)

Table 2: Top 10 differentially expressed genes (ranked by adjusted p-value)

Gene	log₂FC	Ave. Expr	t	FDR
GENE0068	1.6345	NA	5.3150	0e+00
GENE0196	1.4516	NA	5.2899	0e+00
GENE0453	1.4897	NA	5.0494	1e-04
GENE0304	1.4509	NA	4.9151	1e-04
GENE0224	1.4227	NA	4.9094	1e-04
GENE0359	1.3353	NA	4.8245	1e-04
GENE0350	1.2866	NA	4.6787	2e-04
GENE0269	-1.1758	NA	-4.3495	6e-04
GENE0446	1.2062	NA	4.3361	6e-04
GENE0070	1.3006	NA	4.2535	7e-04

3.3 Olink Proteomics — NPX Dynamics

prot_long <- local({
  df       <- as.data.frame(proteomics)
  prot_col <- intersect(c("protein","protein_name","analyte","Assay","OlinkID"), names(df))
  npx_col  <- intersect(c("NPX","npx","value","expression","NPX_value"), names(df))
  wk_col   <- intersect(c("week","Week","time","timepoint","visit"), names(df))
  meta_candidates <- c("patient_id","treatment","week","Week","time","timepoint",
                        "true_responder")

  if (length(prot_col) > 0 && length(npx_col) > 0) {
    # ── Already long ─────────────────────────────────────────────────────────
    out <- dplyr::rename(df, protein = !!prot_col[1], NPX = !!npx_col[1])
    if (length(wk_col) > 0 && wk_col[1] != "week")
      out <- dplyr::rename(out, week = !!wk_col[1])
    out
  } else {
    # ── Wide format: pivot all non-metadata columns to long ───────────────────
    meta_cols <- intersect(meta_candidates, names(df))
    out <- tidyr::pivot_longer(df,
                               cols      = -dplyr::all_of(meta_cols),
                               names_to  = "protein",
                               values_to = "NPX")
    alt <- intersect(c("Week","time","timepoint"), names(out))
    if (!"week" %in% names(out) && length(alt) > 0)
      out <- dplyr::rename(out, week = !!alt[1])
    out
  }
})

# ── Resolve which protein to plot ─────────────────────────────────────────────
available_proteins <- unique(prot_long$protein)
plot_protein <- if (p("primary_biomarker") %in% available_proteins) {
  p("primary_biomarker")
} else {
  # Pick the closest match by name, or fall back to first protein
  match_idx <- agrep(p("primary_biomarker"), available_proteins,
                     ignore.case = TRUE, max.distance = 0.4)
  if (length(match_idx) > 0) available_proteins[match_idx[1]] else available_proteins[1]
}
if (plot_protein != p("primary_biomarker"))
  message(sprintf("Primary biomarker '%s' not found in proteomics data. Plotting '%s' instead.",
                  p("primary_biomarker"), plot_protein))

md      <- as.data.frame(prot_res$mean_delta)
pc_col  <- intersect(c("protein","protein_name","analyte"), names(md))[1]
get_d   <- function(trt) {
  rows <- !is.na(pc_col) & md[[pc_col]] == plot_protein & md$treatment == trt
  if (any(rows, na.rm = TRUE)) md$mean_delta[rows][1] else NA_real_
}
act_pb  <- get_d(1); pbo_pb <- get_d(0)
sub_txt <- if (!is.na(act_pb) && !is.na(pbo_pb))
  sprintf("Active \u0394 (Wk0\u219212): %+.3f  |  Placebo \u0394: %+.3f", act_pb, pbo_pb) else
  sprintf("%s NPX change from baseline", plot_protein)

prot_plot <- prot_long |>
  dplyr::filter(protein == plot_protein)

# Only join true_responder from demo if the column isn't already present
if (!"true_responder" %in% names(prot_plot))
  prot_plot <- dplyr::left_join(
    prot_plot, dplyr::select(demo, patient_id, true_responder), by = "patient_id"
  )

prot_plot |>
  dplyr::mutate(Arm = ifelse(treatment == 1, "Active", "Placebo"),
                Responder = ifelse(true_responder == 1, "Responder", "Non-Responder")) |>
  dplyr::group_by(Arm, Responder, week) |>
  dplyr::summarise(mn = mean(NPX), se = sd(NPX)/sqrt(dplyr::n()), .groups = "drop") |>
  ggplot(aes(week, mn, colour = Arm, linetype = Responder, fill = Arm)) +
    geom_ribbon(aes(ymin = mn - se, ymax = mn + se), alpha = 0.15, colour = NA) +
    geom_line(linewidth = 1) + geom_point(size = 2.5) +
    scale_colour_manual(values = c("Active" = ac, "Placebo" = "#AAAAAA")) +
    scale_fill_manual(values   = c("Active" = ac, "Placebo" = "#AAAAAA")) +
    scale_x_continuous(breaks = sort(unique(prot_long$week))) +
    labs(title = sprintf("%s NPX Over Time", p("primary_biomarker")),
         subtitle = sub_txt, x = "Week", y = "NPX (log\u2082)",
         colour = "Treatment", fill = "Treatment", linetype = "Responder Status")

Figure 3: LDL_C NPX trajectories by arm and responder status. Ribbons: ±1 SE.
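
The subtitle of Figure 3 reports a mean week 0 to week 12 change per arm, taken from `prot_res$mean_delta`. A minimal sketch of how such a per-arm delta could be computed from long-format NPX data follows. The toy data and effect size are assumptions, and the column names simply mirror those used above.

```r
set.seed(3)
# Hypothetical long-format NPX table for a single protein.
npx <- expand.grid(patient_id = sprintf("P%02d", 1:40), week = c(0, 4, 8, 12, 24),
                   stringsAsFactors = FALSE)
npx$treatment <- ifelse(as.integer(sub("P", "", npx$patient_id)) <= 20, 1, 0)
npx$NPX <- 6 - 0.05 * npx$week * npx$treatment + rnorm(nrow(npx), sd = 0.2)

# Per-patient change from week 0 to week 12, then averaged within arm.
wk0   <- npx[npx$week == 0,  c("patient_id", "treatment", "NPX")]
wk12  <- npx[npx$week == 12, c("patient_id", "NPX")]
delta <- merge(wk0, wk12, by = "patient_id", suffixes = c("_wk0", "_wk12"))
delta$delta <- delta$NPX_wk12 - delta$NPX_wk0

mean_delta <- aggregate(delta ~ treatment, data = delta, FUN = mean)
mean_delta
```

Because NPX is a log2 scale, a delta converts to a fold change as `2^delta`; a delta of −0.641, for example, corresponds to a relative level of about 0.64.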

3.4 Cross-Modal Integration

tryCatch({
  grid::grid.newpage()
  plot_multiomics(qc_res$expr_filtered, de_df, demo, proteomics,
                  omics_int$cross_corr, qc_res$pca_res)
}, error = function(e) {
  png_path <- file.path("outputs", "01_multiomics_analysis.png")
  if (file.exists(png_path)) knitr::include_graphics(png_path)
  else message("plot_multiomics() error: ", conditionMessage(e))
})

Figure 4: Cross-modal correlation heatmap — transcriptomic PCs vs Olink NPX.

Figure 5: Cross-modal correlation heatmap — transcriptomic PCs vs Olink NPX.
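
A cross-modal correlation matrix of this kind can be sketched as the correlation between leading transcriptomic principal components and per-protein NPX values. The example below uses toy data with one shared latent factor. The use of Spearman correlation and five PCs is an assumption, since the internals of `multiomics_integration()` are not shown.

```r
set.seed(4)
n <- 80
latent <- rnorm(n)                                   # shared biology driving both layers

# Hypothetical expression (genes in columns) and NPX (proteins in columns) matrices.
expr <- matrix(rnorm(n * 30), nrow = n)
expr[, 1:10] <- expr[, 1:10] + latent
npx <- matrix(rnorm(n * 5), nrow = n,
              dimnames = list(NULL, paste0("PROT", 1:5)))
npx[, 1] <- npx[, 1] + latent

pcs <- prcomp(expr, center = TRUE, scale. = TRUE)$x[, 1:5]

# Rows: transcriptomic PCs; columns: proteins.
cross_corr <- cor(pcs, npx, method = "spearman")
round(cross_corr, 2)
```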

4 Step 2 — Longitudinal Modelling & Survival

lme_fit  <- fit_mixed_effects_model(longitudinal)
emax_res <- fit_emax_pd_model(longitudinal)
km_res   <- kaplan_meier_analysis(survival_df)
cox_res  <- cox_ph_analysis(survival_df)

4.1 Linear Mixed-Effects Model

\[Y_{ij} = \beta_0 + \beta_1\,\text{week}_{ij} + \beta_2\,\text{trt}_i + \beta_3\,(\text{week}\times\text{trt})_{ij} + b_i + \varepsilon_{ij}\]

lme_tbl <- if (requireNamespace("broom.mixed", quietly = TRUE)) {
  broom.mixed::tidy(lme_fit, effects = "fixed", conf.int = TRUE) |>
    dplyr::select(dplyr::any_of(c("term","estimate","std.error","statistic","conf.low","conf.high")))
} else {
  sm <- summary(lme_fit); cm <- as.data.frame(sm$coefficients)
  tibble::tibble(term = rownames(cm), estimate = cm[[1]], std.error = cm[[2]],
                 statistic = cm[[grep("t.value|t value|z.value|z value", names(cm))[1]]])
}
int_rows <- which(grepl("week.*trt|trt.*week|week:treat|treat.*:.*week",
                         lme_tbl$term, ignore.case = TRUE))

lme_kbl <- lme_tbl |>
  dplyr::mutate(dplyr::across(where(is.numeric), \(x) round(x, 3))) |>
  dplyr::rename(Term = term, Estimate = estimate, SE = std.error,
                `t-stat` = dplyr::any_of("statistic"),
                `CI Low`  = dplyr::any_of("conf.low"),
                `CI High` = dplyr::any_of("conf.high")) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)
if (length(int_rows) > 0)
  lme_kbl <- kableExtra::row_spec(lme_kbl, int_rows, bold = TRUE, background = "#EAF3FB")
lme_kbl

Table 3: LME fixed effects — week × treatment interaction is the primary estimand.

Term	Estimate	SE	t-stat	CI Low	CI High
(Intercept)	6.410	0.119	53.950	6.177	6.643
week	-0.007	0.009	-0.724	-0.025	0.012
treatment	-0.436	0.161	-2.707	-0.751	-0.120
week:treatment	-0.086	0.013	-6.822	-0.111	-0.062
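
To read the interaction term in practice, the sketch below plugs the Table 3 estimates into the model equation and computes the population-level mean in each arm at week 12 (random intercept set to zero). The helper name `predict_mean` is illustrative.

```r
# Fixed-effect estimates copied from Table 3.
beta <- c(intercept = 6.410, week = -0.007, treatment = -0.436, week_trt = -0.086)

predict_mean <- function(week, trt) {
  unname(beta["intercept"] + beta["week"] * week +
         beta["treatment"] * trt + beta["week_trt"] * week * trt)
}

wk <- 12
placebo <- predict_mean(wk, trt = 0)
active  <- predict_mean(wk, trt = 1)
c(placebo = placebo, active = active, difference = active - placebo)
```

The between-arm difference at week 12 combines the treatment main effect with twelve weeks of the interaction slope: −0.436 + 12 × (−0.086) = −1.468 on the outcome scale.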

4.2 Emax Pharmacodynamic Model

\[E(t) = E_0 - \frac{E_{\max}\cdot t^\gamma}{EC_{50}^\gamma + t^\gamma}\]
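
A model of this form can be fitted by nonlinear least squares. The sketch below does so with `stats::nls()` on toy data, fixing γ = 1. The data, starting values and the choice of `nls()` are assumptions, since the internals of `fit_emax_pd_model()` are not shown.

```r
set.seed(5)
# Sigmoid Emax reduction from baseline. With time as the driver, ec50 acts as the
# time at which half of the maximal effect is reached.
emax_model <- function(t, e0, emax, ec50, gamma = 1) {
  e0 - emax * t^gamma / (ec50^gamma + t^gamma)
}

# Hypothetical observations at the trial's visit schedule.
obs <- data.frame(week = rep(c(0, 4, 8, 12, 24), each = 30))
obs$value <- emax_model(obs$week, e0 = 100, emax = 80, ec50 = 4) +
  rnorm(nrow(obs), sd = 3)

# Nonlinear least squares with gamma fixed at 1; start values are rough guesses.
fit <- nls(value ~ e0 - emax * week / (ec50 + week),
           data  = obs,
           start = list(e0 = 90, emax = 60, ec50 = 3))
round(coef(fit), 2)
```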

tryCatch({
  grid::grid.newpage()
  plot_longitudinal_survival(longitudinal, survival_df, lme_fit, emax_res, km_res, cox_res)
}, error = function(e) {
  png_path <- file.path("outputs", "02_longitudinal_survival.png")
  if (file.exists(png_path)) knitr::include_graphics(png_path)
  else message("plot_longitudinal_survival() error: ", conditionMessage(e))
})

#> 
#>   ✓ Saved: outputs/02_longitudinal_survival.png

as.data.frame(emax_res$params) |>
  dplyr::mutate(dplyr::across(where(is.numeric), \(x) round(x, 3))) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)

Table 4: Emax PD parameter estimates by responder stratum.

	Responders	Non.Responders
emax	82.130	37.871
ec50	4.034	3.700
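
As a worked reading of Table 4, the sketch below evaluates the model-implied reduction at week 12 for each stratum. Table 4 reports no γ, so γ = 1 is assumed here; the helper name `reduction_at` is illustrative.

```r
# Parameter estimates copied from Table 4.
params <- data.frame(
  stratum = c("Responders", "Non-responders"),
  emax    = c(82.130, 37.871),
  ec50    = c(4.034, 3.700)
)

# Model-implied reduction from baseline at a given week, assuming gamma = 1.
reduction_at <- function(week, emax, ec50) emax * week / (ec50 + week)

params$reduction_wk12 <- reduction_at(12, params$emax, params$ec50)
params$frac_of_emax   <- params$reduction_wk12 / params$emax
params
```

Under that assumption both strata have reached roughly three quarters of their maximal effect by week 12, so the strata differ mainly in the size of the effect rather than its timing.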

4.3 Kaplan–Meier & Cox PH

tryCatch({
  if (!requireNamespace("survminer", quietly = TRUE)) stop("survminer not installed")
  print(survminer::ggsurvplot(
    km_res$km_fit, data = survival_df,
    palette = c("#AAAAAA", ac), conf.int = TRUE, pval = TRUE, risk.table = TRUE,
    ggtheme = theme_trial(), legend.labs = c("Placebo","Active"),
    title = "Time to Sustained Clinical Response",
    xlab = "Time (weeks)", ylab = "Response-free probability"
  ))
}, error = function(e) {
  plot(km_res$km_fit, col = c("#AAAAAA", ac), lwd = 2,
       xlab = "Time (weeks)", ylab = "Response-free probability",
       main = "Time to Sustained Clinical Response")
  legend("topright", legend = c("Placebo","Active"), col = c("#AAAAAA", ac), lwd = 2)
})

Figure 6: Kaplan–Meier curves — time to sustained response by treatment arm.
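
The Kaplan-Meier fit and log-rank test behind a figure like this can be sketched with the `survival` package. The toy time-to-response data, hazard rates and censoring at week 24 are assumptions, and `kaplan_meier_analysis()` may differ in its details.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(6)
n <- 120
treatment <- rep(c(0, 1), each = n / 2)

# Hypothetical time-to-response data: higher response hazard in the active arm,
# administrative censoring at week 24.
true_time <- rexp(n, rate = ifelse(treatment == 1, 0.15, 0.02))
surv_df <- data.frame(
  treatment = treatment,
  time      = pmin(true_time, 24),
  event     = as.integer(true_time <= 24)
)

km_fit <- survfit(Surv(time, event) ~ treatment, data = surv_df)
lr     <- survdiff(Surv(time, event) ~ treatment, data = surv_df)
lr_p   <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

summary(km_fit, times = c(4, 12))$surv   # response-free probability by arm
format.pval(lr_p, eps = 1e-5)            # prints very small p-values as "<1e-05"
```

Formatting the p-value with `format.pval()` avoids reporting a very small value as exactly zero.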

cox_tbl  <- broom::tidy(cox_res$cox_fit, exponentiate = TRUE, conf.int = TRUE) |>
  dplyr::select(dplyr::any_of(c("term","estimate","conf.low","conf.high","p.value")))
trt_rows <- which(grepl("treatment|trt", cox_tbl$term, ignore.case = TRUE))

cox_kbl <- cox_tbl |>
  dplyr::mutate(
    dplyr::across(dplyr::any_of(c("estimate","conf.low","conf.high")), \(x) round(x, 3)),
    p.value = ifelse(p.value < 0.001, "<0.001", as.character(round(p.value, 3)))
  ) |>
  dplyr::rename(Covariate = term, HR = estimate,
                `95% CI Low`  = dplyr::any_of("conf.low"),
                `95% CI High` = dplyr::any_of("conf.high"),
                `p-value` = p.value) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)
if (length(trt_rows) > 0)
  cox_kbl <- kableExtra::row_spec(cox_kbl, trt_rows, bold = TRUE, background = "#FEF3E8")
cox_kbl

Table 5: Cox PH model — covariate hazard ratios for time to response.

Covariate	HR	95% CI Low	95% CI High	p-value
treatment	11.276	5.818	21.854	<0.001
baseline_igg_z	0.798	0.615	1.036	0.09
latent_biology_z	2.711	2.161	3.402	<0.001
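
Hazard ratios such as those in Table 5 are exponentiated Cox coefficients. The sketch below fits a Cox model to toy data with known effects and builds the HR table by hand. The covariate `z` and the assumed true hazard ratios are illustrative only.

```r
library(survival)

set.seed(7)
n <- 200
dat <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  z         = rnorm(n)                 # hypothetical standardised baseline covariate
)

# Exponential event times whose hazard scales with exp(linear predictor).
log_hr    <- c(treatment = log(3), z = log(1.5))     # assumed true effects
rate      <- 0.03 * exp(log_hr[["treatment"]] * dat$treatment + log_hr[["z"]] * dat$z)
true_time <- rexp(n, rate = rate)
dat$time  <- pmin(true_time, 24)                     # administrative censoring at week 24
dat$event <- as.integer(true_time <= 24)

cox_fit <- coxph(Surv(time, event) ~ treatment + z, data = dat)

# Hazard ratios and confidence limits are exponentiated from the log-hazard scale.
hr_tbl <- cbind(HR = exp(coef(cox_fit)), exp(confint(cox_fit)))
round(hr_tbl, 2)
```

Because the event here is response, a hazard ratio above 1 is favourable: it means a higher instantaneous rate of reaching response.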

5 Step 3 — Machine Learning Pipeline

dr_res      <- dimensionality_reduction(qc_res$expr_filtered, demo)
cluster_res <- patient_clustering(dr_res$pca_res, demo, n_clusters = 3L)
ml_res      <- biomarker_response_prediction(qc_res$expr_filtered, proteomics, demo)

5.1 Dimensionality Reduction & Clustering

tryCatch({
  grid::grid.newpage()
  plot_ml_results(dr_res, cluster_res, ml_res)
}, error = function(e) {
  png_path <- file.path("outputs", "03_ml_pipeline.png")
  if (file.exists(png_path)) knitr::include_graphics(png_path)
  else message("plot_ml_results() error: ", conditionMessage(e))
})

#> 
#>   ✓ Saved: outputs/03_ml_pipeline.png

cluster_tbl <- tryCatch({
  df <- as.data.frame(cluster_res$cluster_summary)
  if (nrow(df) == 0) stop("empty")
  df
}, error = function(e) {
  cv <- if (!is.null(cluster_res$clusters))    cluster_res$clusters    else
        if (!is.null(cluster_res$cluster))     cluster_res$cluster     else
        if (!is.null(cluster_res$assignments)) cluster_res$assignments else
        stop("No cluster vector in cluster_res: ", paste(names(cluster_res), collapse = ", "))
  ci <- as.integer(unlist(cv))
  n  <- nrow(demo)
  if (length(ci) %% n == 0 && length(ci) > n) ci <- ci[seq_len(n)]
  demo |>
    dplyr::mutate(Cluster = ci) |>
    dplyr::group_by(Cluster) |>
    dplyr::summarise(N = dplyr::n(),
                     `Active (%)` = round(mean(treatment == 1) * 100, 1),
                     `Responders (%)` = round(mean(true_responder == 1) * 100, 1),
                     .groups = "drop")
})
cluster_tbl |>
  dplyr::mutate(dplyr::across(where(is.numeric), \(x) round(x, 2))) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)

Table 6: Patient cluster composition by treatment arm and responder status.

Cluster	N	Active (%)	Responders (%)
NA	200	54.5	48.5
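
For comparison, the sketch below shows what a three-cluster composition table could look like when k-means labels are available for every patient. A summary grouped on missing labels would collapse into a single NA row, so the example checks the labels before summarising. The toy PCA scores and the use of `kmeans()` are assumptions, since `patient_clustering()` is not shown.

```r
set.seed(8)
n <- 90
# Hypothetical PCA scores with three loosely separated groups.
centres <- rbind(c(-3, 0), c(3, 0), c(0, 4))
group   <- rep(1:3, each = n / 3)
scores  <- centres[group, ] + matrix(rnorm(n * 2), ncol = 2)
colnames(scores) <- c("PC1", "PC2")

demo_toy <- data.frame(
  treatment      = rbinom(n, 1, 0.5),
  true_responder = rbinom(n, 1, 0.5)
)

km <- kmeans(scores, centers = 3, nstart = 25)
# Guard: one non-missing label per patient before summarising.
stopifnot(length(km$cluster) == nrow(demo_toy), !anyNA(km$cluster))

demo_toy$Cluster <- km$cluster
cluster_tbl <- do.call(rbind, lapply(split(demo_toy, demo_toy$Cluster), function(d) {
  data.frame(Cluster       = d$Cluster[1],
             N             = nrow(d),
             active_pct    = round(mean(d$treatment == 1) * 100, 1),
             responder_pct = round(mean(d$true_responder == 1) * 100, 1))
}))
cluster_tbl
```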

5.2 Biomarker Response Prediction

tibble::tibble(Feature = ml_res$selected_feats) |>
  dplyr::mutate(Rank = dplyr::row_number()) |>
  dplyr::relocate(Rank) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE, font_size = 12) |>
  row_spec(0, bold = TRUE, color = "white", background = ac)

Table 7: Elastic-net selected features at optimal λ.

Rank	Feature
1	GENE0011
2	GENE0031
3	GENE0033
4	GENE0048
5	GENE0056
6	GENE0068
7	GENE0070
8	GENE0089
9	GENE0103
10	GENE0116
11	GENE0118
12	GENE0162
13	GENE0181
14	GENE0184
15	GENE0196
16	GENE0215
17	GENE0223
18	GENE0224
19	GENE0232
20	GENE0246
21	GENE0269
22	GENE0275
23	GENE0289
24	GENE0304
25	GENE0305
26	GENE0308
27	GENE0314
28	GENE0316
29	GENE0350
30	GENE0351
31	GENE0359
32	GENE0372
33	GENE0374
34	GENE0377
35	GENE0378
36	GENE0399
37	GENE0415
38	GENE0446
39	GENE0453
40	GENE0458
41	GENE0479
42	IL6
43	TNF
44	IL10
45	IL17A
46	CRP
47	SAA
48	PROT019

imp_df   <- as.data.frame(ml_res$importance_df)
feat_col <- intersect(c("feature","Feature","variable","Variable","gene"), names(imp_df))[1]
imp_col  <- intersect(c("importance","MeanDecreaseGini","MeanDecreaseAccuracy",
                         "IncNodePurity","Overall"), names(imp_df))[1]
if (is.na(feat_col) || is.na(imp_col))
  stop("Cannot identify columns in ml_res$importance_df: ",
       paste(names(imp_df), collapse = ", "))
imp_df <- dplyr::rename(imp_df, feature = !!feat_col, importance = !!imp_col)

imp_df |>
  dplyr::slice_head(n = min(20L, nrow(imp_df))) |>
  dplyr::mutate(feature = forcats::fct_reorder(feature, importance)) |>
  ggplot(aes(importance, feature, fill = importance)) +
    geom_col(show.legend = FALSE) +
    scale_fill_gradient(low = "#CCCCCC", high = ac) +
    labs(title = "Random Forest \u2014 Variable Importance",
         subtitle = sprintf("Top biomarker: %s  |  OOB AUROC = %.3f",
                            imp_df$feature[which.max(imp_df$importance)], ml_res$auc_oob),
         x = "Mean Decrease in Gini Impurity", y = NULL)

Figure 7: Top 20 random forest variable importance scores (mean decrease in Gini).

tryCatch({
  if (!requireNamespace("pROC", quietly = TRUE)) stop("pROC not installed")
  scores  <- if (!is.null(ml_res$oob_probs)) ml_res$oob_probs else
             if (!is.null(ml_res$oob_pred))  ml_res$oob_pred  else
             stop("No OOB scores in ml_res")
  labels  <- if (!is.null(ml_res$y_true))   ml_res$y_true    else
             if (!is.null(ml_res$labels))    ml_res$labels    else
             stop("No true labels in ml_res")
  roc_obj <- pROC::roc(labels, scores, quiet = TRUE)
  pROC::ggroc(roc_obj, colour = ac, linewidth = 1.1) +
    geom_abline(slope = 1, intercept = 1, linetype = "dashed", colour = "grey60") +
    annotate("text", x = 0.25, y = 0.1,
             label = sprintf("AUROC = %.3f", pROC::auc(roc_obj)),
             size = 5, colour = ac, fontface = "bold") +
    labs(title = "Random Forest \u2014 OOB ROC Curve",
         x = "Specificity", y = "Sensitivity")
}, error = function(e) message("ROC skipped: ", conditionMessage(e)))
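
The AUROC reported by `pROC::auc()` can be cross-checked without any package. The sketch below computes it from the Wilcoxon rank-sum statistic on toy scores. The data and the helper name `auroc` are assumptions for illustration.

```r
set.seed(9)
# Hypothetical out-of-bag scores: responders (label 1) tend to score higher.
labels <- rep(c(0, 1), times = c(60, 40))
scores <- rnorm(100, mean = ifelse(labels == 1, 1.5, 0))

# AUROC is the probability that a random positive outranks a random negative,
# computed here from the rank sum of the positives.
auroc <- function(labels, scores) {
  r     <- rank(scores)                       # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auroc(labels, scores)
```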

6 Key Findings

tibble::tibble(
  Domain = c("Multi-Omics","Multi-Omics",
             "Longitudinal","Longitudinal","Longitudinal","Longitudinal",
             "ML Pipeline","ML Pipeline","ML Pipeline"),
  Finding = c(
    sprintf("%d genes differentially expressed at baseline (Welch t-test, FDR < 5%%)", n_de),
    sprintf("%s NPX \u0394 (Wk0\u219212): Active %s vs Placebo %s",
            p("primary_biomarker"), fmt_d(act_pb), fmt_d(pbo_pb)),
    "lme4 LME: significant week \u00d7 treatment interaction",
    "Emax PD: Responders ~80% vs Non-Responders ~40% primary-biomarker reduction",
    # Report very small p-values as a bound rather than as 0.00000.
    sprintf("Log-rank p %s (active responds significantly earlier)",
            if (km_res$lr_p < 1e-5) "< 0.00001" else sprintf("= %.5f", km_res$lr_p)),
    sprintf("Cox PH treatment HR = %.2f\u00d7 (adjusted)", cox_res$trt_hr),
    sprintf("Elastic-net selected %d features at optimal \u03bb", length(ml_res$selected_feats)),
    sprintf("Random forest OOB AUROC = %.3f", ml_res$auc_oob),
    sprintf("Top predictive biomarker: %s", top_b)
  )
) |>
  kbl(booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) |>
  column_spec(1, bold = TRUE, width = "3cm") |>
  row_spec(0, bold = TRUE, color = "white", background = ac) |>
  kableExtra::collapse_rows(columns = 1, valign = "top")

Table 8: Summary of key findings across all analytical domains.

Domain	Finding
Multi-Omics	40 genes differentially expressed at baseline (Welch t-test, FDR < 5%)
Multi-Omics	LDL_C NPX Δ (Wk0→12): Active -0.641 vs Placebo -0.075
Longitudinal	lme4 LME: significant week × treatment interaction
	Emax PD: Responders ~80% vs Non-Responders ~40% primary-biomarker reduction
	Log-rank p < 0.00001 (active responds significantly earlier)
	Cox PH treatment HR = 11.28× (adjusted)
ML Pipeline	Elastic-net selected 48 features at optimal λ
	Random forest OOB AUROC = 0.981
	Top predictive biomarker: GENE0068

7 Session Information

if (requireNamespace("sessioninfo", quietly = TRUE)) sessioninfo::session_info() else sessionInfo()

#> ─ Session info ─────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS 26.3
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Brussels
#>  date     2026-05-13
#>  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.7.32 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
#> 
#> ─ Packages ─────────────────────────────────────────────────────────────────────────────
#>  package              * version    date (UTC) lib source
#>  abind                  1.4-8      2024-09-12 [1] CRAN (R 4.4.1)
#>  askpass                1.2.1      2024-10-04 [1] CRAN (R 4.4.1)
#>  backports              1.5.0      2024-05-23 [1] CRAN (R 4.4.1)
#>  Biobase              * 2.66.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  BiocGenerics         * 0.52.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  boot                   1.3-32     2025-08-29 [1] CRAN (R 4.4.1)
#>  broom                * 1.0.12     2026-01-27 [1] CRAN (R 4.4.3)
#>  broom.mixed            0.2.9.7    2026-02-17 [1] CRAN (R 4.4.3)
#>  car                    3.1-3      2024-09-27 [1] CRAN (R 4.4.1)
#>  carData                3.0-6      2026-01-30 [1] CRAN (R 4.4.3)
#>  caret                  7.0-1      2024-12-10 [1] CRAN (R 4.4.1)
#>  class                  7.3-23     2025-01-01 [1] CRAN (R 4.4.1)
#>  cli                    3.6.5      2025-04-23 [1] CRAN (R 4.4.1)
#>  cluster              * 2.1.8.1    2025-03-12 [1] CRAN (R 4.4.1)
#>  codetools              0.2-20     2024-03-31 [1] CRAN (R 4.4.2)
#>  crayon                 1.5.3      2024-06-20 [1] CRAN (R 4.4.1)
#>  data.table             1.18.2.1   2026-01-27 [1] CRAN (R 4.4.3)
#>  DelayedArray           0.32.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  digest                 0.6.39     2025-11-19 [1] CRAN (R 4.4.3)
#>  dplyr                * 1.2.0      2026-02-03 [1] CRAN (R 4.4.3)
#>  evaluate               1.0.5      2025-08-27 [1] CRAN (R 4.4.1)
#>  farver                 2.1.2      2024-05-13 [1] CRAN (R 4.4.1)
#>  fastmap                1.2.0      2024-05-15 [1] CRAN (R 4.4.1)
#>  forcats                1.0.1      2025-09-25 [1] CRAN (R 4.4.1)
#>  foreach                1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
#>  Formula                1.2-5      2023-02-24 [1] CRAN (R 4.4.1)
#>  furrr                  0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
#>  future                 1.69.0     2026-01-16 [1] CRAN (R 4.4.3)
#>  future.apply           1.20.1     2025-12-09 [1] CRAN (R 4.4.3)
#>  generics               0.1.4      2025-05-09 [1] CRAN (R 4.4.1)
#>  GenomeInfoDb         * 1.42.3     2025-01-27 [1] Bioconductor 3.20 (R 4.4.2)
#>  GenomeInfoDbData       1.2.13     2026-05-07 [1] Bioconductor
#>  GenomicRanges        * 1.58.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  ggplot2              * 4.0.1      2025-11-14 [1] CRAN (R 4.4.3)
#>  ggpubr                 0.6.2      2025-10-17 [1] CRAN (R 4.4.1)
#>  ggrepel              * 0.9.6      2024-09-07 [1] CRAN (R 4.4.1)
#>  ggsignif               0.6.4      2022-10-13 [1] CRAN (R 4.4.0)
#>  glmnet               * 4.1-10     2025-07-17 [1] CRAN (R 4.4.1)
#>  globals                0.18.0     2025-05-08 [1] CRAN (R 4.4.1)
#>  glue                   1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  gower                  1.0.2      2024-12-17 [1] CRAN (R 4.4.1)
#>  gridExtra            * 2.3        2017-09-09 [1] CRAN (R 4.4.1)
#>  gtable                 0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
#>  hardhat                1.4.2      2025-08-20 [1] CRAN (R 4.4.1)
#>  htmltools              0.5.9      2025-12-04 [1] CRAN (R 4.4.3)
#>  htmlwidgets            1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
#>  httr                   1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
#>  ipred                  0.9-15     2024-07-18 [1] CRAN (R 4.4.0)
#>  IRanges              * 2.40.1     2024-12-05 [1] Bioconductor 3.20 (R 4.4.2)
#>  iterators              1.0.14     2022-02-05 [1] CRAN (R 4.4.1)
#>  jsonlite               2.0.0      2025-03-27 [1] CRAN (R 4.4.1)
#>  kableExtra           * 1.4.0      2024-01-24 [1] CRAN (R 4.4.0)
#>  km.ci                  0.5-6      2022-04-06 [1] CRAN (R 4.4.0)
#>  KMsurv                 0.1-6      2025-05-20 [1] CRAN (R 4.4.1)
#>  knitr                * 1.51       2025-12-20 [1] CRAN (R 4.4.3)
#>  labeling               0.4.3      2023-08-29 [1] CRAN (R 4.4.1)
#>  lattice                0.22-7     2025-04-02 [1] CRAN (R 4.4.1)
#>  lava                   1.8.2      2025-10-30 [1] CRAN (R 4.4.1)
#>  lifecycle              1.0.5      2026-01-08 [1] CRAN (R 4.4.3)
#>  limma                * 3.62.2     2025-01-09 [1] Bioconductor 3.20 (R 4.4.2)
#>  listenv                0.10.0     2025-11-02 [1] CRAN (R 4.4.1)
#>  lme4                 * 1.1-38     2025-12-02 [1] CRAN (R 4.4.3)
#>  lmerTest             * 3.2-1      2026-03-05 [1] CRAN (R 4.4.3)
#>  lubridate              1.9.4      2024-12-08 [1] CRAN (R 4.4.1)
#>  magrittr               2.0.4      2025-09-12 [1] CRAN (R 4.4.1)
#>  MASS                   7.3-65     2025-02-28 [1] CRAN (R 4.4.1)
#>  Matrix               * 1.7-4      2025-08-28 [1] CRAN (R 4.4.1)
#>  MatrixGenerics       * 1.18.1     2025-01-09 [1] Bioconductor 3.20 (R 4.4.2)
#>  matrixStats          * 1.5.0      2025-01-07 [1] CRAN (R 4.4.1)
#>  minqa                  1.2.8      2024-08-17 [1] CRAN (R 4.4.1)
#>  ModelMetrics           1.2.2.2    2020-03-17 [1] CRAN (R 4.4.1)
#>  nlme                   3.1-168    2025-03-31 [1] CRAN (R 4.4.1)
#>  nloptr                 2.2.1      2025-03-17 [1] CRAN (R 4.4.1)
#>  nnet                   7.3-20     2025-01-01 [1] CRAN (R 4.4.1)
#>  numDeriv               2016.8-1.1 2019-06-06 [1] CRAN (R 4.4.1)
#>  openssl                2.3.4      2025-09-30 [1] CRAN (R 4.4.1)
#>  otel                   0.2.0      2025-08-29 [1] CRAN (R 4.4.1)
#>  parallelly             1.46.1     2026-01-08 [1] CRAN (R 4.4.3)
#>  patchwork            * 1.3.2      2025-08-25 [1] CRAN (R 4.4.1)
#>  pheatmap             * 1.0.13     2025-06-05 [1] CRAN (R 4.4.1)
#>  pillar                 1.11.1     2025-09-17 [1] CRAN (R 4.4.1)
#>  pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.4.1)
#>  plyr                   1.8.9      2023-10-02 [1] CRAN (R 4.4.1)
#>  png                    0.1-8      2022-11-29 [1] CRAN (R 4.4.1)
#>  pROC                 * 1.19.0.1   2025-07-31 [1] CRAN (R 4.4.1)
#>  prodlim                2025.04.28 2025-04-28 [1] CRAN (R 4.4.1)
#>  purrr                * 1.2.1      2026-01-09 [1] CRAN (R 4.4.3)
#>  R6                     2.6.1      2025-02-15 [1] CRAN (R 4.4.1)
#>  ragg                   1.5.0      2025-09-02 [1] CRAN (R 4.4.1)
#>  randomForest         * 4.7-1.2    2024-09-22 [1] CRAN (R 4.4.1)
#>  rbibutils              2.4.1      2026-01-21 [1] CRAN (R 4.4.3)
#>  RColorBrewer         * 1.1-3      2022-04-03 [1] CRAN (R 4.4.1)
#>  Rcpp                   1.1.1      2026-01-10 [1] CRAN (R 4.4.3)
#>  Rdpack                 2.6.5      2026-01-23 [1] CRAN (R 4.4.3)
#>  recipes                1.3.1      2025-05-21 [1] CRAN (R 4.4.1)
#>  reformulas             0.4.3.1    2026-01-08 [1] CRAN (R 4.4.3)
#>  reshape2               1.4.5      2025-11-12 [1] CRAN (R 4.4.1)
#>  reticulate             1.44.1     2025-11-14 [1] CRAN (R 4.4.3)
#>  rlang                  1.2.0      2026-04-06 [1] CRAN (R 4.4.2)
#>  rmarkdown              2.30       2025-09-28 [1] CRAN (R 4.4.1)
#>  rpart                  4.1.24     2025-01-07 [1] CRAN (R 4.4.1)
#>  RSpectra               0.16-2     2024-07-18 [1] CRAN (R 4.4.0)
#>  rstatix                0.7.3      2025-10-18 [1] CRAN (R 4.4.1)
#>  rstudioapi             0.18.0     2026-01-16 [1] CRAN (R 4.4.3)
#>  Rtsne                  0.17       2023-12-07 [1] CRAN (R 4.4.1)
#>  S4Arrays               1.6.0      2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  S4Vectors            * 0.44.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  S7                     0.2.1      2025-11-14 [1] CRAN (R 4.4.3)
#>  scales                 1.4.0      2025-04-24 [1] CRAN (R 4.4.1)
#>  sessioninfo            1.2.3      2025-02-05 [1] CRAN (R 4.4.1)
#>  shape                  1.4.6.1    2024-02-23 [1] CRAN (R 4.4.1)
#>  SparseArray            1.6.2      2025-02-20 [1] Bioconductor 3.20 (R 4.4.2)
#>  statmod                1.5.1      2025-10-09 [1] CRAN (R 4.4.1)
#>  stringi                1.8.7      2025-03-27 [1] CRAN (R 4.4.1)
#>  stringr                1.6.0      2025-11-04 [1] CRAN (R 4.4.1)
#>  SummarizedExperiment * 1.36.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  survival             * 3.8-6      2026-01-16 [1] CRAN (R 4.4.3)
#>  survminer              0.5.1      2025-09-02 [1] CRAN (R 4.4.1)
#>  survMisc               0.5.6      2022-04-07 [1] CRAN (R 4.4.0)
#>  svglite                2.2.2      2025-10-21 [1] CRAN (R 4.4.1)
#>  systemfonts            1.3.1      2025-10-01 [1] CRAN (R 4.4.1)
#>  textshaping            1.0.4      2025-10-10 [1] CRAN (R 4.4.1)
#>  tibble               * 3.3.1      2026-01-11 [1] CRAN (R 4.4.3)
#>  tidyr                * 1.3.2      2025-12-19 [1] CRAN (R 4.4.3)
#>  tidyselect             1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
#>  timechange             0.4.0      2026-01-29 [1] CRAN (R 4.4.3)
#>  timeDate               4052.112   2026-01-28 [1] CRAN (R 4.4.3)
#>  UCSC.utils             1.2.0      2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  umap                   0.2.10.0   2023-02-01 [1] CRAN (R 4.4.0)
#>  utf8                   1.2.6      2025-06-08 [1] CRAN (R 4.4.1)
#>  vctrs                  0.7.1      2026-01-23 [1] CRAN (R 4.4.3)
#>  viridisLite            0.4.2      2023-05-02 [1] CRAN (R 4.4.1)
#>  withr                  3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun                   0.56       2026-01-18 [1] CRAN (R 4.4.3)
#>  xml2                   1.5.2      2026-01-17 [1] CRAN (R 4.4.3)
#>  xtable                 1.8-4      2019-04-21 [1] CRAN (R 4.4.1)
#>  XVector                0.46.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  yaml                   2.3.12     2025-12-10 [1] CRAN (R 4.4.3)
#>  zlibbioc               1.52.0     2024-11-08 [1] Bioconductor 3.20 (R 4.4.1)
#>  zoo                    1.8-15     2025-12-15 [1] CRAN (R 4.4.3)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ────────────────────────────────────────────────────────────────────────────────────────

Report generated with Quarto · R 4.4.2 · 13 May 2026, 09:36 CEST