Clinical Trial Analysis: Full End-to-End SDTM to ADaM Pipeline
CDISC-Compliant ADaM Dataset Construction and TLF Generation
Author: Ndoh Penn
Published: April 17, 2026
1. Setup and Environment
Introduction
This document presents a full end-to-end clinical trial analysis pipeline, covering:
- SDTM Data Loading: Demographics (DM), Adverse Events (AE), Exposure (EX), Disposition (DS), Vital Signs (VS)
- ADaM Dataset Construction: ADSL, ADAE, ADVS
- Quality Checks: record counts, population flags, missing data summaries
- Tables, Listings, and Figures (TLFs): demographic summary, AE incidence table, Kaplan–Meier survival curve, vital signs over time
All derivations follow CDISC ADaM Implementation Guide conventions using the admiral package from the pharmaverse ecosystem.
2. Data Loading
In a production environment, SDTM .xpt files are loaded from a validated data transfer location. Here we use the admiral.test sample datasets to ensure full reproducibility.
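A quick domain inventory is a useful first check once the domains are in memory. The sketch below is a minimal base R illustration on toy data frames; the helper name `domain_inventory` and the toy records are assumptions made for illustration and are not part of the original pipeline or of the sample datasets.

```r
# Toy stand-ins for loaded SDTM domains (hypothetical values, illustration only)
dm <- data.frame(USUBJID = c("01-001", "01-002", "01-003"),
                 ARM = c("Placebo", "Active", "Screen Failure"))
ae <- data.frame(USUBJID = c("01-001", "01-001", "01-002"),
                 AETERM = c("HEADACHE", "NAUSEA", "RASH"))
ex <- data.frame(USUBJID = c("01-001", "01-002"),
                 EXDOSE = c(0, 54))

# Hypothetical helper: one row per domain with record, variable, and subject counts
domain_inventory <- function(domains) {
  data.frame(
    domain = names(domains),
    n_records = unname(vapply(domains, nrow, integer(1))),
    n_vars = unname(vapply(domains, ncol, integer(1))),
    n_subjects = unname(vapply(
      domains,
      function(d) length(unique(d$USUBJID)),
      integer(1)
    ))
  )
}

inventory <- domain_inventory(list(DM = dm, AE = ae, EX = ex))
print(inventory)
```

Comparing `n_subjects` across domains against DM is a cheap way to spot subjects that appear in a findings or events domain but not in Demographics.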
3. ADSL: Subject-Level Analysis Dataset
ADSL contains one row per subject and is the foundation for all subsequent ADaM datasets.
3.1 Treatment Variables
adsl <- dm |>
  # ---- Planned & Actual Treatment from first non-zero dose record ----
  admiral::derive_vars_merged(
    dataset_add = ex,
    by_vars = exprs(USUBJID),
    order = exprs(EXSEQ),
    mode = "first",
    new_vars = exprs(TRT01P = EXTRT, TRT01A = EXTRT),
    filter_add = EXDOSE > 0
  )
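To make the "first non-zero dose record" rule concrete, here is a minimal base R sketch of the same logic on toy data. It is not the admiral implementation, and the subjects and doses are hypothetical. It also shows one consequence worth checking in real data: a subject whose only exposure records have a dose of zero ends up with missing treatment variables.

```r
# Toy data (hypothetical): subject 003 only has a zero-dose record
dm <- data.frame(USUBJID = c("001", "002", "003"))
ex <- data.frame(
  USUBJID = c("001", "001", "002", "003"),
  EXSEQ   = c(2, 1, 1, 1),
  EXTRT   = c("DRUG A", "DRUG A", "DRUG B", "PLACEBO"),
  EXDOSE  = c(54, 54, 81, 0)
)

# Keep dosed records, order by subject and sequence, take the first per subject
ex_dosed <- ex[ex$EXDOSE > 0, ]
ex_dosed <- ex_dosed[order(ex_dosed$USUBJID, ex_dosed$EXSEQ), ]
ex_first <- ex_dosed[!duplicated(ex_dosed$USUBJID), c("USUBJID", "EXTRT")]

# Left join onto DM so every subject is retained
adsl_toy <- merge(dm, ex_first, by = "USUBJID", all.x = TRUE)
adsl_toy$TRT01P <- adsl_toy$EXTRT
adsl_toy$TRT01A <- adsl_toy$EXTRT
adsl_toy$EXTRT <- NULL
print(adsl_toy)
```

If zero-dose records (for example placebo coded with a dose of 0) should still define a treatment, the filter condition would need to be widened accordingly.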
3.2 Safety Population Flag
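adsl <- adsl |>
  # SAFFL = "Y" if subject received at least one dose, otherwise "N"
  admiral::derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(USUBJID),
    new_var = SAFFL,
    condition = EXDOSE > 0,
    false_value = "N",   # subject has EX records but none with a positive dose
    missing_value = "N"  # subject has no EX records at all
  )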
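ADaM subject-level population flags are conventionally populated as "Y" or "N" rather than left missing. The following base R sketch, on hypothetical toy data and not using admiral, shows the intended outcome for three cases: a dosed subject, a subject with only zero-dose records, and a subject with no exposure records.

```r
# Toy data (hypothetical): 001 dosed, 002 zero-dose only, 003 no EX records
dm <- data.frame(USUBJID = c("001", "002", "003"))
ex <- data.frame(
  USUBJID = c("001", "001", "002"),
  EXDOSE  = c(0, 54, 0)
)

# Subjects with at least one record where the dose is positive
dosed_ids <- unique(ex$USUBJID[!is.na(ex$EXDOSE) & ex$EXDOSE > 0])

# Every subject gets an explicit "Y" or "N"
dm$SAFFL <- ifelse(dm$USUBJID %in% dosed_ids, "Y", "N")
print(dm)
```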
3.3 Intent-to-Treat Population Flag
adsl <- adsl |>
  # ITTFL = "Y" for all randomised subjects (ARM is not screen failure / not missing)
  dplyr::mutate(
    ITTFL = dplyr::if_else(
      !is.na(ARM) & ARM != "Screen Failure",
      "Y", "N",
      missing = "N"
    )
  )
3.4 Study Dates and Duration
adsl <- adsl |>
  # Convert ISO 8601 character dates to SAS-style numeric date variables
  admiral::derive_vars_dt(
    new_vars_prefix = "TRTS",
    dtc = RFSTDTC  # Reference Start Date (first dose)
  ) |>
  admiral::derive_vars_dt(
    new_vars_prefix = "TRTE",
    dtc = RFENDTC  # Reference End Date (last dose)
  ) |>
  # Duration on treatment (days)
  dplyr::mutate(
    TRTDURD = as.numeric(TRTEDT - TRTSDT) + 1L
  )
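The duration formula counts both the first and the last day on treatment, so a subject who starts and stops on the same date has a duration of 1 day. A minimal base R sketch with hypothetical dates illustrates this; it only handles complete dates and is not a substitute for the partial-date handling that `derive_vars_dt()` offers.

```r
# Toy ADSL records (hypothetical): complete dates, one datetime, one missing
adsl_toy <- data.frame(
  USUBJID = c("001", "002", "003"),
  RFSTDTC = c("2024-01-10", "2024-02-01T08:30", NA),
  RFENDTC = c("2024-01-10", "2024-03-01", NA)
)

# Take the date part (first 10 characters) of each ISO 8601 string
adsl_toy$TRTSDT <- as.Date(substr(adsl_toy$RFSTDTC, 1, 10))
adsl_toy$TRTEDT <- as.Date(substr(adsl_toy$RFENDTC, 1, 10))

# Inclusive duration in days: end - start + 1
adsl_toy$TRTDURD <- as.numeric(adsl_toy$TRTEDT - adsl_toy$TRTSDT) + 1
print(adsl_toy[, c("USUBJID", "TRTSDT", "TRTEDT", "TRTDURD")])
```

Subject 002 spans 1 February to 1 March 2024 (a leap year), giving 29 elapsed days and an inclusive duration of 30.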
3.5 Baseline Age Group
adsl <- adsl |>
  dplyr::mutate(
    AGEGR1 = dplyr::case_when(
      AGE < 65 ~ "<65",
      AGE >= 65 & AGE < 75 ~ "65–74",
      AGE >= 75 ~ ">=75",
      TRUE ~ NA_character_
    ),
    AGEGR1 = factor(AGEGR1, levels = c("<65", "65–74", ">=75"))
  )
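Boundary values are the usual source of errors in age grouping, so it can help to test ages 64, 65, 74, and 75 explicitly. As a sketch, the same grouping can be expressed with base R `cut()` using left-closed intervals; this is an alternative formulation for checking, not the code used above.

```r
# Ages chosen to hit every boundary, plus a missing value
age <- c(40, 64, 65, 74, 75, 90, NA)

# right = FALSE gives intervals [-Inf, 65), [65, 75), [75, Inf)
agegr1 <- cut(
  age,
  breaks = c(-Inf, 65, 75, Inf),
  labels = c("<65", "65–74", ">=75"),
  right = FALSE
)

print(data.frame(AGE = age, AGEGR1 = agegr1))
```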
4. ADAE: Adverse Event Analysis Dataset
ADAE contains one row per adverse event per subject and is the primary dataset for safety analyses.
4.1 Merge Subject-Level Variables and Derive Analysis Dates
adae <- ae |>
  # Merge treatment and population flags from ADSL
  admiral::derive_vars_merged(
    dataset_add = dplyr::select(adsl, USUBJID, TRT01P, TRT01A, SAFFL, TRTSDT),
    by_vars = exprs(USUBJID)
  ) |>
  # Restrict to safety population
  dplyr::filter(SAFFL == "Y") |>
  # AE start date (ISO 8601 → numeric)
  admiral::derive_vars_dt(new_vars_prefix = "AST", dtc = AESTDTC) |>
  # AE end date
  admiral::derive_vars_dt(new_vars_prefix = "AEN", dtc = AEENDTC) |>
  # Study day of AE onset relative to first dose.
  # CDISC convention: there is no day 0, so +1 applies only on or after TRTSDT.
  dplyr::mutate(
    ASTDY = as.integer(ASTDT - TRTSDT) + dplyr::if_else(ASTDT >= TRTSDT, 1L, 0L)
  )
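Under the CDISC study-day convention there is no day 0: the reference date is day 1 and the day before it is day -1. A minimal base R sketch of that rule follows; the helper name `study_day` and the dates are hypothetical.

```r
# Hypothetical helper implementing the "no day 0" study-day rule
study_day <- function(date, ref_date) {
  diff_days <- as.integer(date - ref_date)
  # Add 1 only for dates on or after the reference date
  ifelse(diff_days >= 0, diff_days + 1L, diff_days)
}

trtsdt <- as.Date("2024-01-10")
astdt <- as.Date(c("2024-01-08", "2024-01-09", "2024-01-10", "2024-01-11", NA))

astdy <- study_day(astdt, trtsdt)
print(data.frame(ASTDT = astdt, ASTDY = astdy))
```

In an admiral-based pipeline, `admiral::derive_vars_dy()` is the usual way to derive such variables.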
4.2 Derive Severity and Seriousness Flags
adae <- adae |>
  dplyr::mutate(
    # Serious AE flag
    AESERF = dplyr::if_else(AESER == "Y", "Y", "N", missing = "N"),
    # Grade 3+ severity flag (CTCAE-style: map AESEV text to numeric grade)
    AESEVN = dplyr::case_when(
      AESEV == "MILD" ~ 1L,
      AESEV == "MODERATE" ~ 2L,
      AESEV == "SEVERE" ~ 3L,
      TRUE ~ NA_integer_
    ),
    CTC3FLG = dplyr::if_else(AESEVN >= 3L, "Y", "N", missing = "N"),
    # Treatment-emergent flag: AE onset on or after first dose date
    TRTEMFL = dplyr::if_else(
      !is.na(ASTDT) & !is.na(TRTSDT) & ASTDT >= TRTSDT,
      "Y", "N",
      missing = "N"
    )
  )
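The flag logic is easiest to verify on a handful of edge cases: an onset before first dose, an onset on the first-dose date, a missing onset date, and a missing severity. The base R sketch below applies the same rules to hypothetical records; the named lookup vector `sev_map` is an illustrative alternative to `case_when()`.

```r
# Toy AE records (hypothetical) covering the edge cases
ae_toy <- data.frame(
  AESEV  = c("MILD", "SEVERE", "MODERATE", NA),
  ASTDT  = as.Date(c("2024-01-09", "2024-01-10", NA, "2024-02-01")),
  TRTSDT = as.Date("2024-01-10")
)

# Map severity text to a numeric grade via a named lookup vector
sev_map <- c(MILD = 1L, MODERATE = 2L, SEVERE = 3L)
ae_toy$AESEVN <- unname(sev_map[ae_toy$AESEV])

# Grade 3+ flag: missing severity is flagged "N"
ae_toy$CTC3FLG <- ifelse(!is.na(ae_toy$AESEVN) & ae_toy$AESEVN >= 3L, "Y", "N")

# Treatment-emergent: onset on or after first dose; missing dates give "N"
ae_toy$TRTEMFL <- ifelse(
  !is.na(ae_toy$ASTDT) & !is.na(ae_toy$TRTSDT) & ae_toy$ASTDT >= ae_toy$TRTSDT,
  "Y", "N"
)
print(ae_toy)
```

Note that an AE with a missing onset date is flagged "N" under this rule. Some analysis plans instead treat such events conservatively as treatment-emergent, so the rule should be confirmed against the statistical analysis plan.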
4.3 ADAE Quality Check
cat("========== ADAE Quality Check ==========\n")
========== ADAE Quality Check ==========
cat("Total AE records: ", nrow(adae), "\n")
Total AE records: 890
cat("Subjects with AEs: ", n_distinct(adae$USUBJID), "\n")
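Beyond record and subject counts, two further checks are commonly run at this stage: duplicate key detection and a per-variable missing-value summary. The following base R sketch runs both on a hypothetical toy dataset; the choice of `USUBJID` and `AESEQ` as the key is an assumption for illustration.

```r
# Toy ADAE extract (hypothetical): one duplicated key, some missing values
adae_toy <- data.frame(
  USUBJID = c("001", "001", "002", "002"),
  AESEQ   = c(1, 2, 1, 1),
  ASTDT   = as.Date(c("2024-01-10", NA, "2024-01-12", "2024-01-12")),
  TRT01A  = c("A", "A", NA, NA)
)

# Number of records that repeat an earlier USUBJID/AESEQ combination
n_dup_keys <- sum(duplicated(adae_toy[, c("USUBJID", "AESEQ")]))

# Missing-value count for every variable
n_missing <- colSums(is.na(adae_toy))

cat("Duplicate USUBJID/AESEQ keys:", n_dup_keys, "\n")
print(n_missing)
```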
Figure 1. Incidence of TEAEs by System Organ Class and Treatment Group
6.5 Figure: Vital Signs — Mean Change from Baseline Over Time
# Bin study days into nominal weeks
advs_sysbp <- advs |>
  dplyr::filter(PARAMCD == "SYSBP", !is.na(CHG), !is.na(TRT01P)) |>
  dplyr::mutate(
    WEEK = dplyr::case_when(
      ADY <= 0 ~ "Baseline",
      ADY <= 14 ~ "Week 2",
      ADY <= 28 ~ "Week 4",
      ADY <= 56 ~ "Week 8",
      ADY <= 84 ~ "Week 12",
      TRUE ~ "Week 12+"
    ),
    WEEK = factor(
      WEEK,
      levels = c("Baseline", "Week 2", "Week 4", "Week 8", "Week 12", "Week 12+")
    )
  ) |>
  dplyr::group_by(TRT01P, WEEK) |>
  dplyr::summarise(
    mean_chg = mean(CHG, na.rm = TRUE),
    se_chg = sd(CHG, na.rm = TRUE) / sqrt(dplyr::n()),
    n = dplyr::n(),
    .groups = "drop"
  )

ggplot2::ggplot(
  advs_sysbp,
  ggplot2::aes(x = WEEK, y = mean_chg, colour = TRT01P, group = TRT01P)
) +
  ggplot2::geom_hline(yintercept = 0, linetype = "dashed", colour = "grey60") +
  ggplot2::geom_line(linewidth = 1) +
  ggplot2::geom_point(size = 3) +
  ggplot2::geom_errorbar(
    ggplot2::aes(ymin = mean_chg - se_chg, ymax = mean_chg + se_chg),
    width = 0.2
  ) +
  ggplot2::scale_colour_manual(
    values = c("#2E75B6", "#ED7D31", "#70AD47"),
    name = "Treatment"
  ) +
  ggplot2::labs(
    x = "Study Visit",
    y = "Mean Change from Baseline (mmHg)",
    title = "Systolic Blood Pressure: Mean Change from Baseline",
    subtitle = "Safety Population — Mean ± SE"
  ) +
  ggplot2::theme_minimal(base_size = 12) +
  ggplot2::theme(
    legend.position = "bottom",
    plot.title = ggplot2::element_text(face = "bold"),
    axis.text.x = ggplot2::element_text(angle = 30, hjust = 1)
  )
Figure 2. Mean Change from Baseline in Systolic Blood Pressure Over Study Weeks
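The error bars in this figure are standard errors, computed as the sample standard deviation divided by the square root of the group size. As a sketch of the binning and summary steps, the base R example below reproduces them on hypothetical change-from-baseline values; `cut()` with its default right-closed intervals matches the `ADY <= ...` cascade used above.

```r
# Toy ADVS extract (hypothetical change-from-baseline values)
advs_toy <- data.frame(
  TRT01P = c("A", "A", "A", "B", "B", "B"),
  ADY    = c(10, 12, 14, 20, 25, 28),
  CHG    = c(-2, -4, -6, 1, 3, 5)
)

# Right-closed bins: (-Inf, 0], (0, 14], (14, 28], (28, 56], (56, 84], (84, Inf)
advs_toy$WEEK <- cut(
  advs_toy$ADY,
  breaks = c(-Inf, 0, 14, 28, 56, 84, Inf),
  labels = c("Baseline", "Week 2", "Week 4", "Week 8", "Week 12", "Week 12+")
)

# Mean, standard error (sd / sqrt(n)), and n for one group of values
summarise_chg <- function(x) {
  c(mean_chg = mean(x), se_chg = sd(x) / sqrt(length(x)), n = length(x))
}

# One row per non-empty treatment-by-week cell
groups <- split(advs_toy$CHG, list(advs_toy$TRT01P, advs_toy$WEEK), drop = TRUE)
summ <- t(sapply(groups, summarise_chg))
print(round(summ, 3))
```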
6.6 Figure: Kaplan–Meier Time to First TEAE
# Build subject-level time-to-first-TEAE dataset
first_ae <- adae |>
  dplyr::filter(TRTEMFL == "Y", !is.na(ASTDY)) |>
  dplyr::group_by(USUBJID) |>
  dplyr::slice_min(ASTDY, n = 1, with_ties = FALSE) |>
  dplyr::ungroup() |>
  dplyr::select(USUBJID, TIME = ASTDY, EVENT = TRTEMFL)

# All safety-pop subjects; those without an AE are censored at TRTDURD
km_data <- adsl |>
  dplyr::filter(SAFFL == "Y", !is.na(TRT01P)) |>
  dplyr::select(USUBJID, TRT01P, TRTDURD) |>
  dplyr::left_join(first_ae, by = "USUBJID") |>
  dplyr::mutate(
    TIME = dplyr::coalesce(TIME, TRTDURD, 1L),
    EVENT = dplyr::if_else(EVENT == "Y", 1L, 0L, missing = 0L),
    TIME = pmax(TIME, 1L)
  )

km_fit <- survival::survfit(
  survival::Surv(TIME, EVENT) ~ TRT01P,
  data = km_data
)

survminer::ggsurvplot(
  km_fit,
  data = km_data,
  risk.table = TRUE,
  pval = TRUE,
  conf.int = TRUE,
  xlab = "Days Since First Dose",
  ylab = "Probability of Remaining TEAE-Free",
  title = "Time to First TEAE — Safety Population",
  legend.title = "Treatment",
  palette = c("#2E75B6", "#ED7D31", "#70AD47"),
  ggtheme = ggplot2::theme_minimal(base_size = 12)
)
Figure 3. Kaplan–Meier Estimate of Time to First Treatment-Emergent Adverse Event
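The key step in this analysis is the censoring rule: subjects with a TEAE contribute an event at the onset study day, while subjects without one are censored at their treatment duration. The sketch below works through that rule on four hypothetical subjects and fits a single-arm Kaplan–Meier curve with the survival package; the data values are invented for illustration.

```r
library(survival)

# Toy subject-level data (hypothetical): treatment duration in days
adsl_toy <- data.frame(
  USUBJID = c("001", "002", "003", "004"),
  TRTDURD = c(30, 45, 60, 20)
)

# First TEAE onset day for the subjects who had one
first_ae_toy <- data.frame(
  USUBJID = c("001", "003"),
  ASTDY   = c(5, 12)
)

# Left join keeps subjects without an event; they become censored records
km_toy <- merge(adsl_toy, first_ae_toy, by = "USUBJID", all.x = TRUE)
km_toy$EVENT <- as.integer(!is.na(km_toy$ASTDY))
km_toy$TIME <- ifelse(km_toy$EVENT == 1L, km_toy$ASTDY, km_toy$TRTDURD)

# Product-limit estimate: 3/4 after day 5, then 3/4 * 2/3 after day 12
fit <- survfit(Surv(TIME, EVENT) ~ 1, data = km_toy)
print(summary(fit))
```

With four subjects at risk, the first event at day 5 drops the estimate to 0.75, and the second at day 12 (three at risk) drops it to 0.5; the censored subjects at days 20 and 45 do not change the estimate.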
7. Final Dataset Export
# In a validated production environment, export as SAS Transport (.xpt) format
# using haven::write_xpt() for submission to regulatory agencies (FDA, EMA).
#
# Example:
# haven::write_xpt(adsl, path = "data/adam/adsl.xpt", version = 5, name = "ADSL")
# haven::write_xpt(adae, path = "data/adam/adae.xpt", version = 5, name = "ADAE")
# haven::write_xpt(advs, path = "data/adam/advs.xpt", version = 5, name = "ADVS")
cat("=====================================================\n")
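SAS Transport version 5 limits variable names to 8 characters, made up of letters, digits, and underscores and not starting with a digit, so a name check before export can catch problems early. The base R sketch below is one possible pre-export check; the helper name `check_xpt_v5_names` and the toy dataset are hypothetical and not part of haven or admiral.

```r
# Hypothetical helper: flag variable names that violate XPT v5 naming rules
check_xpt_v5_names <- function(df) {
  nms <- names(df)
  data.frame(
    variable  = nms,
    too_long  = nchar(nms) > 8,
    bad_chars = !grepl("^[A-Za-z_][A-Za-z0-9_]*$", nms)
  )
}

# Toy dataset (hypothetical) with one name that is too long
adsl_toy <- data.frame(
  USUBJID = "001",
  TRT01P = "A",
  TREATMENT_DURATION = 30
)

report <- check_xpt_v5_names(adsl_toy)
problems <- report[report$too_long | report$bad_chars, ]
print(problems)
```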
This report has demonstrated a complete, CDISC-compliant end-to-end pipeline using the pharmaverse admiral package:
| Phase | Datasets | Key Deliverables |
| --- | --- | --- |
| Data Loading | DM, AE, EX, DS, VS | Domain inventory table |
| ADaM Construction | ADSL, ADAE, ADVS | Treatment vars, population flags, dates, baseline |
| Quality Checks | All | Record counts, duplicates, missing data |
| TLFs | ADSL + ADAE + ADVS | Demographics table, AE incidence, KM curve, vital signs |
| Export | ADSL, ADAE, ADVS | Ready for write_xpt() submission packages |
All datasets are audit-ready and reproducible. Next steps would include ADLB (Laboratory), ADTTE (Time-to-Event), statistical model outputs, and integration into an RTF/PDF regulatory submission package.