Open-source tools, R packages, Shiny applications, and reproducibility infrastructure — built for biostatisticians and clinical researchers across Bayesian statistics, clinical trial design, biomarker science, explainable AI, and audit-ready workflows.
regulog — Tamper-Evident Audit Logging for R
A hash-chained, session-level audit logging system for regulated R environments.
Every action, change, annotation, and signature is recorded in a tamper-evident
chain — any post-hoc modification is detectable by verify_log().
Directly covers 21 CFR Part 11 §11.10, §11.100, §11.200 and EU Annex 11
Clauses 9 and 11. Ships with IQ/OQ/PQ qualification scripts and a full
Requirements Traceability Matrix.
| Function | Purpose |
|---|---|
| log_action() | Record a discrete analytical event with mandatory reason |
| log_change() | Document before/after field modifications |
| log_note() | Annotate decisions, outlier rationale, query resolutions |
| log_signature() | Named, dated electronic sign-off per §11.100/§11.200 |
| with_log() | Auto-log all data I/O in a scoped block |
| verify_log() | Recompute SHA-256 chain; report first broken entry |
| filter_log() | Query entries by type, user, action, or date range |
| export_audit_trail() | Export signed CSV or JSON for submission |
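
The hash-chaining idea behind verify_log() can be sketched in a few lines. This is an illustration only, not regulog's implementation: the helper names (sha256, entry_hash, append_entry, first_broken), the "GENESIS" seed value, and the field layout are hypothetical, and the sketch assumes the digest package is installed.

```r
# Illustrative hash chain; helper names and field layout are hypothetical.
# Assumes the 'digest' package is available for SHA-256.
sha256 <- function(x) digest::digest(x, algo = "sha256", serialize = FALSE)

# Each entry's hash covers the previous hash plus the entry's own fields.
# (Joining fields with "|" is a simplification; a real log needs an
# unambiguous serialisation.)
entry_hash <- function(prev_hash, user, action, reason) {
  sha256(paste(prev_hash, user, action, reason, sep = "|"))
}

append_entry <- function(trail, user, action, reason) {
  prev <- if (nrow(trail) == 0) "GENESIS" else trail$hash[nrow(trail)]
  rbind(trail, data.frame(
    user = user, action = action, reason = reason,
    hash = entry_hash(prev, user, action, reason),
    stringsAsFactors = FALSE
  ))
}

# Recompute the chain; return the index of the first broken entry, 0 if intact.
first_broken <- function(trail) {
  prev <- "GENESIS"
  for (i in seq_len(nrow(trail))) {
    expected <- entry_hash(prev, trail$user[i], trail$action[i], trail$reason[i])
    if (expected != trail$hash[i]) return(i)
    prev <- trail$hash[i]
  }
  0L
}

trail <- data.frame(user = character(), action = character(),
                    reason = character(), hash = character(),
                    stringsAsFactors = FALSE)
trail <- append_entry(trail, "analyst1", "import", "Load raw ADSL")
trail <- append_entry(trail, "analyst1", "filter", "Drop screen failures")
trail <- append_entry(trail, "reviewer1", "sign", "QC complete")

intact_result <- first_broken(trail)

# Simulate a post-hoc edit of the second entry's reason.
tampered <- trail
tampered$reason[2] <- "Edited after the fact"
tampered_result <- first_broken(tampered)
```

Because every hash depends on the one before it, editing any stored field invalidates that entry's hash, so the recomputation stops at the first altered entry.
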
lineager — Row-Level Data Provenance and Exclusion Tracking
Tags every row with a unique lineage ID (.__lid__) that survives
filters, joins, and derivations. Every row removal requires a documented reason.
CDISC-aware: USUBJID is embedded in lineage IDs, and population flags and SDTM-to-ADaM
derivations can be registered via lg_population() and lg_spec().
Generates CDISC Reviewer's Guide-aligned HTML provenance reports.
| Function | Purpose |
|---|---|
| lg_tag() | Assign row-level lineage IDs to source datasets |
| lg_filter() | Tracked filter with mandatory exclusion reason |
| lg_derive() | Tracked mutate with documented description |
| lg_join() | Tracked join with bilateral row-ID tracing |
| lg_trace() | Trace any subject across the complete pipeline |
| lg_exclusions() | Full exclusion registry with reasons and populations |
| lg_disposition() | CONSORT-style disposition table |
| lg_lineage() / lg_plot() | Pipeline lineage graph (Graphviz DOT) |
| lg_report() | Self-contained HTML provenance report |
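
To make the row-level tracking idea concrete, here is a minimal base R sketch of tagging rows and recording every exclusion with a reason. It does not use lineager's API: tag_rows(), tracked_filter(), the lid column name, and the ID format are hypothetical stand-ins, and the data are made up.

```r
# Hypothetical helpers illustrating row-level provenance; not lineager's API.

# Give every row an ID built from the source name, USUBJID, and row position.
tag_rows <- function(df, source) {
  df$lid <- paste(source, df$USUBJID, seq_len(nrow(df)), sep = ":")
  df
}

# Filter rows, refusing to proceed without a reason, and log what was dropped.
tracked_filter <- function(df, keep, reason, registry) {
  stopifnot(is.logical(keep), length(keep) == nrow(df),
            !anyNA(keep), nzchar(reason))
  dropped <- df$lid[!keep]
  registry <- rbind(registry, data.frame(
    lid = dropped,
    reason = rep(reason, length(dropped)),
    stringsAsFactors = FALSE
  ))
  list(data = df[keep, , drop = FALSE], registry = registry)
}

# Made-up demographics data.
dm <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003", "S-004"),
  AGE     = c(34, 17, 52, 61),
  CONSENT = c("Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)
dm <- tag_rows(dm, "DM")

registry <- data.frame(lid = character(), reason = character(),
                       stringsAsFactors = FALSE)

step1 <- tracked_filter(dm, dm$AGE >= 18,
                        "Age below 18 at screening", registry)
step2 <- tracked_filter(step1$data, step1$data$CONSENT == "Y",
                        "No informed consent", step1$registry)

analysis_set  <- step2$data      # rows that survived, IDs intact
exclusion_log <- step2$registry  # one row per dropped record, with reason
```

Since the ID travels with the row, every record in the source is accounted for either in the analysis set or in the exclusion log.
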
bayprior — Bayesian Prior Elicitation for Clinical Trials
An R package and Shiny application for Bayesian prior elicitation, conflict diagnostics, and sensitivity analysis in clinical trials. It addresses the upstream problem that existing Bayesian trial packages largely ignore: how to construct, validate, and justify a prior to a regulator. Aligned with the FDA's 2026 draft guidance on Bayesian methods.
| Module | What it does |
|---|---|
| Prior Elicitation | SHELF roulette, quantile & moment matching across Beta, Normal, Gamma, Log-Normal, Exponential, Weibull |
| Expert Pooling | Linear & logarithmic pooling with Bhattacharyya diagnostics |
| Conflict Diagnostics | Box p-value, surprise index, KL divergence for binary, continuous, Poisson, and survival data |
| Sensitivity Analysis | Hyperparameter grids, tornado plots, influence heatmaps |
| Robust & Power Priors | Sceptical, robust mixture (Schmidli et al.), and power priors (Ibrahim–Chen) |
| Regulatory Reports | Self-contained HTML, PDF, or Word reports aligned with FDA/EMA expectations, rendered via Quarto |
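
Quantile matching, one of the elicitation methods listed above, can be sketched with base R alone. The function name fit_beta_quantiles() and the elicited values are hypothetical, and this least-squares fit is only one reasonable way to do the matching; it is not taken from bayprior.

```r
# Fit a Beta(shape1, shape2) prior to elicited quantiles by least squares.
# Hypothetical helper; not bayprior's API.
fit_beta_quantiles <- function(probs, values) {
  loss <- function(par) {
    a <- exp(par[1])  # optimise on the log scale so both shapes stay positive
    b <- exp(par[2])
    sum((qbeta(probs, a, b) - values)^2)
  }
  opt <- optim(c(0, 0), loss, method = "Nelder-Mead",
               control = list(reltol = 1e-12, maxit = 5000))
  c(shape1 = exp(opt$par[1]), shape2 = exp(opt$par[2]))
}

# Made-up expert judgement about a response rate:
# 5% chance it is below 0.15, median 0.30, 5% chance it is above 0.50.
probs  <- c(0.05, 0.50, 0.95)
values <- c(0.15, 0.30, 0.50)

fit <- fit_beta_quantiles(probs, values)

# Compare the fitted prior's quantiles with what the expert stated.
fitted_q <- qbeta(probs, fit[["shape1"]], fit[["shape2"]])
round(rbind(elicited = values, fitted = fitted_q), 3)
```

Three quantiles generally cannot be matched exactly by a two-parameter family, so inspecting the residual gap between elicited and fitted quantiles is part of justifying the prior.
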
reproducr — Behavioural Reproducibility Auditing for R
reproducr makes the hidden risks of long-lived R analyses visible and trackable.
Package updates can silently change function behaviour; stochastic code without a fixed seed
produces different results on every run; locale-sensitive operations behave differently across
systems. reproducr surfaces all of this — before results reach a journal, a regulator,
or a collaborator. Works with or without renv; no configuration required.
Three-tier workflow
| Function | Tier | Purpose |
|---|---|---|
| audit_script() | 1 | Extract all pkg::fn calls with version info; flag missing seeds & locale-sensitive ops |
| risk_score() | 1 | Score calls against a curated breaking-changes database (13 packages, 30+ entries) |
| certify() | 2 | Hash and store analytical outputs (coefficients, p-values, N) as a signed baseline |
| check_drift() | 2 | Detect numerical drift between current outputs and any stored baseline |
| repro_report() | 3 | Render audit report: minimal, academic methods paragraph, or pharma QC with sign-off fields |
| repro_badge() | 3 | Generate a live shields.io reproducibility badge; CI updates it on every push |
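
The Tier 1 idea of extracting pkg::fn calls and flagging unseeded random code can be approximated with R's own parser. The sketch below is an assumption about how such a scan could work, not reproducr's code: audit_sketch() and the short list of random-number functions are hypothetical.

```r
# Hypothetical Tier 1 style scan using the parser's token table.
audit_sketch <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, ]  # keep source-order tokens only

  # A namespaced reference is tokenised as: package, "::", function name.
  i <- which(pd$token == "SYMBOL_PACKAGE")
  calls <- data.frame(pkg = pd$text[i], fn = pd$text[i + 2],
                      stringsAsFactors = FALSE)
  calls$version <- vapply(
    calls$pkg,
    function(p) as.character(utils::packageVersion(p)),
    character(1), USE.NAMES = FALSE
  )

  # Flag random-number calls when no set.seed() call appears in the script.
  called  <- pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]
  rng_fns <- c("rnorm", "runif", "rbinom", "sample")  # illustrative subset
  list(
    calls = calls,
    unseeded_rng = any(called %in% rng_fns) && !("set.seed" %in% called)
  )
}

script <- '
fit <- stats::lm(mpg ~ wt, data = mtcars)
x   <- stats::rnorm(10)
out <- utils::head(mtcars)
'

audit        <- audit_sketch(script)
audit_seeded <- audit_sketch(paste0("set.seed(42)\n", script))

audit$calls
```

The script is only parsed, never evaluated, so the scan is safe to run on untrusted or long-running analyses.
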
End-to-end reproducr pipelines across four domains — each a complete, independently runnable
analysis with CI that audits on every push, certifies outputs, detects drift, and updates the badge automatically.
reproducr-clinical
Full reproducr pipeline for a simulated Phase III oncology trial with survival analysis. Uses renv and outputs a pharma-style QC document with sign-off fields.
reproducr-rwe
reproducr pipeline for a real-world evidence analysis generating an academic-style methods paragraph suitable for journal submission.
reproducr-cmc
reproducr pipeline for CMC statistics with pharma QC report output. Aimed at CMC statisticians and regulatory affairs teams.
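
The certify-then-check step that these pipelines run can be illustrated with base R. This is a simplified sketch under stated assumptions: extract_outputs() and check_drift_sketch() are hypothetical names, the baseline is stored as a plain RDS file, and the hashing and signing described above are omitted.

```r
# Hypothetical baseline/drift check; not reproducr's implementation.

# Reduce a fitted model to the numbers worth certifying.
extract_outputs <- function(fit) {
  list(coef = unname(coef(fit)), n = nobs(fit))
}

# TRUE for every output that differs from the baseline beyond the tolerance.
check_drift_sketch <- function(current, baseline, tol = 1e-8) {
  vapply(names(baseline), function(k) {
    !isTRUE(all.equal(current[[k]], baseline[[k]], tolerance = tol))
  }, logical(1))
}

# "Certify": store the outputs of the reference run.
baseline_path <- tempfile(fileext = ".rds")
saveRDS(extract_outputs(lm(mpg ~ wt + hp, data = mtcars)), baseline_path)

# Later run on unchanged data: no drift expected.
rerun <- extract_outputs(lm(mpg ~ wt + hp, data = mtcars))
same  <- check_drift_sketch(rerun, readRDS(baseline_path))

# Later run where one input value has silently changed.
changed <- mtcars
changed$mpg[1] <- changed$mpg[1] + 0.5
drifted <- check_drift_sketch(
  extract_outputs(lm(mpg ~ wt + hp, data = changed)),
  readRDS(baseline_path)
)
```

Comparing with a numerical tolerance rather than exact equality avoids false alarms from harmless floating-point differences across platforms.
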
ClinicalBayes — Bayesian Dynamic Borrowing
Interactive Shiny application for Bayesian dynamic borrowing of historical control data. Supports rMAP, Power Priors, and Commensurate Priors (Stan/CmdStanR) for binary and continuous endpoints.
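
For a binary endpoint with a Beta prior, a power prior with a fixed discount weight has a closed-form posterior, which makes the borrowing mechanism easy to see. The sketch below uses made-up counts and a hypothetical helper, power_prior_beta(); it fixes the weight a0 in advance and therefore does not show the dynamic (data-driven) weighting, rMAP, or commensurate priors that the application supports.

```r
# Conjugate power prior for a binomial response rate.
# Historical data (y0 of n0) are down-weighted by a0 in [0, 1]:
#   posterior = Beta(a + a0 * y0 + y, b + a0 * (n0 - y0) + (n - y))
# Hypothetical helper; not ClinicalBayes code.
power_prior_beta <- function(y, n, y0, n0, a0, a = 1, b = 1) {
  stopifnot(a0 >= 0, a0 <= 1, y <= n, y0 <= n0)
  c(shape1 = a + a0 * y0 + y,
    shape2 = b + a0 * (n0 - y0) + (n - y))
}

post_mean <- function(p) unname(p["shape1"] / sum(p))

# Made-up data: historical control 30/100, current control 12/50.
post_none <- power_prior_beta(y = 12, n = 50, y0 = 30, n0 = 100, a0 = 0)
post_half <- power_prior_beta(y = 12, n = 50, y0 = 30, n0 = 100, a0 = 0.5)
post_full <- power_prior_beta(y = 12, n = 50, y0 = 30, n0 = 100, a0 = 1)

# Posterior mean moves from the current-data estimate toward the
# historical rate as a0 increases.
means <- c(none = post_mean(post_none),
           half = post_mean(post_half),
           full = post_mean(post_full))

# 95% credible interval with half-weight borrowing.
ci_half <- qbeta(c(0.025, 0.975), post_half["shape1"], post_half["shape2"])
```

Setting a0 = 0 ignores the historical controls entirely and a0 = 1 pools them as if they were current patients; values in between trade precision against the risk of prior-data conflict.
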
PSClinical — Sample Size & Power Toolkit
Sample size and power calculations for clinical trials. Covers continuous, binary, and survival endpoints under parallel, paired, and one-sample designs, with Monte Carlo simulation.
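
As an illustration of simulation-based power for the simplest case (continuous endpoint, two parallel arms), the sketch below estimates power by Monte Carlo and compares it with the analytic result from stats::power.t.test(). The function sim_power_ttest() and the design values are hypothetical and not taken from PSClinical.

```r
# Monte Carlo power for a two-arm parallel design with a continuous endpoint.
# Hypothetical helper; not PSClinical code.
sim_power_ttest <- function(n_per_arm, delta, sd, alpha = 0.05, n_sim = 2000) {
  rejections <- replicate(n_sim, {
    control <- rnorm(n_per_arm, mean = 0, sd = sd)
    active  <- rnorm(n_per_arm, mean = delta, sd = sd)
    t.test(active, control, var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)  # proportion of simulated trials that reject H0
}

set.seed(2024)  # fixed seed so the estimate is reproducible
mc_power <- sim_power_ttest(n_per_arm = 64, delta = 0.5, sd = 1)

# Analytic benchmark for the same design.
analytic_power <- power.t.test(n = 64, delta = 0.5, sd = 1,
                               sig.level = 0.05)$power

c(monte_carlo = mc_power, analytic = analytic_power)
```

The two estimates should agree up to simulation error, which shrinks as n_sim grows. Simulation becomes the more useful tool once the design has features with no closed-form power formula.
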
ClinicalXAI — Explainable AI with SHAP
XAI toolkit built on DALEXtra for interpreting ML models with SHAP values. Provides global feature importance, local force plots, and stability checks for regression, classification, and survival models.
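
SHAP values have a closed form for a linear model when features are treated as independent: the contribution of feature j for one observation is its coefficient times the feature's deviation from its mean. The base R sketch below uses that special case to show the additivity property and a global importance ranking. It does not call DALEXtra or ClinicalXAI, and the model is an arbitrary example on mtcars.

```r
# SHAP values for a linear model under the feature-independence assumption:
#   shap[i, j] = beta_j * (x[i, j] - mean(x[, j]))
fit  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
X    <- as.matrix(mtcars[, c("wt", "hp", "qsec")])
beta <- coef(fit)[colnames(X)]

centered <- sweep(X, 2, colMeans(X))      # subtract column means
shap     <- sweep(centered, 2, beta, "*") # scale by coefficients

# Additivity: baseline (mean prediction) plus the row's SHAP values
# reproduces the model's prediction for that row.
baseline      <- mean(predict(fit))
reconstructed <- baseline + rowSums(shap)

# Global importance: mean absolute SHAP value per feature.
global_importance <- sort(colMeans(abs(shap)), decreasing = TRUE)

# Local explanation for a single car.
shap["Mazda RX4", ]
```

For non-linear models there is no such shortcut, which is where model-agnostic estimators of the kind wrapped by XAI toolkits come in.
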
Biomarker Data Science Platform
End-to-end clinical trial biomarker analysis pipelines across four therapeutic areas — Immunology (MG/FcRn), Oncology (NSCLC/PD-1), Cardiovascular (FH/PCSK9), and Neurology (Alzheimer's/Anti-Aβ). All data simulated with fixed seeds.
| Stage | Methods |
|---|---|
| Multi-Omics | Transcriptomics QC, batch correction, Welch DE (BH-FDR), Olink NPX proteomics, cross-modal correlation |
| Longitudinal & Survival | Linear mixed-effects (lme4), Emax PD model, Kaplan–Meier, Cox proportional-hazards |
| Machine Learning | PCA, UMAP, k-means clustering, elastic-net (glmnet), random forest, OOB AUROC |
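
The Welch differential-expression step with Benjamini-Hochberg correction listed under Multi-Omics can be sketched in base R. The data here are simulated for illustration only (matrix size, group labels, and effect size are made up), in keeping with the fixed-seed convention described above.

```r
# Per-feature Welch t-tests with BH-FDR adjustment on simulated data.
set.seed(1)
n_genes     <- 200
n_per_group <- 12
group <- rep(c("placebo", "active"), each = n_per_group)

# Rows are features, columns are samples; pure noise to start with.
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)

# Plant a true shift in the first 10 features of the active arm.
expr[1:10, group == "active"] <- expr[1:10, group == "active"] + 2

# t.test() uses the Welch (unequal variance) test by default.
p_raw <- apply(expr, 1, function(g) {
  t.test(g[group == "active"], g[group == "placebo"])$p.value
})

# Benjamini-Hochberg adjustment controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")
hits <- which(p_bh < 0.05)

c(raw_below_0.05 = sum(p_raw < 0.05), bh_below_0.05 = length(hits))
```

With 200 tests, unadjusted p-values below 0.05 are expected by chance among the null features; the adjusted values are what should drive the hit list.
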
CDISC-meets-R — SDTM to ADaM Pipeline
End-to-end CDISC-compliant pipeline built on the pharmaverse admiral package. Produces ADSL, ADAE, and ADVS datasets with six TLFs, compliant with the CDISC ADaM IG, SDTM IG, and the FDA Study Data Technical Conformance Guide.
| ADaM Dataset | Key Derivations |
|---|---|
| ADSL | Treatment variables, population flags, study dates, age groups |
| ADAE | TEAE flag, seriousness, severity grade, study day |
| ADVS | Baseline, change from baseline, % change |
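
The ADVS derivations in the last row (baseline, change from baseline, percent change) follow a simple pattern that can be shown in base R. This sketch deliberately avoids admiral's functions so as not to misstate their signatures; the data are made up, and leaving CHG and PCHG missing on the baseline record is one common convention assumed here.

```r
# Base R sketch of BASE / CHG / PCHG derivations on made-up vital signs data.
vs <- data.frame(
  USUBJID = rep(c("S-001", "S-002"), each = 3),
  PARAMCD = "SYSBP",
  AVISIT  = rep(c("Baseline", "Week 4", "Week 8"), times = 2),
  AVAL    = c(140, 132, 128, 150, 151, 144),
  ABLFL   = rep(c("Y", "", ""), times = 2),  # baseline record flag
  stringsAsFactors = FALSE
)

# BASE: carry the flagged baseline value onto every record of the
# same subject and parameter.
base <- vs[vs$ABLFL == "Y", c("USUBJID", "PARAMCD", "AVAL")]
names(base)[names(base) == "AVAL"] <- "BASE"
advs <- merge(vs, base, by = c("USUBJID", "PARAMCD"), sort = FALSE)

# CHG and PCHG: populated for post-baseline records only (assumed convention);
# PCHG is left missing when BASE is zero.
post <- advs$ABLFL != "Y"
advs$CHG  <- ifelse(post, advs$AVAL - advs$BASE, NA_real_)
advs$PCHG <- ifelse(post & advs$BASE != 0,
                    100 * advs$CHG / advs$BASE, NA_real_)

advs[order(advs$USUBJID, advs$AVISIT), ]
```
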