projects – Ndoh Penn

Projects

Open-source tools, R packages, Shiny applications, and reproducibility infrastructure — built for biostatisticians and clinical researchers across Bayesian statistics, clinical trial design, biomarker science, explainable AI, and audit-ready workflows.

R Packages

R Package · Org: repro-stats

regulog — Tamper-Evident Audit Logging for R

A hash-chained, session-level audit logging system for regulated R environments. Every action, change, annotation, and signature is recorded in a tamper-evident chain — any post-hoc modification is detectable by verify_log(). Directly covers 21 CFR Part 11 §11.10, §11.100, §11.200 and EU Annex 11 Clauses 9 and 11. Ships with IQ/OQ/PQ qualification scripts and a full Requirements Traceability Matrix.

Function	Purpose
log_action()	Record a discrete analytical event with mandatory reason
log_change()	Document before/after field modifications
log_note()	Annotate decisions, outlier rationale, query resolutions
log_signature()	Named, dated electronic sign-off per §11.100/§11.200
with_log()	Auto-log all data I/O in a scoped block
verify_log()	Recompute SHA-256 chain; report first broken entry
filter_log()	Query entries by type, user, action, or date range
export_audit_trail()	Export signed CSV or JSON for submission
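
The hash-chain idea behind the table above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not regulog's implementation or API: the function names, the entry layout, and the all-zero genesis hash are assumptions made for this example.

```python
import hashlib
import json

# Assumption: the first entry chains to a fixed all-zero "genesis" hash.
GENESIS = "0" * 64

def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(log, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})

def first_broken_entry(log):
    """Recompute the chain; return the index of the first bad entry, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = entry_hash(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None

log = []
append_entry(log, {"type": "action", "action": "import_data", "reason": "initial load"})
append_entry(log, {"type": "change", "field": "AGE", "before": 54, "after": 45,
                   "reason": "data query resolved"})
append_entry(log, {"type": "note", "text": "outlier retained after review"})

intact = first_broken_entry(log)      # chain verified before any tampering
log[1]["payload"]["after"] = 46       # simulate a post-hoc edit
tampered = first_broken_entry(log)    # index of the first entry that fails
```

Because each hash depends on the previous one, editing any stored entry invalidates that entry's hash, which is what makes the first broken position reportable.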

R Package · repro-stats org · 21 CFR Part 11 · EU Annex 11 · Audit Trail · IQ/OQ/PQ · Shiny

📦 Package site GitHub →

R Package · Org: repro-stats

lineager — Row-Level Data Provenance and Exclusion Tracking

Tags every row with a unique lineage ID (.__lid__) that survives filters, joins, and derivations. Every row removal requires a documented reason. CDISC-aware: USUBJID embedded in lineage IDs; population flags and SDTM-to-ADaM derivations registerable via lg_population() and lg_spec(). Generates CDISC Reviewer's Guide-aligned HTML provenance reports.

Function	Purpose
lg_tag()	Assign row-level lineage IDs to source datasets
lg_filter()	Tracked filter with mandatory exclusion reason
lg_derive()	Tracked mutate with documented description
lg_join()	Tracked join with bilateral row-ID tracing
lg_trace()	Trace any subject across the complete pipeline
lg_exclusions()	Full exclusion registry with reasons and populations
lg_disposition()	CONSORT-style disposition table
lg_lineage() / lg_plot()	Pipeline lineage graph (Graphviz DOT)
lg_report()	Self-contained HTML provenance report
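
As a rough illustration of row-level lineage with mandatory exclusion reasons, here is a small Python sketch. It is not lineager's code or API: the helper names, the ID format (source, USUBJID, row position), and the registry layout are assumptions chosen for the example. Only the `.__lid__` column name comes from the description above.

```python
LID_KEY = ".__lid__"  # lineage ID column name, as described above

def tag_rows(rows, source):
    """Attach a lineage ID to each row (hypothetical format: source:USUBJID:position)."""
    return [dict(row, **{LID_KEY: f"{source}:{row['USUBJID']}:{i}"})
            for i, row in enumerate(rows, start=1)]

def tracked_filter(rows, keep, reason, registry):
    """Filter rows, recording every removed row with the stated reason."""
    if not reason or not reason.strip():
        raise ValueError("an exclusion reason is required")
    kept = []
    for row in rows:
        if keep(row):
            kept.append(row)
        else:
            registry.append({"lid": row[LID_KEY], "reason": reason})
    return kept

dm = tag_rows(
    [{"USUBJID": "S01", "AGE": 34},
     {"USUBJID": "S02", "AGE": 16},
     {"USUBJID": "S03", "AGE": 58}],
    source="dm",
)
exclusions = []
adults = tracked_filter(dm, lambda r: r["AGE"] >= 18,
                        reason="Age below 18 at screening",
                        registry=exclusions)
```

The lineage ID travels with each surviving row, and the registry holds one entry per removed row, so any subject can be traced to the step and reason that excluded them.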

R Package · repro-stats org · CDISC ADaM · Data Provenance · Exclusion Tracking · Reviewer's Guide

📦 Package site GitHub →

R Package · Shiny App · CRAN

bayprior — Bayesian Prior Elicitation for Clinical Trials

An R package and Shiny application for Bayesian prior elicitation, conflict diagnostics, and sensitivity analysis in clinical trials. It addresses the upstream problem that existing Bayesian trial packages largely ignore: how to construct, validate, and justify a prior to a regulator. The workflow is directly aligned with the FDA's 2026 draft guidance on Bayesian methods.

Module	What it does
Prior Elicitation	SHELF roulette, quantile & moment matching across Beta, Normal, Gamma, Log-Normal, Exponential, Weibull
Expert Pooling	Linear & logarithmic pooling with Bhattacharyya diagnostics
Conflict Diagnostics	Box p-value, surprise index, KL divergence for binary, continuous, Poisson, and survival data
Sensitivity Analysis	Hyperparameter grids, tornado plots, influence heatmaps
Robust & Power Priors	Sceptical, robust mixture (Schmidli et al.), and power priors (Ibrahim–Chen)
Regulatory Reports	Self-contained HTML, PDF, or Word reports aligned with FDA/EMA expectations, rendered via Quarto
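
Two of the ideas in the table, moment matching for a Beta prior and linear pooling of expert priors, can be sketched in a few lines. The Python below is a generic illustration under standard textbook formulas, not bayprior's code or API; the function names and the two example experts are assumptions.

```python
import math

def beta_from_moments(mean, variance):
    """Moment matching: Beta(a, b) with the requested mean and variance."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0 < variance < mean * (1 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    k = mean * (1 - mean) / variance - 1   # k equals a + b
    return mean * k, (1 - mean) * k

def beta_pdf(x, a, b):
    """Beta density at x in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def linear_pool_pdf(x, experts, weights):
    """Linear opinion pool: a weighted mixture of the experts' densities."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, experts))

# Two hypothetical experts stating a mean and variance for a response rate.
expert_1 = beta_from_moments(0.30, 0.01)
expert_2 = beta_from_moments(0.50, 0.02)
pooled_density = linear_pool_pdf(0.4, [expert_1, expert_2], [0.5, 0.5])
```

A logarithmic pool would instead multiply weighted powers of the densities and renormalise, which tends to give a more concentrated consensus than the linear mixture.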

R Package · CRAN · Shiny · Bayesian Methods · Clinical Trials · Prior Elicitation · FDA/EMA · golem

▶ Live App Docs → GitHub →

R Package · Org: repro-stats · CRAN

reproducr — Behavioural Reproducibility Auditing for R

reproducr makes the hidden risks of long-lived R analyses visible and trackable. Package updates can silently change function behaviour; stochastic code without a fixed seed produces different results on every run; locale-sensitive operations behave differently across systems. reproducr surfaces all of this — before results reach a journal, a regulator, or a collaborator. Works with or without renv; no configuration required.

Three-tier workflow
Function	Tier	Purpose
audit_script()	1	Extract all pkg::fn calls with version info; flag missing seeds & locale-sensitive ops
risk_score()	1	Score calls against a curated breaking-changes database (13 packages, 30+ entries)
certify()	2	Hash and store analytical outputs (coefficients, p-values, N) as a signed baseline
check_drift()	2	Detect numerical drift between current outputs and any stored baseline
repro_report()	3	Render audit report: minimal, academic methods paragraph, or pharma QC with sign-off fields
repro_badge()	3	Generate a live shields.io reproducibility badge; CI updates it on every push
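
The tier-2 idea of storing a baseline and later checking for numerical drift can be sketched as follows. This Python example is illustrative only and does not reproduce reproducr's functions or storage format: the names, the rounding rule, and the tolerance are assumptions, and the SHA-256 digest here is a plain fingerprint rather than a cryptographic signature.

```python
import hashlib
import json
import math

def make_baseline(outputs, digits=8):
    """Round the outputs, then fingerprint them with SHA-256."""
    rounded = {name: round(value, digits) for name, value in sorted(outputs.items())}
    digest = hashlib.sha256(json.dumps(rounded, sort_keys=True).encode("utf-8")).hexdigest()
    return {"values": rounded, "sha256": digest}

def find_drift(baseline, current, rel_tol=1e-6):
    """Return {name: (baseline_value, current_value)} for outputs that moved."""
    drifted = {}
    for name, old in baseline["values"].items():
        new = current.get(name)
        if new is None or not math.isclose(old, new, rel_tol=rel_tol, abs_tol=1e-12):
            drifted[name] = (old, new)
    return drifted

# Hypothetical outputs from a fitted model at certification time.
baseline = make_baseline({"hazard_ratio": 0.7312, "p_value": 0.0123, "n": 412})

same_run = find_drift(baseline, {"hazard_ratio": 0.7312, "p_value": 0.0123, "n": 412})
later_run = find_drift(baseline, {"hazard_ratio": 0.7315, "p_value": 0.0123, "n": 412})
```

Comparing with a tolerance rather than by hash alone separates harmless floating-point noise from a genuine change in results, while the digest still shows whether the stored baseline itself was altered.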

R Package · CRAN · repro-stats org · Reproducibility Audit · CI/CD · Pharma QC · renv

📦 Package site GitHub → Breaking-changes DB →

Reproducibility Pipelines — repro-stats gallery

End-to-end reproducr pipelines across four domains — each a complete, independently runnable analysis with CI that audits on every push, certifies outputs, detects drift, and updates the badge automatically.

Pipeline · Clinical Trials / Oncology · renv · Pharma QC report

reproducr-clinical

Full reproducr pipeline for a simulated Phase III oncology trial with survival analysis. Uses renv and outputs a pharma-style QC document with sign-off fields.

repro-stats · Clinical Trials · Survival Analysis · renv · Pharma QC

GitHub → DEMO.md →

Pipeline · Real-World Evidence · renv · Academic report

reproducr-rwe

reproducr pipeline for a real-world evidence analysis. It generates an academic-style methods paragraph suitable for journal submission.

repro-stats · RWE · Epidemiology · renv · Academic

GitHub → DEMO.md →

Pipeline · CMC Statistics · renv · Pharma QC report

reproducr-cmc

reproducr pipeline for CMC statistics with pharma QC report output. Aimed at CMC statisticians and regulatory affairs teams.

repro-stats · CMC · Regulatory Affairs · renv · Pharma QC

GitHub → DEMO.md →

Pipeline · Ecology · No renv · Minimal setup · Live badge

reproducr-ecology

Minimal reproducr pipeline on Palmer Penguins — no renv, minimal setup, live CI badge. Accessible entry point for general R users.

repro-stats · Ecology · Minimal Setup · CI/CD

GitHub → DEMO.md →

Shiny Applications

Shiny App

ClinicalBayes — Bayesian Dynamic Borrowing

Interactive Shiny application for Bayesian dynamic borrowing of historical control data. Supports rMAP, Power Priors, and Commensurate Priors (Stan/CmdStanR) for binary and continuous endpoints.
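
For a binary endpoint with a fixed discount, a power prior has a closed form, which makes the borrowing mechanism easy to see. The Python sketch below shows only that conjugate special case with a fixed weight a0; it is not ClinicalBayes code, and the function names and example counts are assumptions. rMAP and commensurate priors need MCMC and are not covered here.

```python
def power_prior_beta(a, b, hist_events, hist_n, a0):
    """Beta(a, b) initial prior updated with historical data discounted by a0."""
    if not 0 <= a0 <= 1:
        raise ValueError("a0 must lie in [0, 1]")
    return a + a0 * hist_events, b + a0 * (hist_n - hist_events)

def posterior_beta(prior, events, n):
    """Conjugate update of a Beta prior with current binomial data."""
    a, b = prior
    return a + events, b + (n - events)

def beta_mean(params):
    a, b = params
    return a / (a + b)

# Hypothetical numbers: 20/100 historical control responders, 12/50 current.
prior = power_prior_beta(1, 1, hist_events=20, hist_n=100, a0=0.5)
posterior = posterior_beta(prior, events=12, n=50)
no_borrowing = posterior_beta(power_prior_beta(1, 1, 20, 100, a0=0.0), 12, 50)
```

Setting a0 to 0 ignores the historical controls and a0 to 1 pools them fully; dynamic borrowing methods let the data inform how much weight the historical information receives.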

Shiny · Bayesian Borrowing · rMAP · Stan/MCMC · Clinical Trials

▶ Live App GitHub →

Shiny App

PSClinical — Sample Size & Power Toolkit

Computes sample size and power for clinical trials. Covers continuous, binary, and survival endpoints under parallel, paired, and one-sample designs, with Monte Carlo simulation.
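
As one concrete case, the per-arm sample size for a two-arm parallel design with a continuous endpoint follows from the usual normal-approximation formula n = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The Python sketch below implements only that textbook formula; it is not taken from PSClinical, and the function name and example inputs are assumptions.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided test of a mean difference (equal allocation)."""
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for target power
    n = 2 * (sd ** 2) * (z_alpha + z_beta) ** 2 / (delta ** 2)
    return math.ceil(n)                             # round up to whole subjects

# Hypothetical inputs: detect a 5-unit difference with SD 10.
required = n_per_arm(delta=5, sd=10)
stricter = n_per_arm(delta=5, sd=10, power=0.90)
```

A t-based calculation or a simulation would typically give a slightly larger n than this approximation, which is one reason to cross-check closed-form results with Monte Carlo.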

Shiny · Sample Size · Power Analysis · Survival · Monte Carlo

▶ Live App GitHub →

Shiny App

ClinicalXAI — Explainable AI with SHAP

XAI toolkit built on DALEXtra for interpreting ML models using SHAP values. Provides global feature importance, local force plots, and stability checks for regression, classification, and survival models.
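
To show what a SHAP value measures, the sketch below computes exact Shapley attributions for a tiny model by averaging marginal contributions over all feature orderings. It is a plain-Python illustration, not ClinicalXAI or DALEXtra code: the function name, the single-baseline treatment of "absent" features, and the toy linear model are assumptions, and full enumeration is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start with every feature absent
        previous = predict(current)
        for i in order:
            current[i] = x[i]             # switch feature i on
            updated = predict(current)
            phi[i] += updated - previous  # marginal contribution in this ordering
            previous = updated
    return [total / len(orderings) for total in phi]

def toy_model(features):
    """Hypothetical linear model used only for this example."""
    x1, x2, x3 = features
    return 2 * x1 + 3 * x2 - x3

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for x and the prediction at the baseline, which is the additivity property that force plots visualise.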

Shiny · Explainable AI · SHAP · DALEX · Machine Learning

▶ Live App GitHub →

Data Science Platforms

Quarto Website · R · GitHub Pages

Biomarker Data Science Platform

End-to-end clinical trial biomarker analysis pipelines across four therapeutic areas — Immunology (MG/FcRn), Oncology (NSCLC/PD-1), Cardiovascular (FH/PCSK9), and Neurology (Alzheimer's/Anti-Aβ). All data simulated with fixed seeds.

Stage	Methods
Multi-Omics	Transcriptomics QC, batch correction, Welch DE (BH-FDR), Olink NPX proteomics, cross-modal correlation
Longitudinal & Survival	Linear mixed-effects (lme4), Emax PD model, Kaplan–Meier, Cox proportional-hazards
Machine Learning	PCA, UMAP, k-means clustering, elastic-net (glmnet), random forest, OOB AUROC
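
The BH-FDR step listed under Multi-Omics refers to the Benjamini-Hochberg adjustment of p-values. A minimal Python sketch of that standard procedure is shown below; it is not the platform's own code, and the function name and example p-values are assumptions.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values from four differential-expression tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
significant = [p <= 0.05 for p in adj]
```

Each adjusted value is the smallest false discovery rate at which that test would be declared significant, so thresholding the adjusted values at 0.05 controls the FDR at 5%.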

Quarto · Biomarkers · Survival Analysis · Mixed Models · Machine Learning · limma · glmnet

▶ Live Site GitHub →

CDISC & Regulatory

Quarto Report · R · pharmaverse · GitHub Pages

CDISC-meets-R — SDTM to ADaM Pipeline

End-to-end CDISC-compliant pipeline using the pharmaverse admiral package. Produces ADSL, ADAE, and ADVS datasets with six TLFs, compliant with the CDISC ADaM IG, the SDTM IG, and the FDA Study Data Technical Conformance Guide.

ADaM Dataset	Key Derivations
ADSL	Treatment variables, population flags, study dates, age groups
ADAE	TEAE flag, seriousness, severity grade, study day
ADVS	Baseline, change from baseline, % change
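
The ADVS derivations in the last row reduce to simple arithmetic once a baseline record is chosen. The Python sketch below illustrates the logic only; it is not the pipeline's admiral code. It assumes baseline is the last non-missing value on or before study day 1, which is a common convention but an assumption here, and the function name is hypothetical. Variable names follow ADaM (AVAL, BASE, CHG, PCHG, ADY).

```python
from collections import defaultdict

def add_change_from_baseline(records):
    """Add BASE, CHG, and PCHG per subject and parameter."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["USUBJID"], rec["PARAMCD"])].append(rec)

    out = []
    for recs in groups.values():
        recs = sorted(recs, key=lambda r: r["ADY"])
        # Assumed rule: baseline is the last non-missing value with ADY <= 1.
        candidates = [r for r in recs if r["ADY"] <= 1 and r["AVAL"] is not None]
        base_rec = candidates[-1] if candidates else None
        for r in recs:
            row = dict(r, BASE=None, CHG=None, PCHG=None)
            if base_rec is not None:
                row["BASE"] = base_rec["AVAL"]
                post_baseline = r["ADY"] > base_rec["ADY"] and r["AVAL"] is not None
                if post_baseline:
                    row["CHG"] = r["AVAL"] - row["BASE"]
                    if row["BASE"] != 0:
                        row["PCHG"] = 100 * row["CHG"] / row["BASE"]
            out.append(row)
    return out

# Simulated systolic blood pressure records for one subject.
vs = [
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "ADY": -7, "AVAL": 130},
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "ADY": 1, "AVAL": 120},
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "ADY": 15, "AVAL": 126},
    {"USUBJID": "S01", "PARAMCD": "SYSBP", "ADY": 29, "AVAL": 114},
]
advs = add_change_from_baseline(vs)
```

Leaving CHG and PCHG empty for pre-baseline and baseline records keeps the change variables restricted to post-baseline visits.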

Quarto · CDISC ADaM · CDISC SDTM · admiral · pharmaverse · gtsummary · Regulatory

▶ Live Report GitHub →

💡 Most of my applied work is conducted in proprietary pharmaceutical settings and cannot be shared publicly. If you'd like to discuss a specific methodology or explore a collaboration, feel free to get in touch.