Based on a motivating example from non-communicable disease epidemiology, we generated a dataset of 1,000 observations to illustrate the effect of conditioning on a collider. Nearly 1 in 3 Americans has high blood pressure, and more than half of them do not have it under control [1]. Increases in systolic blood pressure (SBP) over time are associated with higher cardiovascular morbidity and mortality [2]. Accumulated evidence shows that 24-hour dietary sodium intake above the recommended levels, measured in grams (g), is associated with higher SBP, measured in mmHg [3].
Furthermore, with advancing age the kidney undergoes several anatomical and physiological changes that limit the adaptive mechanisms responsible for maintaining the composition and volume of the extracellular fluid. These include a decline in glomerular filtration rate and an impaired ability to maintain water and sodium homeostasis in response to dietary and environmental changes [4]. Likewise, age is associated with structural changes in the arteries and, consequently, with SBP [2]. Age is thus a common cause of both high SBP and impaired sodium homeostasis, and therefore acts as a confounder of the association between sodium intake (SOD in the directed acyclic graph, DAG) and SBP; that is, age lies on the back-door path SOD ← age → SBP.
In contrast, high levels of 24-hour urinary protein excretion (proteinuria) are caused by both sustained high SBP and increased 24-hour dietary sodium intake. Proteinuria (PRO in the DAG) is therefore a collider on the path SOD → PRO ← SBP.
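The confounder and collider roles described above can be read directly off the edges of the DAG. The sketch below is a minimal illustration under stated assumptions: the edge-list encoding, the node labels AGE, SOD, SBP and PRO, and the helper classify_role() are all hypothetical and not part of the original material. It only inspects direct parents, so it flags direct common causes and common effects of the exposure and outcome rather than enumerating every back-door path.

```r
# Hypothetical edge-list encoding of the DAG described in the text
edges <- data.frame(
  from = c("AGE", "AGE", "SOD", "SOD", "SBP"),
  to   = c("SOD", "SBP", "SBP", "PRO", "PRO"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each node other than the exposure and outcome as
# a common cause (parent of both: confounder) or a common effect
# (child of both: collider on exposure -> node <- outcome).
# Only direct edges are inspected; longer paths are not enumerated.
classify_role <- function(edges, exposure, outcome) {
  parents_of <- function(v) edges$from[edges$to == v]
  nodes <- setdiff(unique(c(edges$from, edges$to)), c(exposure, outcome))
  vapply(nodes, function(v) {
    if (v %in% parents_of(exposure) && v %in% parents_of(outcome)) {
      "confounder (common cause)"
    } else if (all(c(exposure, outcome) %in% parents_of(v))) {
      "collider (common effect)"
    } else {
      "other"
    }
  }, character(1))
}

roles <- classify_role(edges, exposure = "SOD", outcome = "SBP")
print(roles)
```

With SOD as the exposure and SBP as the outcome, this labels AGE as a confounder and PRO as a collider, matching the reasoning in the text.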
The simulated data are generated from the structural relationships among the variables depicted in the DAG. We simulated 24-hour urinary protein excretion as a function of sodium intake and SBP, so age affects proteinuria only indirectly, through these two variables. We ensured that the ranges of the simulated values were biologically plausible and as close to reality as possible [5, 6].
In generateData(), beta1 is the effect of SOD on SBP, while alpha1 (the effect of SOD on PRO) and alpha2 (the effect of SBP on PRO) are parameters you can modify in 'Collider Visualization'.
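```r
# Simulate data from the DAG: age -> SOD, age -> SBP, SOD -> SBP, SOD -> PRO <- SBP
generateData <- function(n, seed, beta1, alpha1, alpha2) {
  set.seed(seed)
  # Age (years): confounder of the SOD-SBP association
  Age_years <- rnorm(n, 65, 5)
  # 24-hour dietary sodium intake (g), increasing with age
  Sodium_gr <- Age_years / 18 + rnorm(n)
  # SBP (mmHg): beta1 is the effect of sodium; each year of age adds 2 mmHg
  sbp_in_mmHg <- beta1 * Sodium_gr + 2.00 * Age_years + rnorm(n)
  # 24-hour urinary protein (mg): the collider, caused by both sodium and SBP
  Proteinuria_in_mg <- alpha1 * Sodium_gr + alpha2 * sbp_in_mmHg + rnorm(n)
  data.frame(sbp_in_mmHg, Sodium_gr, Age_years, Proteinuria_in_mg)
}
```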
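A minimal sketch of how these simulated data expose the bias: regress SBP on sodium intake without adjustment, adjusting for age, and adjusting for age plus proteinuria. The function is repeated so the snippet runs on its own. The values n = 1000, seed = 2023, beta1 = 1, alpha1 = 0.5 and alpha2 = 0.5 are arbitrary illustrative choices, not defaults taken from 'Collider Visualization', and ordinary linear regression with lm() is an assumed estimation method.

```r
# generateData() as defined above, repeated so this snippet is self-contained
generateData <- function(n, seed, beta1, alpha1, alpha2) {
  set.seed(seed)
  Age_years <- rnorm(n, 65, 5)
  Sodium_gr <- Age_years / 18 + rnorm(n)
  sbp_in_mmHg <- beta1 * Sodium_gr + 2.00 * Age_years + rnorm(n)
  Proteinuria_in_mg <- alpha1 * Sodium_gr + alpha2 * sbp_in_mmHg + rnorm(n)
  data.frame(sbp_in_mmHg, Sodium_gr, Age_years, Proteinuria_in_mg)
}

# Illustrative parameter values (assumptions, not the app's defaults)
df <- generateData(n = 1000, seed = 2023, beta1 = 1, alpha1 = 0.5, alpha2 = 0.5)

# Unadjusted: biased by the open back-door path SOD <- age -> SBP
crude <- coef(lm(sbp_in_mmHg ~ Sodium_gr, data = df))[["Sodium_gr"]]
# Adjusting for age closes the back-door path: estimate should be near beta1
adjusted <- coef(lm(sbp_in_mmHg ~ Sodium_gr + Age_years, data = df))[["Sodium_gr"]]
# Also adjusting for the collider opens SOD -> PRO <- SBP and biases the estimate
collider <- coef(lm(sbp_in_mmHg ~ Sodium_gr + Age_years + Proteinuria_in_mg,
                    data = df))[["Sodium_gr"]]

print(round(c(crude = crude, adjusted = adjusted, collider = collider), 2))
```

Under these assumed values, the age-adjusted coefficient should be close to beta1 = 1. The unadjusted coefficient is inflated by confounding (its large-sample value is about 3.6), and additionally conditioning on proteinuria pulls the sodium coefficient away from beta1 (large-sample value about 0.6), even though age adjustment alone had already removed the confounding.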