
A data-generation process for data with specified risk differences or numbers needed to treat

Austin PC. Commun Stat Simul Comput. 2010; 39(3):563-77. Epub 2010 Feb 19.


Monte Carlo simulation methods are increasingly used to evaluate the performance of statistical methods and estimators. However, the utility of these methods depends upon the existence of appropriate data-generating processes. Clinical commentators have suggested that the risk difference and the associated number needed to treat (NNT) are important measures of treatment effect when outcomes are binary. While these quantities are easily estimated in randomized controlled trials, there is increasing interest in methods to estimate them using observational or non-randomized data. However, the lack of a data-generating process for simulating data in which treatment induces a specified risk difference hinders the systematic examination of the performance of these methods.

In the current study, the authors describe and evaluate a data-generating process for simulating data in which treatment induces a specified risk difference. The process iteratively evaluates marginal risk differences using Monte Carlo integration. It is flexible and can readily incorporate different distributions for baseline covariates and different levels of the baseline risk of the event. It can also be readily modified to simulate data in which treatment induces a specified relative risk.
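The general idea of such a process can be illustrated with a minimal sketch. It is not the published algorithm: the abstract does not state the outcome model or the search rule, so the logistic outcome model, the standard normal covariates, the coefficient values, the bisection search, and all function and variable names below are assumptions made for illustration. The sketch averages the treated-minus-control risk over a large simulated covariate sample (Monte Carlo integration) and iterates on the conditional treatment log-odds ratio until that marginal risk difference matches the target.

```python
import numpy as np


def marginal_risk_difference(beta_treat, beta0, beta_x, covariates):
    """Monte Carlo estimate of the marginal risk difference.

    Averages P(Y=1 | treated, x) - P(Y=1 | control, x) over the rows of
    `covariates`, under an assumed logistic outcome model.
    """
    linear = beta0 + covariates @ beta_x
    p_control = 1.0 / (1.0 + np.exp(-linear))
    p_treated = 1.0 / (1.0 + np.exp(-(linear + beta_treat)))
    return float(np.mean(p_treated - p_control))


def solve_treatment_effect(target_rd, beta0, beta_x, covariates,
                           lo=-10.0, hi=10.0, tol=1e-7, max_iter=200):
    """Bisection on the treatment log-odds ratio.

    The marginal risk difference increases monotonically with the
    log-odds ratio, so bisection converges whenever the target is
    attainable within [lo, hi].
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        rd = marginal_risk_difference(mid, beta0, beta_x, covariates)
        if abs(rd - target_rd) < tol:
            return mid
        if rd < target_rd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(2010)

# Assumed (hypothetical) design: two standard normal baseline covariates.
beta0 = -1.5                    # intercept, controls the baseline risk
beta_x = np.array([0.5, -0.3])  # covariate log-odds ratios
target_rd = -0.05               # treatment lowers risk by 5 points (NNT = 20)

# Large covariate sample used only for the Monte Carlo integration.
covariates = rng.standard_normal((200_000, 2))
beta_treat = solve_treatment_effect(target_rd, beta0, beta_x, covariates)

# Generate one dataset using the calibrated treatment effect.
n = 10_000
x = rng.standard_normal((n, 2))
treatment = rng.binomial(1, 0.5, size=n)
p = 1.0 / (1.0 + np.exp(-(beta0 + x @ beta_x + beta_treat * treatment)))
outcome = rng.binomial(1, p)
```

Because the NNT is the reciprocal of the absolute risk difference, a target NNT can be handled by converting it first, for example an NNT of 20 corresponds to a risk difference of 0.05 in absolute value. Swapping the difference of mean risks for their ratio in `marginal_risk_difference` would give an analogous sketch for a specified relative risk.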

Keywords: Research and statistical methods
