
Some methods of propensity-score matching had superior performance to others: results of an empirical investigation and Monte Carlo simulations


Propensity-score matching is increasingly being used to reduce the impact of treatment-selection bias when estimating causal treatment effects using observational data. Several propensity-score matching methods are currently employed in the medical literature: matching on the logit of the propensity score using calipers of width either 0.2 or 0.6 of the standard deviation of the logit of the propensity score; matching on the propensity score using calipers of 0.005, 0.01, 0.02, 0.03, and 0.1; and 5 → 1 digit matching on the propensity score.
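The methods listed above differ in the scale used (the propensity score or its logit) and in how close a treated and an untreated subject must be to form a pair. In 5 → 1 digit matching, treated subjects are first matched to untreated subjects whose scores agree to five digits, the remaining ones on four digits, and so on down to one. The following is a minimal sketch, not the paper's implementation, assuming greedy 1:1 nearest-neighbour matching without replacement, treated subjects processed in the order given, the standard deviation of the logit computed over all subjects, and digits compared after rounding (some implementations truncate instead). The names `caliper_match`, `logit_caliper` and `digit_match` are hypothetical.

```python
import math
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: subject id -> estimated propensity score in (0, 1).
Scores = Dict[str, float]


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_caliper(all_ps: List[float], fraction: float = 0.2) -> float:
    """Caliper = fraction x SD of the logit of the propensity score.

    Assumption: the SD is taken over all subjects (treated and untreated).
    """
    return fraction * statistics.stdev([logit(p) for p in all_ps])


def caliper_match(treated: Scores, control: Scores, caliper: float,
                  on_logit: bool = False) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    scale: Callable[[float], float] = logit if on_logit else (lambda p: p)
    available = {cid: scale(ps) for cid, ps in control.items()}
    pairs = []
    for tid, ps in treated.items():  # treated processed in the order given
        if not available:
            break
        x = scale(ps)
        best = min(available, key=lambda cid: abs(available[cid] - x))
        if abs(available[best] - x) <= caliper:  # reject pairs outside the caliper
            pairs.append((tid, best))
            del available[best]
    return pairs


def digit_match(treated: Scores, control: Scores,
                max_digits: int = 5) -> List[Tuple[str, str]]:
    """Greedy 5 -> 1 digit matching (rounding assumed, not truncation)."""
    available = dict(control)
    unmatched = dict(treated)
    pairs = []
    for d in range(max_digits, 0, -1):  # 5 digits first, then 4, ..., 1
        for tid, ps in list(unmatched.items()):
            key = round(ps, d)
            candidates = [cid for cid, cps in available.items()
                          if round(cps, d) == key]
            if candidates:
                # Among ties, take the closest full-precision score (a simplification).
                best = min(candidates, key=lambda cid: abs(available[cid] - ps))
                pairs.append((tid, best))
                del available[best]
                del unmatched[tid]
    return pairs
```

For example, `caliper_match(treated, control, logit_caliper(all_ps), on_logit=True)` corresponds to the 0.2-SD caliper on the logit scale, while `caliper_match(treated, control, 0.02)` corresponds to a caliper of 0.02 on the propensity-score scale. Treated subjects with no acceptable partner are simply dropped, which is how caliper matching trades sample size for closeness of matches.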

The authors conducted an empirical investigation and Monte Carlo simulations to compare the relative performance of these competing methods. Using a large sample of patients hospitalized with a heart attack, with exposure defined as receipt of a statin prescription at hospital discharge, the investigators found that all eight methods produced propensity-score matched samples with qualitatively equivalent balance in measured baseline variables between treated and untreated subjects. Seven of the eight matched samples yielded qualitatively similar estimates of the reduction in mortality due to statin exposure; 5 → 1 digit matching produced a qualitatively different estimate of relative risk reduction from the other seven methods.
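The summary does not say how balance was measured. One diagnostic commonly used in the propensity-score literature is the standardized difference between treated and untreated subjects in the matched sample. The sketch below assumes that diagnostic and adds a crude relative risk reduction computed from matched-sample mortality, ignoring the paired structure; the function names are hypothetical and the paper's exact estimators may differ.

```python
import math
import statistics
from typing import Sequence


def std_diff_continuous(x_t: Sequence[float], x_c: Sequence[float]) -> float:
    """Standardized difference (%) of a continuous baseline variable."""
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2.0)
    return 100.0 * (statistics.mean(x_t) - statistics.mean(x_c)) / pooled_sd


def std_diff_binary(p_t: float, p_c: float) -> float:
    """Standardized difference (%) of a binary variable given its prevalences.

    Undefined when both prevalences are 0 or both are 1 (zero pooled SD).
    """
    pooled_sd = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2.0)
    return 100.0 * (p_t - p_c) / pooled_sd


def relative_risk_reduction(deaths_t: int, n_t: int, deaths_c: int, n_c: int) -> float:
    """1 - (risk in treated) / (risk in untreated) within the matched sample."""
    return 1.0 - (deaths_t / n_t) / (deaths_c / n_c)
```

Because the standardized difference does not depend on sample size, it can be compared across matched samples of different sizes, such as those produced by the eight methods.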

Using Monte Carlo simulations, the authors found that matching on the logit of the propensity score using calipers of width 0.2 of the standard deviation of the logit of the propensity score, and matching on the propensity score using calipers of width 0.02 and 0.03, tended to have superior performance for estimating treatment effects.



Austin PC. Biom J. 2009; 51(1):171-84.
