How to Read a Readmission Intervention Study – for Non-Statisticians
Here are a few guidelines to help you better navigate those technical, quantitative studies; to pull out the most important takeaways; and to see through statements that can be misleading.
1. Think about how generalizable the results are. The narrower the sample being studied, the more carefully you must consider whether the findings apply to your own setting. For example, the study may focus on:
- a particular region or state
- a particular type of hospital
  - acute care vs. long-term care
  - teaching hospital vs. “community” hospital
  - urban vs. suburban vs. rural
- patients with
  - a limited set of diagnoses
  - a limited age range
  - a limited socio-economic range or limited types of insurance
Method A may have worked with elderly heart failure patients in an elite urban teaching hospital, but will it work with your population?
2. Ask how well the authors have demonstrated cause and effect. Most of us are very good at coming up with critiques about this when we consciously think about it. But it’s terribly easy to get caught up in a narrative and to forget — especially if the topic is one of importance to us. So remember that correlation does not prove causation; an excellent reminder of this is the cartoon at http://xkcd.com/552/. Another pithy example: “The more fire trucks that arrive on the scene, the greater the fire’s damage.”
Remember that a study that incorporates some sort of control for confounding factors (or that uses statistical adjustment to “level the playing field”) will come closer to demonstrating causality than one that doesn’t. And remember that there are many levels and flavors of controls:
- The purely design-based. The gold standard for design-based control is the randomized experiment. This approach is almost universally effective at controlling for nuisance variables (underlying factors) so that the question of interest can be examined in a valid way.
- The purely statistical. One might control after the fact through regression or analysis of variance or even a more advanced method such as propensity scoring. None of these methods is foolproof; teasing out causality always requires thought as opposed to simply applying some procedure or recipe by rote. And just because someone has used a more sophisticated method doesn’t mean they have better accounted for causality issues. Trust your instincts: if you think an author has failed to account for some factor that matters, that author probably has failed to. The burden is on each researcher to demonstrate how well causality has been addressed.
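To make the idea of "leveling the playing field" concrete, here is a minimal sketch of the simplest statistical control, stratification. The counts are invented purely for illustration (they come from no real study), and the arm and stratum names are hypothetical. In this made-up example the intervention was given mostly to low-risk patients, so the crude comparison flatters it even though it does nothing within either risk group.

```python
# Hypothetical counts, invented for illustration only: (patients, readmissions)
counts = {
    "low risk":  {"intervention": (800, 40), "usual care": (200, 10)},
    "high risk": {"intervention": (200, 40), "usual care": (800, 160)},
}

def stratum_rate(stratum, arm):
    """Readmission rate for one arm within one risk stratum."""
    patients, readmissions = counts[stratum][arm]
    return readmissions / patients

def crude_rate(arm):
    """Readmission rate for one arm, ignoring risk level entirely."""
    patients = sum(counts[s][arm][0] for s in counts)
    readmissions = sum(counts[s][arm][1] for s in counts)
    return readmissions / patients

# Crude comparison: the intervention looks far better.
for arm in ("intervention", "usual care"):
    print(f"crude {arm}: {crude_rate(arm):.1%}")

# Stratified comparison: within each risk group the arms are identical.
for stratum in counts:
    for arm in ("intervention", "usual care"):
        print(f"{stratum}, {arm}: {stratum_rate(stratum, arm):.1%}")
```

The crude gap here is produced entirely by who received the intervention, which is the kind of factor a reader should ask whether the authors accounted for.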
3. Not all “significant” results are alike. You may have been encouraged to check a study’s sample size as a way to evaluate its reliability. You may also have learned to check for indications of statistical significance, usually “p-values.” And yes, it’s true that smaller p-values do point to more significant findings: those that would scarcely occur by chance alone. “P = .002” means that a result at least as extreme as the one obtained would occur only 2 times in 1,000 if chance alone were the cause. “P = .002” spells greater significance than “p = .02”, which in turn is more significant than “p = .2”. But what you may not have been told is the value of considering sample size and significance together.
Statements of statistical significance can be very misleading if sample size is not taken into account. Large samples allow a better test of significance. A study performed on a sample that is too small will have a low chance of showing significance. (Statistician-geekizoids like to call this “low statistical power.”) On the other hand, if the phenomenon under question truly exists, research using large samples ought to be able to show great significance. What if it doesn’t?
A recent study examined the extent to which an initiative decreased utilization. It did so chiefly using the very sensitive outcome measure of dollars and cents. The authors compared 321 patients receiving the intervention with 321 controls. If the intervention had been very effective, this sample was large enough, and the outcome measures sensitive enough, to attest to that by producing a definitive p-value such as .001. Yet the most significant result among a half dozen outcomes tested yielded a p of .01. Some findings that the authors considered “suggestive of a benefit” or “in the right direction” had values as high as .55. This should raise some eyebrows. It means that, even with 642 patients studied, a difference at least as large as the one observed would turn up about 55% of the time through chance alone, even if the intervention had no effect on utilization. Such a result is no real evidence of a benefit. So: If the sample is large, and the outcome measure sensitive, expect a definitive p. Otherwise the study’s writeup probably overstates the likelihood of the treatment effect.
4. Small effects require large samples. Above, we discussed utilization charges, where a strong intervention would decrease the mean by several thousand dollars. Here, let’s consider readmission rates, which are measured on an entirely different scale. Most quality officers and care management teams would be delighted to see their readmission rate drop from, say, 10% to 9%. Even a change of this size can affect a hospital’s bottom line, among many other things. But whether you are using a basic chi-square test or an advanced regression technique, sample sizes that work for explaining other effects may well be inadequate when studying something like readmission rate. And so we find many studies that use a seemingly robust set of several hundred patients but that fail to reach statistical significance. These “underpowered” studies, unfortunately, may have very little to say either way about the treatment in question.
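To put a rough number on "underpowered," the sketch below estimates the chance that a study would reach significance if the true readmission rate really did fall from 10% to 9%. It assumes equal group sizes, a two-sided test at the .05 level, and the usual normal approximation for comparing two proportions (counting only the tail in the direction of the true effect). The function name `approx_power` and the sample sizes tried are illustrative choices, not drawn from any particular study.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(rate_before, rate_after, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation with equal group sizes; the negligible chance of
    a significant result in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    pooled = (rate_before + rate_after) / 2
    # Standard error of the difference if there were no true effect
    se_null = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    # Standard error of the difference under the assumed true rates
    se_alt = sqrt((rate_before * (1 - rate_before)
                   + rate_after * (1 - rate_after)) / n_per_group)
    effect = abs(rate_before - rate_after)
    return std_normal.cdf((effect - z_crit * se_null) / se_alt)

for n in (300, 1000, 5000, 14000):
    print(f"{n:>6} per group: power = {approx_power(0.10, 0.09, n):.0%}")
```

Under these assumptions, a study with a few hundred patients per group has only a small chance of detecting a one-point drop, which is why a non-significant result from such a study says little either way.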
What sort of pre- and post- sample sizes would be needed to demonstrate that a decrease from 10% to 9% is statistically significant, using the common significance level of .05 (i.e., with 95% confidence)? The answer can be found in the table below, and you will obtain the same answer from any statistical software or online power calculator (e.g., MedCalc’s).
|Sample size (patients in each group, pre- and post-treatment)|P-value for a readmission-rate decrease from 10% to 9%|
Thus it would take 6,700 x 2, or 13,400 patients in total, before one could say that this single-point reduction was statistically significant. That’s almost 2 years’ discharges for the average US hospital.
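A calculation of this kind can be sketched in a few lines of Python. The version below assumes a two-sided two-proportion z-test with pooled variance (equivalent to a basic chi-square test without continuity correction), equal group sizes, and observed rates of exactly 10% and 9%. The function name and the sample sizes tried are illustrative choices, so the output is a recomputation under those assumptions and not a copy of the original table.

```python
import math

def two_proportion_p_value(rate_before, rate_after, n_per_group):
    """Two-sided p-value for the difference between two observed rates.

    Pooled-variance z-test; assumes both groups have n_per_group patients.
    """
    pooled = (rate_before + rate_after) / 2
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (rate_before - rate_after) / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

for n in (500, 1000, 2000, 4000, 6000, 6700, 10000):
    p = two_proportion_p_value(0.10, 0.09, n)
    flag = "significant at .05" if p < 0.05 else "not significant"
    print(f"{n:>6} per group: p = {p:.3f} ({flag})")
```

Other tests or a continuity correction would shift the crossover point somewhat, so treat the exact sample size as approximate.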
On the other hand, an analysis like this can tell us just how large a reduction would be necessary to reach statistical significance if we were limited to, say, 1,000 patients per group. That original 10% readmission rate would need to fall to 7.5%, an extraordinary 25% relative reduction.
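One way to arrive at a figure like this is to try progressively lower post-treatment rates until the test first reaches significance. The sketch below does that in steps of a tenth of a percentage point, again assuming a two-sided pooled two-proportion z-test with equal group sizes. The function names and the step size are illustrative assumptions.

```python
import math

def two_proportion_p_value(rate_before, rate_after, n_per_group):
    """Two-sided p-value from a pooled two-proportion z-test (equal groups)."""
    pooled = (rate_before + rate_after) / 2
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (rate_before - rate_after) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

def highest_significant_rate(rate_before, n_per_group, alpha=0.05):
    """Highest post-treatment rate (in 0.1-point steps) that is significant.

    Returns None if no rate above zero reaches significance.
    """
    start = round(rate_before * 1000) - 1  # work in tenths of a percent
    for tenths in range(start, 0, -1):
        rate_after = tenths / 1000
        if two_proportion_p_value(rate_before, rate_after, n_per_group) < alpha:
            return rate_after
    return None

rate_needed = highest_significant_rate(0.10, 1000)
print(f"Post-treatment rate needed: {rate_needed:.1%}")
print(f"Relative reduction: {(0.10 - rate_needed) / 0.10:.0%}")
```

With a finer step size or a different test the threshold would move slightly, but the order of magnitude of the required reduction stays the same.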