In one of my previous blogs I started a review of Dr. Ornish's Lifestyle Heart Trial1. At that point, my goal was to compare the claims of Dr. Greger to the results and conclusions of the authors of this seminal study. They didn't match, and reviewing just the abstract was enough to show it. At the same time, this study is so important that I promised to do a critical appraisal of the full report, which is what I'm going to do in this blog.
I'll start with a quick recap based on my review of the study abstract. The study compared the changes in coronary artery stenosis between two groups of patients with coronary artery disease: one group received a comprehensive lifestyle intervention that included a vegetarian diet, stress management training, smoking cessation and moderate exercise, whereas the other received only general medical advice. The study lasted 1 year and was a randomized controlled clinical trial. It was not blinded, and I had some reservations about the nature of the randomization, as subjects were divided between the two study groups very unevenly (28 and 20 subjects, respectively, whereas the numbers should have been closer to 24 per group). Also, the abstract did not include any statistical significance indicators, which is a bit of a red flag. So, without further ado, let's dig into the actual paper and reflect on its key elements using the PICO (patients/population, intervention, control/comparison, outcome) approach2 to make the review more structured.
The subjects of the study were patients with one-, two- or three-vessel coronary artery disease, aged 35-75 years, of both sexes, residents of the San Francisco area, with no other life-threatening illnesses, no recent myocardial infarction (within 6 weeks) and not receiving lipid-lowering drugs. The researchers identified 193 patients as potential study subjects, but only 94 of them met the eligibility criteria. I have no issues with the eligibility criteria or with the selection process prior to the consent and randomization stage.
At this point researchers normally ask for informed consent first and then enroll and randomize patients, but in this particular study the researchers reversed the order for some reason: they first randomized these 94 patients into two groups – 53 were randomly assigned to the experimental group and 43 were to become controls [sic] – and only after that did they ask the patients to participate in their assigned arm of the study, which resulted in the uneven distribution of subjects between the experimental and control groups – 28 (53% agreed) and 20 (42% agreed). I find this peculiar and, quite honestly, I think that at this point we can no longer call this study a randomized clinical trial, as there were factors affecting the decision to participate in a specific study arm and the selection was therefore not truly random3. And now we see where the uneven group sizes came from.
Also, in case you haven't noticed, 53 plus 43 equals 96, not the 94 reported in the article. Though this seems to be an obvious error, there are lots of little inaccuracies like it throughout the text, and I wonder how the Lancet accepted the paper. To be honest, errors of this kind are unacceptable in any scientific publication, let alone in one of the most influential medical journals.
Baseline characteristics of the patients in the two groups were presented in tables 1 and 2 and, again, I have a couple of questions about the way they were presented. First of all, the numbers of subjects in the tables differed from the 28 and 20 enrolled in the study. It appears the researchers had further reduced the sample sizes to 22 and 19 subjects, respectively, because they did not have follow-up data for 6 patients in the experimental group and 1 patient in the control group. There seem to have been far more adverse events and dropouts in the experimental group (one patient died) than in the control group, and this was not accounted for by intention-to-treat analysis4 or any other appropriate statistical approach. The second problem is that the researchers did not report the statistical significance of the differences between the two groups at baseline. Technically, the differences are not significant (I checked), but you have to report that anyway. Most importantly, some differences seemed quite large: for example, there was only 1 woman in the experimental group versus 4 in the control group, and subjects in the experimental group were on average more than 10 kg heavier than controls. It looks like the only reason these differences were not statistically significant is the sample size, which, by the way, is quite small – but I'll touch on that a bit later.
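As a quick illustration of how one of these baseline checks can be done, here is a minimal sketch (assuming SciPy is available) of Fisher's exact test on the sex imbalance mentioned above – 1 of 22 women in the experimental group versus 4 of 19 in the control group. Fisher's exact test is the appropriate choice for counts this small:

```python
from scipy import stats

# 2x2 contingency table built from the counts reported in the paper's
# baseline tables: rows are groups, columns are women / men.
table = [[1, 21],   # experimental: 1 woman, 21 men
         [4, 15]]   # control:      4 women, 15 men

odds_ratio, p_value = stats.fisher_exact(table)
print(f"Fisher exact p = {p_value:.3f}")
```

The p-value comes out well above 0.05 – consistent with my point that the imbalance looks large but is not statistically significant at this sample size.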
The intervention was well described and quite comprehensive: patients were switched to a vegetarian diet (ovo-lacto vegetarian, from what we know), they were trained to cope with stress and were supposed to practise relaxation techniques for 1 hour a day, a smoking cessation intervention was performed (though there was only 1 smoker in the group at baseline), and moderate exercise was prescribed. Notably, the researchers went beyond just the vegetarian diet – they also limited consumption of salt, saturated fat, cholesterol (5 mg/day or less [sic]), caffeine (completely eliminated) and alcohol. Patients also received vitamin B12 supplements, and some prepackaged food rations were given to them to take home. Patients in the experimental group were additionally offered twice-weekly 4-hour group discussions led by a psychologist. I was surprised to see a daily limit of 5 mg of dietary cholesterol, as it's an extremely low amount – even 1 cup of low-fat yogurt would likely contain more than that, so the limit appears somewhat unrealistic given that patients were allowed a cup of yogurt per day.
In line with my previous discussion of Dr. Greger's claims, I would like to emphasize that this intervention was much more comprehensive than just a vegetarian diet, and even if we focus on the diet alone, it involved more than just going vegetarian – there were quite a few other elements to it. So the claim that a vegetarian diet reverses heart disease cannot be made on the basis of this study. Notably, the authors didn't make such claims, at least in this paper.
There is virtually no description of the control condition, which is unacceptable by modern clinical trial reporting standards. Apparently, it was fine with the Lancet in 1990. If I were a managing editor handling this study, at this point I would send it back to the authors with a request for major revisions, but since that's not in my power, let's talk about the outcomes.
The authors reported quite a few primary and secondary measures, as any good researcher would – we always try to get as much data as possible from a study. Many of the outcomes were not statistically significant – I'll talk about the statistical methods in a bit. Some items were quite odd to begin with, for example adherence to the intervention: given that there effectively was no intervention in the control group, there is simply nothing to compare it to. Or did the authors want to show that 4-hour support sessions twice a week with a psychologist increase adherence? Compared to adherence to an intervention that was never provided?
In any case, I would prefer not to spend too much time on the secondary outcomes – there are lots of them and going over everything would take a while, especially given that they are not mentioned in any original hypothesis; in fact, no hypothesis is stated at all, which is also a must for a good clinical trial.
At this point I would like to proceed to the primary outcome – the change in coronary artery stenosis (again, it's not explicitly labelled as such in the text) – but just before that we need to talk about the overall choice of statistical methods, and I have a major point to make here.
First of all, the sample sizes were quite small – effectively 22 and 19 subjects – and in order to justify the use of t-tests and ANOVA as the authors did, they had to demonstrate that the variables they compared were normally distributed. This is simply one of the major assumptions5 behind these tests, and I am not sure it was met or even checked. The thing is, if this assumption is not met, we cannot use the t-test or ANOVA, period – there are other, nonparametric tests that must be used instead6.
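To make this concrete, here is a minimal sketch of such a normality check using the Shapiro-Wilk test (assuming SciPy is available). The data below are simulated for illustration only – they are not the trial's data:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for one group's stenosis measurements (n=22);
# hypothetical mean and SD loosely echo the scale of the reported values.
rng = np.random.default_rng(0)
sample = rng.normal(loc=40.0, scale=16.9, size=22)

w_stat, p_value = stats.shapiro(sample)
if p_value < 0.05:
    # Normality rejected: fall back to a nonparametric alternative,
    # e.g. the Mann-Whitney U test instead of the independent t-test.
    print(f"Shapiro-Wilk p = {p_value:.3f}: use a nonparametric test")
else:
    print(f"Shapiro-Wilk p = {p_value:.3f}: normality not rejected")
```

A check like this takes two lines, which is why its absence from the paper is hard to excuse.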
The second problem I see is the use of mixed-model analysis of variance (mixed-effects ANOVA). The rationale for this method is that when we have more than 2 groups of subjects or more than 2 observations per group, we can't use a t-test – multiple pairwise comparisons would inflate the likelihood of finding an effect where none exists. In such cases we indeed must use analysis of variance, or ANOVA7. The method is great, but it requires a series of post-hoc tests. In this particular study we have 2 groups of subjects with before and after data: the before and after measurements are dependent, as they were collected in the same cohorts of people, while the between-group data are independent, which complicates the analysis and forces the use of mixed models. In this scenario, however, we can use a much simpler and more elegant method – compare the before-after differences between the two groups using a single t-test for independent measures. No complex equations, no mixed models, no need for post-hoc tests – just one clean test!
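The simpler approach I'm describing can be sketched in a few lines (assuming SciPy is available). The per-patient changes below are simulated for illustration – they are not the trial's data, just hypothetical numbers on roughly the reported scale:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-patient before-after change in %-diameter stenosis:
# the experimental group trends down, the controls trend up.
exp_change = rng.normal(loc=-2.2, scale=3.0, size=22)
ctrl_change = rng.normal(loc=3.4, scale=3.0, size=19)

# One clean test: an independent-samples t-test on the differences
# replaces the whole mixed-effects ANOVA machinery in a 2x2 design.
t_stat, p_value = stats.ttest_ind(exp_change, ctrl_change)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

One test, one p-value, no post-hoc step required – that's the entire analysis for this design.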
Speaking of post-hoc testing – all ANOVA can tell us is that one or more means doesn't fit the overall pattern, i.e. is statistically different from the others. It doesn't tell us which one – to figure that out we need post-hoc tests, and the authors didn't conduct (or report) any. They also didn't report the F-statistics, degrees of freedom, etc. for each ANOVA. The results are not presented the way they should be, and without post-hoc tests they are effectively meaningless.
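Here is a minimal sketch of what such a post-hoc step looks like, using Tukey's HSD after a one-way ANOVA (assuming SciPy is available; the data are simulated, with one group deliberately shifted):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(40, 5, 20)
group_b = rng.normal(40, 5, 20)
group_c = rng.normal(50, 5, 20)  # this group deliberately differs

# ANOVA only says that *some* mean differs from the others...
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {anova_p:.4f}")

# ...while Tukey's HSD pinpoints which pairwise differences drive it.
res = stats.tukey_hsd(group_a, group_b, group_c)
print(res)  # pairwise statistics, p-values and confidence intervals
```

Without the second step, a significant ANOVA simply cannot be interpreted – which is exactly the problem with the paper's reporting.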
Finally, the key outcome – the size of the stenosis: the authors reported a decrease in the diameter of stenosis from 40.0 (16.9)% to 37.8 (16.5)% in the experimental group and an increase from 42.7 (15.5)% to 46.1 (18.5)% in the control group, with p<0.001. Sounds like an excellent result (because it is), but from a purely scientific perspective the effect size is quite small – given a mean coronary artery diameter of around 3-4 mm, the overall change is comparable to the measurement error. To sum it up: the authors used inappropriate statistical methods and reporting – they didn't check assumptions, they used mixed-model ANOVA instead of a simple t-test in a 2x2 scenario, and, most importantly, they didn't conduct post-hoc tests.
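A quick back-of-the-envelope calculation shows why I call the effect small. Translating the reported changes in %-diameter stenosis into absolute lumen changes, under my assumption (not the paper's) of a typical coronary artery diameter of about 3.5 mm:

```python
# All inputs except the reported stenosis percentages are assumptions
# made for illustration.
artery_diameter_mm = 3.5          # assumed typical coronary diameter
exp_change_pp = 40.0 - 37.8       # experimental group: 2.2 percentage points
ctrl_change_pp = 46.1 - 42.7      # control group: 3.4 percentage points

exp_change_mm = artery_diameter_mm * exp_change_pp / 100
ctrl_change_mm = artery_diameter_mm * ctrl_change_pp / 100
print(f"experimental: ~{exp_change_mm:.2f} mm, control: ~{ctrl_change_mm:.2f} mm")
```

Both changes come out well under a tenth of a millimetre – the same order as the precision one can realistically expect from quantitative angiography.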
I would like to start by congratulating Dr. Ornish on completing this study – with all its flaws, it's still a milestone. At the same time, the study simply doesn't reach the threshold for excellence and has so many statistical issues that I cannot trust its results. It does show a signal, but it cannot separate that signal from the noise. Ideally, it should be replicated with a larger sample and better reporting.
I hope this appraisal piece helped you see the true value of this clinical trial. I will definitely work on more studies – it will be fun, I promise, so stay tuned! Please let me know if there are specific studies you'd like me to go over in this series and, as always, feel free to ask questions, leave comments, explore my website and subscribe to my YouTube channel for updates.
1. Ornish D, Brown SE, Scherwitz LW, et al. Can lifestyle changes reverse coronary heart disease? The Lifestyle Heart Trial. Lancet. 1990;336(8708):129-133.
2. Aslam S, Emmanuel P. Formulating a researchable question: A critical step for facilitating good clinical research. Indian Journal of Sexually Transmitted Diseases and AIDS. 2010;31(1):47-50.
3. Kim J, Shin W. How to do random allocation (randomization). Clinics in Orthopedic Surgery. 2014;6(1):103-109.
4. Gupta SK. Intention-to-treat concept: A review. Perspectives in Clinical Research. 2011;2(3):109-112.
5. Ghasemi A, Zahediasl S. Normality tests for statistical analysis: a guide for non-statisticians. International Journal of Endocrinology and Metabolism. 2012;10(2):486-489.
6. Kim HY. Statistical notes for clinical researchers: Nonparametric statistical methods: 1. Nonparametric methods for comparing two groups. Restorative Dentistry & Endodontics. 2014;39(3):235-239.
7. McHugh ML. Multiple comparison analysis testing in ANOVA. Biochemia Medica. 2011;21(3):203-209.