* From the Departments of Clinical Epidemiology and Biostatistics (Drs. Guyatt and Cook), McMaster University Faculty of Health Sciences, Hamilton, ON, Canada; Polish Institute of Evidence Based Medicine (Dr. Jaeschke), Krakow, Poland; Department of Medicine (Dr. Pauker), Tufts-New England Medical Center, Tufts University School of Medicine, Boston, MA; and Department of Epidemiology (Dr. Schünemann), Italian National Cancer Institute Regina Elena, Rome, Italy.
Correspondence to: Gordon Guyatt, MD, FCCP, McMaster University, Health Sciences Centre, Room 2C12, Hamilton, ON, L8N 3Z5, Canada; e-mail: guyatt{at}mcmaster.ca
| Abstract |
|---|
For grading methodologic quality, randomized controlled trials (RCTs) begin as high-quality evidence (designated by "A"), but quality can decrease to moderate ("B"), or low ("C") as a result of poor design and conduct of RCTs, imprecision, inconsistency of results, indirectness, or a high likelihood for reporting bias. Observational studies begin as low quality of evidence (C) but can increase in quality on the basis of very large treatment effects.
Strong (Grade 1) recommendations can be applied uniformly to most patients. Weak (Grade 2) suggestions require more judicious application, particularly considering patient values and preferences and, when resource limitations play an important role, issues of cost.
Key Words: clinical trials; metaanalysis; practice guidelines
| Introduction |
|---|
The American College of Chest Physicians has revised its system for grading recommendations used in evidence-based guidelines.1 This system is being applied to all guidelines for purposes of consistency and comparison with other guidelines. The new grading system is modified from a grading scheme developed by the Grades of Recommendation, Assessment, Development and Evaluation Working Group,2,3 which assesses the quality of evidence and strength of recommendations. One advantage of the commonality between the American College of Chest Physicians grading system and Grades of Recommendation, Assessment, Development and Evaluation is that other organizations including UpToDate (Waltham, MA), the American College of Physicians, the American Thoracic Society, and the World Health Organization also use one of these schemes.4
Our system of evaluating and presenting recommendations entails an initial assessment of the quality of evidence, followed by judgment about the direction and strength of recommendations. Since clinicians will be most interested in the best course of action, we present the strength of the recommendation first as strong (Grade 1) or weak (Grade 2), followed by the quality of the evidence as high ("A"), moderate ("B"), or low ("C"). Furthermore, we use language for our guidelines that expresses their strength. For strong (Grade 1) recommendations, we say: "We recommend ... (for or against a particular course of action)". For weak (Grade 2) recommendations, we say: "We suggest ... (using or not using)" what we believe to be an optimal management approach. We then specify the methodologic quality with designations of A, B, or C. Thus, recommendations can fall into the following categories: 1A, 1B, 1C, 2A, 2B, and 2C (Table 1).
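The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names (`grade_label`, `describe`) and the output wording are assumptions made for the example, not part of the grading system itself.

```python
# Illustrative sketch; names and sentence template are hypothetical.
STRENGTH_VERBS = {1: "We recommend", 2: "We suggest"}   # strong vs weak
QUALITY_LABELS = {"A": "high", "B": "moderate", "C": "low"}


def grade_label(strength: int, quality: str) -> str:
    """Combine strength (1 or 2) and evidence quality (A, B, C) into a grade such as '1A'."""
    if strength not in STRENGTH_VERBS:
        raise ValueError("strength must be 1 (strong) or 2 (weak)")
    if quality not in QUALITY_LABELS:
        raise ValueError("quality must be 'A', 'B', or 'C'")
    return f"{strength}{quality}"


def describe(strength: int, quality: str) -> str:
    """Spell out a grade in words, using the verb that signals its strength."""
    label = grade_label(strength, quality)
    kind = "strong" if strength == 1 else "weak"
    return (f"{STRENGTH_VERBS[strength]} ... (Grade {label}: {kind} recommendation, "
            f"{QUALITY_LABELS[quality]}-quality evidence)")


# Strength is listed first, then quality, giving the six categories in the text.
ALL_GRADES = [grade_label(s, q) for s in (1, 2) for q in "ABC"]
print(ALL_GRADES)
print(describe(2, "B"))
```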
| Strength of the Recommendation |
|---|
When chapter authors are confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects or vice versa, they make a strong recommendation. Such confidence usually requires high-quality evidence that provides precise estimates of both benefits and downsides, and a clear balance in favor, or against, the benefits vs the downsides of an intervention. On occasion, faced with only low-quality evidence, authors may nevertheless make a strong recommendation. For instance, consider the recommendation for routine monitoring of platelet counts in patients receiving heparin whose risk of heparin-induced thrombocytopenia (HIT) is > 1%. Although the evidence of benefit is weak, the early discontinuation of heparin when the platelet count drops may be of appreciable benefit, and the costs and risks of monitoring are negligible. The authors of the chapter addressing HIT in the 2008 guidelines used this rationale to make a strong recommendation. Similarly, when only low-quality evidence supports an experimental intervention with appreciable costs and/or detriments, authors may recommend strongly against use of that intervention. For instance, authors of the HIT guidelines recommended strongly against routine use of HIT antibody testing, with appreciable cost and risk of false-positive results, in patients without clinical evidence to suggest HIT.
Chapter authors offer a weak recommendation when low-quality evidence results in appreciable uncertainty about the magnitude of benefits and/or downsides, or the benefits and downsides are finely balanced. Other reasons for not being confident include imprecise estimates of benefits or harms, uncertainty or variation in how different individuals value the outcomes and thus their preferences regarding management alternatives, small benefits, or situations when benefits may not be worth the costs (including the costs of implementing the recommendation). While the degree of confidence is a continuum and there is no precise threshold between a strong and a weak recommendation, the presence of important concerns about one or more of the above factors supports a weak recommendation.
| Interpreting Strong and Weak Recommendations |
|---|
For instance, results from an extremely large high-quality randomized trial suggest that aspirin (ASA) reduces the relative risk of death after myocardial infarction by approximately 25%. Depending on their age and factors such as the presence of heart failure, typical patients with myocardial infarction face risks of death in the first 30 days after infarction of between 2% and 40%.5 We can therefore expect a 0.5% absolute risk reduction (ARR) (from 2 to 1.5%) in the lowest-risk patients and a 10% ARR (from 40 to 30%) in the highest-risk patients. ASA has minimal side effects and is very inexpensive. Because, even in the lowest-risk subgroups, the benefits clearly outweigh the risks, adverse consequences, and costs, administration of ASA is strongly endorsed and widely used. Using letters and numbers to express the quality of the evidence and strength of recommendations (Table 1), both low-risk and high-risk patients would fall within the category of a strong recommendation based on high-quality evidence or Grade 1A (Grade 1 because the benefits clearly outweigh the downsides, and A because the estimate of benefit comes from high-quality, randomized trials that yielded consistent results).
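The arithmetic behind these figures is that the absolute risk reduction equals the baseline risk multiplied by the relative risk reduction. A minimal sketch follows, assuming the relative effect is constant across risk groups; the function name is hypothetical.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """ARR = baseline risk x RRR, assuming the relative effect is constant across risk groups."""
    return baseline_risk * relative_risk_reduction


RRR_ASA = 0.25  # approximate relative risk reduction quoted in the text

# Lowest-risk and highest-risk patients from the text (30-day risk of death).
for baseline in (0.02, 0.40):
    arr = absolute_risk_reduction(baseline, RRR_ASA)
    treated = baseline - arr
    print(f"baseline {baseline:.1%} -> treated {treated:.1%} (ARR {arr:.1%})")
```

The same relative effect therefore yields a twentyfold difference in absolute benefit between the two risk groups, which is why baseline risk matters when judging the strength of a recommendation.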
Thus, a second way for clinicians to interpret strong recommendations is that they provide, for typical patients, a mandate for the clinician to provide a simple explanation of the intervention along with a suggestion that the patient will benefit from its use. Further elaboration will seldom be necessary; however, when clinicians face weak recommendations, they should more carefully consider the benefits, harms, and burden in the context of the patient before them, and ensure that the decision is consistent with the patient's values and preferences. These situations arise when appreciable numbers of patients, because of variability in values and preferences, will make different choices.
Consider a 40-year-old man who has had an idiopathic deep venous thrombosis (DVT) followed by treatment with adjusted-dose warfarin for 1 year to prevent recurrent DVT and pulmonary embolism.1 Continuing on standard-intensity warfarin beyond 1 year will reduce his absolute risk for recurrent DVT by > 7% per year for several years.6 The burdens of treatment include taking warfarin daily, monitoring the intensity of anticoagulation with blood tests, living with the increased risk of both minor and major bleeding and, for some, experiencing those events. Patients who are minimally concerned about the lifestyle limitations of taking warfarin, or are particularly concerned about recurrent DVT, would consider the benefits of avoiding DVT worth the downsides of taking warfarin. Other patients are likely to consider the benefit not worth the harms and burden.
Individualization of clinical decision making in weak recommendations remains a challenge. Although clinicians should always consider patient values and preferences, weak recommendations dictate more detailed conversations with patients to ensure that the ultimate decision is consistent with the patient's values. For patients who are interested, a decision aid that presents patients with both benefits and downsides of therapy is likely to improve knowledge and decrease decision-making conflict, and may promote a decision most consistent with underlying values and preferences.7 Using decision aids for strong recommendations, while still potentially helpful for fully informing patients, is less important, and may be inefficient.
Other ways of interpreting strong and weak recommendations relate to performance or quality indicators. Strong recommendations are candidate performance indicators. For weak recommendations, performance could be measured by monitoring whether clinicians have discussed recommended actions with patients or their surrogates or carefully documented the evaluation of benefits and downsides in the patient's chart. Table 2 summarizes several ways of interpreting antithrombotic recommendations.
The choice of adjusted-dose warfarin vs ASA for prevention of stroke in patients with atrial fibrillation (AF) illustrates a number of the factors that will influence the strength of a recommendation. A systematic review9 and metaanalysis found a relative risk reduction (RRR) of 46% in all strokes with warfarin vs ASA. This large effect supports a strong recommendation for warfarin. Furthermore, the relatively narrow 95% confidence interval (RRR, 29 to 57%) suggests that warfarin provides an RRR of at least 29% and further supports a strong recommendation. At the same time, warfarin is associated with burdens that include keeping dietary intake of vitamin K constant, monitoring the intensity of anticoagulation with blood tests, and living with the increased risk of both minor and major bleeding. Most patients, however, are much more stroke averse than they are bleeding averse.9 As a result, almost all patients with high risk of stroke would choose warfarin, suggesting the appropriateness of a strong recommendation.
A patient's baseline risk of the adverse outcome (also called control event risk or rate) that an intervention is expected to prevent may prove a key consideration. Consider another 65-year-old patient with AF and no other risk factors for stroke. This individual's risk for stroke in the next year is approximately 2%. Dose-adjusted warfarin can, relative to ASA, reduce the risk to approximately 1%. Some patients who are very stroke averse may consider the downsides of taking warfarin well worth it. Others are likely to consider the benefit not worth the risks and inconvenience. When, across the range of their values and preferences, fully informed patients are liable to make different choices, guideline panels should offer weak (Grade 2) recommendations.
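As a rough consistency check, the 46% RRR and its 95% confidence interval (29 to 57%) quoted for warfarin vs ASA can be applied to this low-risk patient. The sketch below assumes that the 2% annual risk is the risk on the comparator (ASA) and that the relative effect carries over unchanged to low-risk patients; both are assumptions made for illustration, and the function names are hypothetical.

```python
def risk_on_treatment(baseline_risk: float, rrr: float) -> float:
    """Risk remaining after applying a relative risk reduction to the baseline risk."""
    return baseline_risk * (1 - rrr)


def number_needed_to_treat(baseline_risk: float, rrr: float) -> float:
    """NNT = 1 / ARR, where ARR = baseline risk x RRR."""
    return 1 / (baseline_risk * rrr)


BASELINE = 0.02  # approximate 1-year stroke risk of the low-risk patient (assumed to be on ASA)

for label, rrr in (("lower CI bound", 0.29), ("point estimate", 0.46), ("upper CI bound", 0.57)):
    risk = risk_on_treatment(BASELINE, rrr)
    nnt = number_needed_to_treat(BASELINE, rrr)
    print(f"{label}: RRR {rrr:.0%} -> risk {risk:.2%}, about {nnt:.0f} patients treated per stroke prevented")
```

Under these assumptions the point estimate gives a risk of about 1.1% on warfarin, in line with the "approximately 1%" in the text, with roughly 100 patients treated for a year to prevent one stroke.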
While it is ideal for clinicians to elicit preferences and values directly from patients, and for guideline panels to obtain values and preference estimates from population-based studies, such studies are often unavailable. When value or preference judgments are particularly important for the interpretation of recommendations, chapter authors have made statements about the key values underlying their recommendations.
As benefits and risks become more finely balanced, or more uncertain, decisions to administer an effective therapy also become more cost sensitive. We have considered cost in only a small proportion of the recommendations in which we considered resource issues particularly important.10
| How Methodologic Quality Contributes to Grades of Recommendation |
|---|
For instance, a randomized trial suggests that danaparoid sodium is of benefit in treating HIT complicated by thrombosis.11 That trial, however, was unblinded, and the key outcome was the clinicians' assessment of when the thromboembolism had resolved, which is a subjective judgment.
Unexplained Heterogeneity of Results (Inconsistent Results): When studies yield widely differing estimates of the treatment effect (heterogeneity or variability in results), investigators should look for explanations for that heterogeneity. For instance, drugs may have larger relative effects in sicker populations or when administered in larger doses. When heterogeneity exists but investigators fail to identify a plausible explanation, the quality of evidence decreases. For example, RCTs12 of pentoxifylline in patients with intermittent claudication have shown conflicting results that so far defy explanation. When patient characteristics can explain the heterogeneity, recommendations for patient subgroups will generally differ.
Indirectness of Evidence: The question being addressed in the guideline differs from the available evidence with regard to the population, intervention, comparison, or outcome. Investigators may have undertaken studies in similar, but not identical, populations to those under consideration for a recommendation. For example, many of the antithrombotic therapies rigorously tested in randomized trials in adults are also administered to children. The adult trials provide high-quality evidence for adult recommendations, but because of indirectness they represent only moderate- or low-quality evidence for children. Similarly, recommendations in pregnant women rely on randomized trial results in nonpregnant individuals, and recommendations for patients with artificial valves often rely on trial results from other patient groups at high risk for thrombosis. In each of these situations, evidence quality is rated down for indirectness.
Lack of Precision: When studies include few patients and few events and thus have wide confidence intervals, a guideline panel will judge the quality of the evidence lower than it otherwise would because of resulting uncertainty in the results. For instance, a well-designed and rigorously conducted RCT addressed the use of nadroparin, a low-molecular-weight heparin, in patients with cerebral venous sinus thrombosis.13 Of 30 treated patients, 4 patients had a poor outcome, as did 6 of 29 patients in the control group. The analysis suggests a 7% ARR (which, if true, would correspond to a requirement to treat approximately 14 patients to prevent a single poor outcome),13 but the confidence interval included not only a 26% absolute difference in favor of treatment, but also a 12% difference in favor of placebo.
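The figures in the nadroparin example can be reproduced from the raw counts. The source does not state how the confidence interval was computed; the sketch below assumes a simple Wald (normal-approximation) interval for a difference of two proportions, and the function name is hypothetical.

```python
import math


def risk_difference_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Return (ARR, lower, upper) for control risk minus treated risk, using a Wald interval.

    The Wald method is an assumption; other interval methods give somewhat different bounds.
    """
    p_treated = events_treated / n_treated
    p_control = events_control / n_control
    arr = p_control - p_treated
    se = math.sqrt(p_treated * (1 - p_treated) / n_treated
                   + p_control * (1 - p_control) / n_control)
    return arr, arr - z * se, arr + z * se


# 4 of 30 treated patients vs 6 of 29 control patients had a poor outcome.
arr, low, high = risk_difference_ci(4, 30, 6, 29)
nnt = 1 / arr

print(f"ARR {arr:.1%}, 95% CI {low:.1%} to {high:.1%}")
print(f"NNT about {math.ceil(nnt)}")
```

A negative lower bound means the data are compatible with harm as well as benefit, which is the reason for rating down the quality of this evidence for imprecision.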
Reporting Bias: The quality of evidence may be reduced if investigators fail to report studies (typically those that show no effect) or outcomes (typically those that may be harmful or for which no effect was observed), or if other reasons lead to results being withheld. Unfortunately, guideline panels are still required to make guesses about the likelihood of reporting bias. A prototypical situation that should elicit suspicion of reporting bias is when published evidence includes a number of small trials, all of which are industry funded.14 Authors have developed graphical representations that demonstrate asymmetry and likely reporting bias.
Factors That Increase the Quality of Evidence
Observational studies can provide moderate or strong evidence.15 While well-done observational studies will generally yield low-quality evidence, there may be unusual circumstances in which guideline panels classify such evidence as moderate or even high quality (Table 6).
The guides (Table 6) of RRR of 50% and 80% are just that: guides. Some may prefer more stringent criteria, such as a reduction in relative risk—or a reduction in hazard in a time-to-event analysis—of 90% required to move to high-quality evidence.15 Further, we will be more comfortable raising the quality of the evidence when other related criteria—the temporal relation between the exposure and the outcome, the existence of a dose-response gradient—are also met.17
Plausible Bias: On occasion, all plausible biases from observational studies may be working to underestimate an apparent treatment effect. For example, if only sicker patients receive an experimental intervention or exposure yet they still fare better, it is likely that the actual intervention or exposure effect is larger than the data suggest.
Dose-Response Gradient: The presence of a dose-response gradient may also increase our confidence in the findings of observational studies and thereby enhance the assigned quality of evidence. For example, our confidence in the result of observational studies that show an increased risk of bleeding in patients who have supratherapeutic anticoagulation levels is increased by the observation that there is an association between progressively higher levels of the international normalized ratio and the increased risk of bleeding.18
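The overall logic of rating evidence quality, with RCTs starting high and observational studies starting low, can be sketched as follows. The sketch assumes that each factor that decreases or increases quality moves the rating by exactly one level; that rule is an assumption made for illustration, since the text leaves the size of each adjustment to the judgment of the guideline panel. The function and parameter names are hypothetical.

```python
LEVELS = ["C", "B", "A"]  # low, moderate, high

START_INDEX = {"rct": 2, "observational": 0}  # RCTs begin at A, observational studies at C


def rate_quality(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Return 'A', 'B', or 'C' for a body of evidence.

    downgrades: number of concerns (design limitations, inconsistency,
                indirectness, imprecision, reporting bias).
    upgrades:   number of strengthening factors (large effect, plausible bias
                working against the effect, dose-response gradient).
    Each counts as one level here, which is an illustrative assumption.
    """
    if design not in START_INDEX:
        raise ValueError("design must be 'rct' or 'observational'")
    index = START_INDEX[design] - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))  # keep the rating within C..A
    return LEVELS[index]


print(rate_quality("rct"))                          # RCT with no concerns
print(rate_quality("rct", downgrades=1))            # e.g. imprecise results
print(rate_quality("observational", upgrades=1))    # e.g. very large effect
```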
| Interpreting the Recommendations |
|---|
Similarly, following Grade 1A recommendations will at times not serve the best interests of patients with atypical values or preferences, or whose risks differ markedly from the usual patient. For instance, consider patients who find anticoagulant therapy extremely aversive, either because it interferes with their lifestyle (eg, prevents participation in contact sports), or because of the need for monitoring. Clinicians may reasonably conclude that following some Grade 1A recommendations for anticoagulation for either group of patients will be a mistake. The same may be true for patients with particular comorbidities (eg, a recent GI bleed, repeated falls, or an arteriovenous malformation) or other special circumstances (eg, very advanced age) that put them at unusual risk.
We trust that these observations convey our acknowledgment that no recommendations or clinical practice guidelines can take into account the often compelling unique features of individual clinical circumstances. No clinician, and no body charged with evaluating a clinician's actions, should attempt to apply our recommendations in rote or blanket fashion.
| Summary |
|---|
| Conflict of Interest Disclosures |
|---|
Dr. Schünemann reports no personal payments from for-profit organizations, but he has received research grants and/or honoraria that were deposited into research accounts or received by a research group that he belongs to from AstraZeneca (research grant, honoraria), Amgen (research grant), Barilla (research grant), Chiesi Foundation (honorarium), Lily (honorarium), Pfizer (research grant, honorarium), Roche (honorarium), and UnitedBioSource (honorarium) for development or consulting regarding quality-of-life instruments for chronic respiratory diseases and as lecture fees related to the methodology of evidence-based practice guideline development and/or research methodology. He is documents editor for the American Thoracic Society and senior editor of the American College of Chest Physicians Antithrombotic and Thrombolytic Therapy Guidelines, and both organizations receive funding from for-profit organizations. Other institutions or organizations that he is affiliated with likely receive funding from for-profit sponsors that are supporting infrastructure and research that may serve his work.
Dr. Cook discloses that she received grant monies from the Canadian Institutes for Health Research and a dalteparin donation for a peer-review funded trial by the Canadian Institutes for Health Research.
Dr. Jaeschke discloses that he has received honoraria and travel support from AstraZeneca, GlaxoSmithKline, MSD, and Boehringer Ingelheim. He is also the deputy editor of a medical journal that is partially financed by industry advertisement and projects.
Dr. Pauker reveals no real or potential conflicts of interest or commitment.
| Footnotes |
|---|
Accepted for publication December 20, 2007.
| References |
|---|