Ranking system used in evidence-based practice to describe the strength of the results measured in a clinical trial or research study. The design of the study (such as a case report for an individual patient or a double-blinded randomized controlled trial) and the endpoints measured (such as survival or quality of life) affect the strength of the evidence. Levels of evidence range from I to IV.
Ia - Evidence from a meta-analysis of randomized controlled trials
Ib - Evidence from at least one randomized controlled trial
IIa - Evidence from at least one well-designed controlled trial without randomization
IIb - Evidence from at least one well-designed quasi-experimental study
III - Evidence from well-designed non-experimental descriptive studies, such as comparative studies, correlation studies, and case-control studies
IV - Evidence from a panel of experts
The U.S. Preventive Services Task Force uses:
Level A: Good scientific evidence suggests that the benefits of the clinical service substantially outweigh the potential risks. Clinicians should discuss the service with eligible patients.
Level B: At least fair scientific evidence suggests that the benefits of the clinical service outweigh the potential risks. Clinicians should discuss the service with eligible patients.
Level C: At least fair scientific evidence suggests that the clinical service provides benefits, but the balance between benefits and risks is too close to justify a general recommendation. Clinicians need not offer it unless there are individual considerations.
Level D: At least fair scientific evidence suggests that the risks of the clinical service outweigh the potential benefits. Clinicians should not routinely offer the service to asymptomatic patients.
Level I: Scientific evidence is lacking, of poor quality, or conflicting, such that the risk versus benefit balance cannot be assessed. Clinicians should help patients understand the uncertainty surrounding the clinical service.
The GRADE working group developed a system that takes into account more dimensions than just the quality of medical research.
GRADE (short for Grading of Recommendations Assessment, Development and Evaluation) requires its users, who are usually assessing the quality of evidence as part of a systematic review, to consider the impact of different factors on their confidence in the results. Authors of GRADE tables grade the quality of evidence into four levels on the basis of their confidence that the observed effect (a numerical value) is close to the true effect. The confidence value is based on judgements assigned in a structured manner across five domains.
The GRADE working group defines 'quality of evidence' and 'strength of recommendations' as two distinct concepts that are commonly confused with each other.
Systematic reviews may include randomized controlled trials, which have a low risk of bias, or observational studies, which have a high risk of bias. In the case of randomized controlled trials, the quality of evidence starts high but can be downgraded in five domains:
Risk of bias: a judgement made on the basis of the chance that bias in the included studies has influenced the estimate of effect.
Imprecision: a judgement made on the basis of the chance that the observed estimate of effect could change completely.
Indirectness: a judgement made on the basis of the differences between how the studies were conducted and how the results will actually be applied.
Inconsistency: a judgement made on the basis of the variability of results across the included studies.
Publication bias: a judgement made on the basis of whether all of the research evidence has been taken into account.
In the case of observational studies, the quality of evidence starts off lower and, in addition to being subject to downgrading, may be upgraded in three domains:
Large effect: methodologically strong studies show an observed effect so large that the probability of it changing completely is low.
Plausible confounding would change the effect: despite the presence of a possible confounding factor that is expected to reduce the observed effect, the effect estimate still shows a significant effect.
Dose response gradient: the intervention becomes more effective with increasing dose, suggesting that a further increase would likely produce a greater effect.
Meaning of the levels of quality of evidence as per GRADE:
High quality evidence: the authors are very confident that the presented estimate lies very close to the true value. One could interpret it as "there is a very low probability of further research completely changing the presented conclusions."
Moderate quality evidence: the authors are confident that the presented estimate lies close to the true value, but it is also possible that it is substantially different. One could interpret it as "further research may completely change the conclusions."
Low quality evidence: the authors are not confident in the effect estimate, and the true value may be substantially different. One could interpret it as "further research is likely to completely change the presented conclusions."
Very low quality evidence: the authors have no confidence in the estimate, and it is likely that the true value is substantially different from it. One could interpret it as "new research will most probably completely change the presented conclusions."
Guideline panelists may make strong or weak recommendations on the basis of further criteria. Some of the important criteria are:
Balance between desirable and undesirable effects (not considering cost)
Quality of the evidence
Values and preferences
Costs (resource utilization)
Despite the differences between systems, their purposes are the same: to guide users of clinical research toward the studies that are likely to be most valid. However, the individual studies still require careful critical appraisal.