J Am Coll Cardiol, 2005; 46:1986-1995, doi:10.1016/j.jacc.2005.07.062 (Published online 8 November 2005).
© 2005 by the American College of Cardiology Foundation

STATE-OF-THE-ART PAPER

Trials and Tribulations of Non-Inferiority

The Ximelagatran Experience

Sanjay Kaul, MD*,*, George A. Diamond, MD, FACC* and William S. Weintraub, MD, FACC†

* Division of Cardiology, Cedars-Sinai Medical Center, and the David Geffen School of Medicine, University of California, Los Angeles, California
† Emory Center for Outcomes Research, Emory University, Atlanta, Georgia

Manuscript received May 21, 2005; revised manuscript received July 6, 2005, accepted July 11, 2005.

* Reprint requests and correspondence: Dr. Sanjay Kaul, Division of Cardiology, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048. (Email: kaul@cshs.org).


    Abstract
Ximelagatran is a novel oral direct thrombin inhibitor that offers a number of advantages over the standard treatment, warfarin, in patients with atrial fibrillation. Two large clinical trials, one open-label (Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation [SPORTIF] III), one double-blind (SPORTIF V), have compared the efficacy and safety of fixed-dose ximelagatran without anticoagulation monitoring with dose-adjusted warfarin using a non-inferiority design. On the basis of the results, the investigators concluded that ximelagatran was just as effective as warfarin in preventing stroke or systemic embolism (the primary end point), because the pre-specified non-inferiority criterion was met. Reanalysis of the data with rather conservative interpretive criteria, however, revealed a number of deficiencies: 1) an unreasonably generous margin that was potentially biased toward non-inferiority, given the low baseline event rate of warfarin; 2) the inappropriateness of the analytical method used to estimate the non-inferiority margin; 3) a lack of confidence that ximelagatran retains at least 50% of warfarin’s effect (a prerequisite to the establishment of non-inferiority); 4) significant heterogeneity in the magnitude of efficacy observed in the two trials; and 5) safety concerns regarding increased liver toxicity with ximelagatran without a significant offsetting advantage in major bleeding. This imbalance in the benefit-risk profile materially undermines the investigators’ claim of non-inferiority of ximelagatran and led the Food and Drug Administration to reject the sponsor’s application for ximelagatran. Despite published conclusions to the contrary, we conclude that ximelagatran has not been shown to be non-inferior to warfarin. Such determinations of non-inferiority are highly dependent on the underlying assumptions, and graphical sensitivity analyses make this dependence explicit.

Abbreviations and Acronyms
  AF = atrial fibrillation
  CI = confidence interval
  FDA = Food and Drug Administration
  SPORTIF = Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation


Nonvalvular atrial fibrillation (AF) is associated with an increased risk of ischemic stroke and systemic embolization (1). Anticoagulation therapy with warfarin reduces this risk by approximately two-thirds (1,2). Aspirin is generally less effective than warfarin, but evidence suggests that it is superior to placebo, especially in low-risk patients (1). Despite the compelling evidence that anticoagulation with warfarin reduces the risk of stroke in most patients with AF, this therapy continues to be underused, with fewer than one-half of eligible patients taking it (3).

Alternative approaches to anticoagulation have resulted in the development of an oral direct thrombin inhibitor, ximelagatran (4–6), which offers several practical advantages over warfarin. It has a stable and predictable pharmacokinetic profile that is independent of body weight and other patient variables, rapid onset and offset of action, and minimal interaction with diet and drugs, thereby eliminating the need for anticoagulation monitoring and dose adjustment. The safety and efficacy of ximelagatran in patients with AF at risk for ischemic stroke has been evaluated in two phase-III studies within the Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) program (4–6).

The SPORTIF III study was a randomized, open-label, parallel-group trial that sought to determine whether ximelagatran, administered in a fixed dose of 36 mg twice daily, was non-inferior to the current efficacy standard of warfarin (dose-adjusted to an international normalized ratio of 2.0 to 3.0) for prevention of strokes and systemic embolic events in 3,407 patients with nonvalvular AF and at least one stroke risk factor (4). The SPORTIF V trial was similar, but with double-blind treatment allocation, and involved 3,922 patients (6). The primary efficacy and secondary safety results are summarized in Table 1.


Table 1. Primary Results of Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) III and V Studies
 
On the basis of these results, the SPORTIF investigators concluded that "ximelagatran treatment is at least as effective as well-controlled warfarin treatment for prevention of stroke and systemic embolism [and] might have a more favorable benefit/risk profile than warfarin for patients with atrial fibrillation" (4–6). As a result of these findings, the corporate sponsor submitted a new drug application to the Food and Drug Administration (FDA). Nevertheless, on October 8, 2004, the FDA rejected the application on the recommendation of its Cardiovascular and Renal Drugs Advisory Committee, owing to concerns over safety and, to a lesser degree, the method of measuring efficacy (7). An understanding of this decision requires a deeper appreciation of the methods and assumptions underlying the design and analysis of the SPORTIF trials, in particular, and of non-inferiority trials in general.

The non-inferiority design employed in the SPORTIF trial, unlike a placebo-controlled design, compares the new experimental treatment with the current standard treatment ("active control") rather than placebo. This design is justified whenever treatment with placebo is considered "unethical," as in the SPORTIF trial (4–6). The critical issues involved in non-inferiority trials are highlighted in Table 2. There are two basic approaches to non-inferiority analysis (8–15). The first approach seeks to determine if the new treatment is inferior to the active control treatment by no more than some pre-defined margin ("marginal analysis"). The second approach seeks to determine if the new treatment preserves some pre-defined fraction of the standard treatment’s effect ("fractional analysis"). Both of these approaches introduce a number of statistical assumptions, not always specified or justified. Our goal is to describe the critical assumptions underlying the design and analysis of the SPORTIF trial and to explore the robustness of its conclusions with respect to these assumptions.


Table 2. Critical Issues in the Design, Conduct, and Analysis of Non-Inferiority Trials
 

    Non-inferiority trial design
Marginal analysis of non-inferiority.   One of the most crucial steps in the non-inferiority trial design is the specification of a non-inferiority margin, which quantifies the worst-case loss in efficacy that is clinically acceptable, considering the potential safety, convenience, or cost advantages of the new treatment. No universally accepted "gold standard" criterion exists for estimation of this margin (8–15). The International Conference on Harmonization guidance (16) advises that the margin should be: 1) specified a priori, 2) based on both clinical judgment and statistical reasoning, and 3) suitably conservative, reflecting the uncertainty in evidence.

A clinically accepted norm for non-inferiority margin is a proportional difference of 15% to 20% or less, smaller than the typical 20% to 25% "minimally clinically important difference" criterion employed in superiority trials; however, what constitutes a clinically acceptable difference is ultimately a matter of judgment and might vary widely for each patient, physician, investigator, regulator or payor, and the clinical circumstance. For example, any difference in hard outcomes like mortality or irreversible morbidity (myocardial infarctions or disabling stroke) might be argued to be clinically meaningful, thereby warranting the choice of narrow conservative margins. In contrast, a larger reduction in efficacy and, therefore, a more liberal margin might be tolerable if the outcome is less robust, as with any reversible morbidity (recurrent ischemia or transient ischemic attack), and if the new treatment offers significant improvements in administration, adverse effects, and cost.

Despite the emphasis on clinical judgment, the deciding factor for determination of the margin is often statistical, given the subjective and somewhat arbitrary nature of the former. From a statistical perspective, the margin should be, at the very least, no larger than the worst limit of 95% confidence interval (CI) of standard treatment effect relative to placebo (8,13,14), but it could be smaller so as to have assurance that the new treatment has greater than minimal efficacy (10). One proposal for selecting the margin is to take one-half of the magnitude of the worst limit of this CI—the so-called "50% rule" or "95-95 method" recommended by the FDA (13,14). This conservative margin, however, often results in a high "false-negative" rate (type II error; i.e., low power to demonstrate non-inferiority). In general, the objective should be to limit "false-positive" (type I) errors by avoiding too liberal a margin and "false-negative" (type II) errors by avoiding too conservative a margin with respect to a claim of efficacy. The margin is generally set in terms of an absolute or relative difference, the latter being favored over the former given its greater trial-to-trial stability (10). The active control effect is best determined from a random effects meta-analysis to account for trial-to-trial variability (8,10,13,14).

Estimation of the non-inferiority margin for SPORTIF.   Figure 1 summarizes the estimation of the non-inferiority margin. The warfarin effect has been assessed in six placebo-controlled studies (17–22), five of which enrolled patients without and one (European Atrial Fibrillation Trial [EAFT]) with recent transient ischemic attack or stroke. Because the EAFT study enrolled a higher risk population than did the SPORTIF trial, one can justify not including it in the estimation of the summary effect of warfarin. The SPORTIF investigators assumed a baseline warfarin event rate of 3.1%/year and a non-inferiority margin of 2% in terms of absolute difference, representing their estimate of the maximal difference considered to be clinically acceptable (4,6). This margin, however, is approximately equivalent to the 95% lower limit of the absolute risk difference derived from the five pooled trials. Taking 50% of this limit, as suggested by the "50% rule," results in a margin of 1%, one-half of the margin used by the SPORTIF trial. With meta-analysis, the margin is estimated to be even smaller: 0.85% for a fixed-effects model and 0.68% for a random-effects model. The corresponding margins in terms of risk ratio vary from 1.22 for an absolute margin of 0.68% to 1.65 for an absolute margin of 2% at an expected warfarin rate of 3.1% (Fig. 1). Thus, the non-inferiority risk ratio margin of 1.65 in the SPORTIF trial was unreasonably generous, exceeding those typically encountered in contemporary non-inferiority cardiovascular clinical trials such as the Randomized Evaluation in PCI Linking Angiomax to Reduced Clinical Events (REPLACE II) (1.18), Valsartan in Acute Myocardial Infarction (VALIANT) (1.13), Pravastatin or Atorvastatin Evaluation and Infection Therapy (PROVE-IT) (1.17), Superior Yield of the New strategy of Enoxaparin, Revascularization, and GlYcoprotein IIb/IIIa inhibitors (SYNERGY) (1.10), and Aggrastat to Zocor (A-to-Z) (1.11). The risk ratio margin would have been even higher (approximately 2.0) had the investigators chosen an appropriate warfarin rate; their choice for the warfarin rate (3.1%) exceeded the rate supported by historical pooled data (1.9%) by more than 50%.
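The arithmetic behind these margins can be sketched in a few lines of Python. The function names are illustrative and do not come from the paper; the inputs are the figures quoted in the text and in the Figure 1 legend (a 95% lower limit of 1.36% for the random-effects absolute risk difference, and warfarin rates of 3.1% and 1.9% per year).

```python
def half_of_worst_limit(worst_limit_pct):
    """'50% rule': one-half of the worst 95% limit of the control effect."""
    return 0.5 * worst_limit_pct


def risk_ratio_margin(abs_margin_pct, control_rate_pct):
    """Convert an absolute margin to a risk ratio margin at a given control rate."""
    return (control_rate_pct + abs_margin_pct) / control_rate_pct


# Conservative margin from the random-effects meta-analysis (Figure 1 legend)
conservative = half_of_worst_limit(1.36)

# Risk ratio margins at the assumed warfarin rate of 3.1%/year
for margin in (conservative, 1.0, 2.0):
    rr = risk_ratio_margin(margin, 3.1)
    print(f"absolute margin {margin:.2f}% -> risk ratio margin {rr:.2f}")

# The same 2% absolute margin at the pooled historical rate of 1.9%/year
print(f"at 1.9%/year: {risk_ratio_margin(2.0, 1.9):.2f}")
```

The same absolute margin translates into a larger relative margin as the control event rate falls, which is why the choice of the assumed warfarin rate matters.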



Figure 1 Estimation of the non-inferiority margin for Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) trial. Individual data for the six historical studies of warfarin versus placebo are shown above the dashed line. Summary effects and estimation of non-inferiority margin are shown below the dashed line for all six trials (n = 6) and five primary prevention (PP) (minus European Atrial Fibrillation Trial [EAFT]) trials (n = 5). Summary effects were calculated with a pooled analysis derived by adding all the events together, the Mantel-Haenszel method for fixed-effect meta-analysis (MA), and the DerSimonian-Laird method for random-effect MA. The margin is typically estimated as 50% of the 95% lower limit of the absolute risk difference (ARD) derived from a random-effects MA (e.g., 50% of 1.36% = 0.68%). The corresponding risk ratio (RR) margins are based on an assumed warfarin event rate of 3.1% (e.g., ARD of 0.68 = RR 1.22 [(3.1 + 0.68)/3.1 = 3.78/3.1]). AFASAK = Atrial Fibrillation, Aspirin and Anticoagulation; BAATAF = Boston Area Anticoagulation Trial for Atrial Fibrillation; CAFA = Canadian Atrial Fibrillation Anticoagulation; DB = double-blind; OL = open-label; SP = secondary prevention; SPAF I = Stroke Prevention in Atrial Fibrillation I; SPINAF = Stroke Prevention in Non-Rheumatic Atrial Fibrillation.

 
Marginal analysis of the SPORTIF trial.   The primary non-inferiority analyses reported for the SPORTIF III and SPORTIF V trials are shown in Figure 2. In superiority trials, intention-to-treat (ITT) analysis is preferred to on-treatment (OT) analysis; in non-inferiority trials, however, the ITT analysis is biased toward non-inferiority. Hence, both analyses should be performed for a non-inferiority trial, with similar results supporting the robustness of the conclusion (8,13,14). Non-inferiority is established for all margins in the SPORTIF III trial for both ITT and OT analyses. Superiority of ximelagatran over warfarin is established in the SPORTIF III trial for OT analysis. In contrast, in the SPORTIF V trial, non-inferiority is established only for a margin of 2% with respect to absolute difference and is not established for any margin with respect to risk ratio. According to this analysis, an inference of non-inferiority is highly sensitive to one’s choice of the non-inferiority margin: the higher the margin, the easier it is to establish non-inferiority. A sensitivity analysis over a continuous range of values is shown in Figure 3. Non-inferiority is supported in the SPORTIF V trial only for an absolute risk difference ≥1.04% and a risk ratio ≥1.87. Such margins are arguably too liberal to be considered clinically relevant. In contrast, the marginal threshold for non-inferiority for the SPORTIF III trial is suitably conservative: an absolute risk difference of 0.17% or a risk ratio of 1.05.



Figure 2 Marginal analysis of non-inferiority for Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) trial. Data are shown as point estimates and 95% confidence interval (CI) for absolute (left) and relative (right) difference in outcome for intention-to-treat (ITT) and on-treatment (OT) populations. The vertical dashed lines represent non-inferiority margins. Non-inferiority is established when the upper bound of the two-sided 95% CI (equivalent to one-sided 97.5% CI) is less than the margin. Superiority is established if the upper bound lies below 0 absolute difference (or <1.0 risk ratio).
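The decision rule in this legend reduces to comparing the upper confidence bound with the margin. A minimal Python sketch follows. The function name and labels are hypothetical, and the upper bounds in the example are taken to be the marginal thresholds quoted in the text for absolute risk difference (1.04% for SPORTIF V, 0.17% for SPORTIF III), on the assumption that the smallest margin supporting non-inferiority equals the upper bound of the two-sided 95% CI.

```python
def classify(upper_bound, margin, no_difference=0.0):
    """Apply the upper-bound rule to a two-sided 95% CI.

    no_difference is 0.0 for an absolute difference and 1.0 for a risk ratio.
    """
    if upper_bound < no_difference:
        return "superior"
    if upper_bound < margin:
        return "non-inferior"
    return "non-inferiority not established"


# Assumed upper bounds (percentage points), read from the thresholds in the text
upper_bounds = {"SPORTIF III": 0.17, "SPORTIF V": 1.04}

for trial, upper in upper_bounds.items():
    for margin in (2.0, 1.0, 0.68):
        print(f"{trial}, margin {margin:.2f}%: {classify(upper, margin)}")
```

Under these assumed bounds the sketch agrees with the text: SPORTIF III meets every margin, whereas SPORTIF V meets only the 2% margin.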

 


Figure 3 Sensitivity analysis of marginal non-inferiority for Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) trial. Data are shown for absolute difference (left) and risk ratio (right). The one-sided p values for non-inferiority for the three margins shown in Figure 2 for SPORTIF V (2%, 1%, and 0.68%) were <0.001, 0.034, and 0.225, respectively. Non-inferiority is supported if the one-sided p ≤ 0.025 (represented by the dashed line) (9).
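A sensitivity curve of this kind can be approximated with a normal model. In the sketch below, the point estimate and standard error of the absolute difference are not reported in this text; they are assumptions, backed out from the p values quoted in the legend, and the function name is illustrative.

```python
from statistics import NormalDist


def noninferiority_p(estimate, std_error, margin):
    """One-sided p value for H0: true difference >= margin (normal approximation)."""
    return NormalDist().cdf((estimate - margin) / std_error)


# Assumed inputs in percentage points, backed out from the quoted p values;
# they are not figures reported by the authors.
ESTIMATE = 0.454
STD_ERROR = 0.299

for margin in (0.68, 1.0, 1.04, 2.0):
    p = noninferiority_p(ESTIMATE, STD_ERROR, margin)
    print(f"margin {margin:.2f}%: one-sided p = {p:.3f}")
```

Sweeping the margin over a fine grid, rather than the four values shown, would trace the full curve and locate the margin at which p crosses 0.025.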

 
Fractional analysis of non-inferiority.   Unlike placebo-controlled superiority trials, non-inferiority trials do not provide a direct way to distinguish effective from ineffective therapies (i.e., the two therapies could be equally effective or equally ineffective). Many non-inferiority analyses are conducted on the tacit assumption that the new and standard treatments would have demonstrated effectiveness when compared directly with placebo had such a comparison been made, fulfilling the so-called requirement of "assay sensitivity" (8,10–15). One can perform such a comparison, albeit indirectly, via the "putative placebo" approach. The effect of the new treatment versus "putative" placebo is imputed from the observed effects in the current trial and the historical placebo-controlled trials of the standard treatment, as illustrated in Figure 4 (11–15). The putative placebo approach makes a critical assumption of "constancy," to the effect that the standard treatment performs as it did in previous placebo-controlled trials. In practice, this assumption is not necessarily plausible because of differences in patient characteristics, concomitant medications, intensity of treatment, and other key design features (12–15). One way to "discount" for this limitation is by estimating the fraction of the standard treatment effect retained by the new treatment. This is determined as the ratio of the imputed effect of the new treatment versus putative placebo to the effect of the standard treatment versus placebo, along with its estimated variance and CI (12–15). Non-inferiority is inferred if the lower limit of the CI of this fraction exceeds a pre-specified minimum threshold (arbitrarily, 0.5) that is considered to be "clinically important" (i.e., the 95% lower limit should exceed a fraction of 0.5) (12–15). Thus, for a new treatment to be declared effective, it must not only be superior to the putative placebo but must also preserve a pre-specified fraction of the standard treatment’s effect.



Figure 4 Putative placebo analysis for Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) trials. Imputed comparisons of ximelagatran versus "putative" placebo from data derived from SPORTIF III, SPORTIF V, and combined SPORTIF III+V trials are shown. The putative placebo approach is illustrated where the estimate of risk ratio of ximelagatran relative to placebo (derived risk ratio) is obtained by multiplying the risk ratio of ximelagatran relative to warfarin (observed risk ratio), from the current trial, by the historical warfarin versus placebo risk ratio (historical risk ratio). Historical risk ratio is obtained from a random-effect meta-analysis of five primary prevention trials and is 0.37 (0.28 to 0.50). Derived risk ratios are 0.27 (0.16 to 0.44) for SPORTIF III, 0.52 (0.31 to 0.86) for SPORTIF V, and 0.37 (0.24 to 0.55) for SPORTIF III+V. Superiority over putative placebo is established if the derived risk ratio is <1.0. Non-inferiority is established if the worst (upper) limit of the "derived risk ratio" does not exceed the fractional threshold estimated as a fraction (arbitrarily 50%) of the "historical risk ratio." In SPORTIF trials, this estimate is obtained mathematically as the square root of the "historical risk ratio" and is equivalent to 0.61 (0.53 to 0.71). The lower bound (LB) of this interval (0.71) represents a liberal fractional threshold for non-inferiority. In contrast, the upper bound (UB) of this interval (0.53) constitutes a conservative fractional threshold (15).

 
Fractional analysis of the SPORTIF trial.   Comparison of ximelagatran versus putative placebo is shown in Figure 4. Superiority is established for all three analyses—ximelagatran exhibited a 73% relative risk reduction (95% CI, 56% to 85%) relative to placebo in the SPORTIF III trial, 48% relative risk reduction (95% CI, 14% to 69%) in the SPORTIF V trial, and a 63% relative risk reduction (95% CI, 45% to 76%) in the SPORTIF III+V trial. Non-inferiority is established in the SPORTIF III trial and in the combined SPORTIF III+V trial analyses but not in the SPORTIF V trial alone with the liberal "lower bound" criterion. With the more conservative "upper bound" criterion (15), non-inferiority is established only for the SPORTIF III trial.
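These verdicts can be checked against the numbers in the Figure 4 legend. The sketch below assumes that retention is measured on the log risk ratio scale, so that preserving one-half of the historical effect corresponds to the square root of the historical risk ratio. The function and variable names are illustrative.

```python
# Historical warfarin-versus-placebo risk ratio and its 95% CI (Figure 4 legend)
HISTORICAL_RR = {"estimate": 0.37, "lower": 0.28, "upper": 0.50}

# Upper 95% limits of the derived risk ratios versus putative placebo (Figure 4 legend)
DERIVED_UPPER = {"SPORTIF III": 0.44, "SPORTIF V": 0.86, "SPORTIF III+V": 0.55}


def retention_threshold(historical_rr, fraction=0.5):
    """Risk ratio that preserves `fraction` of the historical effect on the log scale."""
    return historical_rr ** fraction


def non_inferior(derived_upper, threshold):
    """Non-inferiority holds if the worst (upper) limit does not exceed the threshold."""
    return derived_upper <= threshold


# Liberal threshold: based on the weakest plausible warfarin effect (RR 0.50)
liberal = retention_threshold(HISTORICAL_RR["upper"])
# Conservative threshold: based on the strongest plausible warfarin effect (RR 0.28)
conservative = retention_threshold(HISTORICAL_RR["lower"])

for trial, upper in DERIVED_UPPER.items():
    print(trial,
          "| liberal:", non_inferior(upper, liberal),
          "| conservative:", non_inferior(upper, conservative))
```

The "lower bound" and "upper bound" labels in the legend refer to the size of the warfarin effect, which is why the liberal threshold is the numerically larger risk ratio.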

Again, because non-inferiority depends on one’s arbitrary choice of the threshold fraction—the lower the fraction, the easier it is to establish non-inferiority—a sensitivity analysis is best performed to define the robustness of this choice. On the basis of this analysis shown in Figure 5 for the SPORTIF V trial, non-inferiority is not supported at the conventional fractional threshold of 0.5 but only for fractions ≤0.2. Although superiority of ximelagatran to placebo (assessed at 0 fraction) is established, there is some suggestion that warfarin is actually superior to ximelagatran (assessed at 1.0 fraction; one-sided p = 0.06).



Figure 5 Sensitivity analysis of fractional non-inferiority for Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation V trial. Non-inferiority, assessed at 0.5 fraction retention of the warfarin treatment, is supported if one-sided p ≤ 0.025 (represented by the lower dashed line). Superiority to placebo and warfarin is assessed at 0 and 1.0 fraction, respectively. The point estimate and the lower and upper limit of one-sided 97.5% confidence interval (CI) (equivalent to two-sided 95% CI) of the fraction of warfarin’s effect preserved by ximelagatran are shown as fractional values corresponding to the 50th (middle dashed line), 2.5th (lower dashed line), and 97.5th (upper dashed line) percentile of distribution, respectively.

 
The fraction of the warfarin effect retained by ximelagatran can also be estimated from Figure 5 and is equivalent to 0.67 (0.19 to 1.10), corresponding to the 50th (p = 0.5), 2.5th (p = 0.025) and 97.5th (p = 0.975) percentile of the distribution, respectively. With a different approach, the Hasselblad and Kong method (12), the fraction retained is estimated to be 0.67 (0.37 to 0.97). Because the CI of both of these estimates contains the pre-specified fractional threshold of 0.5, the data do not support a claim of non-inferiority. Thus, in contrast to the "official" conclusion regarding the SPORTIF V trial, these fractional analyses are inconsistent with a claim of non-inferiority relative to warfarin.
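A rough check on the point estimate is possible from the rounded risk ratios in the Figure 4 legend. The sketch below assumes that the fraction retained is the ratio of log risk ratios (new treatment versus putative placebo over warfarin versus placebo); it does not attempt to reproduce either CI method, and the function names are hypothetical.

```python
from math import log


def fraction_retained(derived_rr, historical_rr):
    """Fraction of the control effect preserved, measured on the log risk ratio scale."""
    return log(derived_rr) / log(historical_rr)


def supports_noninferiority(ci_lower, threshold=0.5):
    """The lower confidence limit of the fraction must exceed the threshold."""
    return ci_lower > threshold


# SPORTIF V: derived RR 0.52 versus putative placebo, historical RR 0.37
print(f"fraction retained: {fraction_retained(0.52, 0.37):.2f}")

# Lower limits of the two reported intervals for the fraction
for method, lower in (("sensitivity analysis", 0.19), ("Hasselblad and Kong", 0.37)):
    print(method, supports_noninferiority(lower))
```

With these rounded inputs the point estimate comes out at about two-thirds, in line with the reported 0.67, and neither lower limit clears 0.5.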

Bayesian analysis of non-inferiority.   The Bayesian approach to hypothesis testing can be employed for determination of non-inferiority. Briefly, normal posterior distributions are derived with the log mean risk ratio (µ) and its standard deviation (σ) according to Bayes’ theorem, which states that the probability for the hypothesis (non-inferiority), given the evidence (the "posterior"), is proportional to the probability for the evidence, given the hypothesis (the "likelihood") times the probability for the hypothesis independent of the evidence (the "prior") (23–25).

Essential to Bayesian analysis is the choice of prior and the weight assigned to that prior (23,25). Briefly, priors range from the uninformative, which impart no information (expressed mathematically as µ = 0, σ ≫ 1), to the skeptical, which express cautiously reasonable skepticism about efficacy of the new treatment, to the informative, which impart substantial information from previous clinical trials. The uninformative prior has the least influence on the analysis; inferences based on it are equivalent to the conventional frequentist results. The informative prior is especially helpful if the previous clinical trial from which it is derived closely resembles the current non-inferiority trial. Any differences in patient characteristics, study protocols, or outcome assessment between the current and historical trial can be accounted for by discounting the latter relative to the former by varying the proportion or weight assigned to the prior information (23). The influence of these choices on the resultant probability of non-inferiority can be assessed through sensitivity analysis. An analysis that is insensitive to the choice of prior indicates a greater degree of stability in the resultant inferences.

The advantages of the Bayesian approach, and its applications to non-inferiority trials, are reviewed in greater detail elsewhere (23–25). In general, Bayesian analysis replaces a categorical (yes/no) non-inferiority judgment with a continuous probability statement relative to the non-inferiority hypothesis. Accordingly, the probability of non-inferiority relative to any assumed marginal or fractional threshold can be computed, and non-inferiority is thereby inferred at a posterior probability of ≥0.975 (corresponding to a conventional one-sided p ≤ 0.025).

Bayesian analysis of the SPORTIF trial.   Figure 6 shows posterior probability distributions for the SPORTIF V trial with three different priors, as detailed in the figure legend. The probability of non-inferiority with the pre-defined risk ratio margin of 1.65 (equivalent to a log risk ratio of 0.5) is 0.804, 0.913, and 0.999 with an uninformative, skeptical, and informative prior, respectively. Thus, non-inferiority is established (posterior probability ≥0.975) only when prior information from the SPORTIF III trial is used. The probability of non-inferiority increases with the magnitude of the margin and the weight of prior information. It exceeds the threshold of 0.975 for all weights (0 to 1) with the investigators’ absolute difference margin of 2% (1.65 risk ratio) but only for weights >0.4 for a conservative absolute difference margin of 0.68% (1.22 risk ratio; i.e., a 40% or higher portion of data from the SPORTIF III trial is required to establish non-inferiority at a conservative marginal threshold in the SPORTIF V trial). Thus, the lower the prior weight needed to reach the threshold, the lesser the dependence on prior studies and the stronger the evidence of non-inferiority.
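The mechanics of such an analysis can be sketched with a normal-normal conjugate update on the log risk ratio scale. Several parts of this sketch are assumptions rather than the authors' method: the prior is discounted by scaling its precision by a weight between 0 and 1, and the likelihood mean and standard deviation for SPORTIF V are rough values backed out from the risk ratios in the Figure 4 legend. The priors are those given in the Figure 6 legend. The printed probabilities therefore need not match the published ones.

```python
from math import sqrt
from statistics import NormalDist


def posterior(prior_mean, prior_sd, like_mean, like_sd, weight=1.0):
    """Normal-normal conjugate update; weight scales the prior precision (0 discards it)."""
    prior_precision = weight / prior_sd ** 2
    like_precision = 1.0 / like_sd ** 2
    post_precision = prior_precision + like_precision
    post_mean = (prior_precision * prior_mean + like_precision * like_mean) / post_precision
    return post_mean, sqrt(1.0 / post_precision)


def prob_noninferior(mean, sd, log_margin=0.5):
    """Posterior probability that the log risk ratio lies below the margin."""
    return NormalDist(mean, sd).cdf(log_margin)


# Assumed SPORTIF V likelihood (log risk ratio scale); not reported by the authors
LIKE_MEAN, LIKE_SD = 0.34, 0.214

# Priors from the Figure 6 legend: (mean, standard deviation) of the log risk ratio
PRIORS = {
    "uninformative": (0.0, 10.0),
    "skeptical": (0.250, 0.374),
    "informative (SPORTIF III)": (-0.338, 0.241),
}

for name, (mu, sigma) in PRIORS.items():
    mean, sd = posterior(mu, sigma, LIKE_MEAN, LIKE_SD)
    print(f"{name}: P(non-inferior) = {prob_noninferior(mean, sd):.3f}")
```

Repeating the calculation for the informative prior over a range of `weight` values gives a sensitivity analysis of the kind described above.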



Figure 6 Bayesian analysis of Stroke Prevention Using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) V trial. Tri-plots showing posterior (thick line) distributions derived from integrating evidence or likelihood (thin line) from SPORTIF V trial and three different priors (dashed line), according to Bayes’ theorem. The margin of non-inferiority (M) is indicated by the two vertical dotted lines and is equivalent to a log risk ratio of 0.5 (equivalent to a risk ratio of 1.65). Three priors are used: 1) uninformative prior, intended to add as little as possible to the data (formally expressed as log-RR mean (µ) = 0, standard deviation (σ) = 10; likelihood and posterior distributions are superimposed, hence only one plot); 2) moderately skeptical (i.e., 50% of the distribution is contained within the non-inferiority margin [log-RR µ = 0.250, σ = 0.374]); and 3) informative, based on prior information derived from SPORTIF III trial (log-RR µ = –0.338, σ = 0.241). Posterior probability of any effect size can be calculated by computing area under the curve. The probabilities of falling below (<M), within (= M), or above M (>M) are shown on the top right-hand corner of each plot. Probability of non-inferiority is computed as the sum of probability of <M plus = M (e.g., the posterior probability of non-inferiority with uninformative prior [plot 1] is 0.068 [<M] + 0.736 [= M] = 0.804). Non-inferiority is inferred at a posterior probability of ≥0.975 (corresponding to a one-sided p ≤ 0.025).

 

    Discussion
Two large non-inferiority trials have concluded that "ximelagatran is at least as effective as warfarin" in preventing stroke and systemic embolism in patients with nonvalvular AF (4–6); however, our re-assessment reveals several limitations in the design and analysis of the data that refute the investigators’ interpretation and point to a contrary conclusion. On the basis of the pivotal double-blind SPORTIF V trial, there is, in fact, very little evidence that ximelagatran is non-inferior to warfarin, unless one uses a liberal non-inferiority margin that is not supported by historical studies. The best case scenario indicates a negligible benefit of ximelagatran over warfarin—a 0.13% absolute or 9% relative risk reduction and 110% retention of warfarin’s effect. In contrast, the worst case scenario reflects a 1% absolute loss of benefit or a >2-fold relative increase in risk and ≤20% preservation of the warfarin effect.

This loss in efficacy would have been tolerable if ximelagatran had been shown to have other noteworthy benefits (less toxic, less costly, easier to administer) that outweigh the seeming loss of efficacy; however, this is not the case, owing to major safety concerns over increased liver toxicity and increased withdrawal rate associated with ximelagatran in comparison with warfarin. Even though the incidence of major bleeding is numerically greater with warfarin, the difference is not statistically significant. Thus, the potential pharmacologic advantages and ease of administration of ximelagatran without the need for monitoring or dose titration are offset by reduced efficacy, increased safety concerns, and potentially increased cost. This imbalance in the benefit-risk and, potentially, benefit-cost profile challenges the investigators’ claim of non-inferiority of ximelagatran and led the FDA to reject the application for ximelagatran.

Critical issues in the design and analysis of non-inferiority trials.   The results of the SPORTIF trials highlight several fundamental issues in the design and analysis of non-inferiority trials, as summarized in Table 2. The choice of the non-inferiority margin is a key step. Several points are worthy of consideration with respect to the non-inferiority margin in the SPORTIF trials.

First, one might argue that the 2% margin was unreasonably generous and potentially biased toward non-inferiority, given the low baseline event rate in this study. If the investigators had chosen a smaller, more conservative margin, they would not have drawn the conclusion that ximelagatran was non-inferior to warfarin.

Second, there is uncertainty about the magnitude of the warfarin effect because of the variability between the five historical trials in terms of the design and the observed results. Only two of the five trials were double-blind (Fig. 1), and four were stopped prematurely because of significant benefits observed on interim analysis (18–21). Given this variability, a random-effects meta-analytical model, which allows for differences in treatment effects between studies, would have provided a more reliable estimate of the warfarin effect than the fixed-effects model or the pooled analysis.
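The contrast between the two meta-analytical models can be sketched in a few lines. The example below is a minimal illustration, assuming the DerSimonian-Laird estimator for the between-trial variance (one common random-effects method, not necessarily the one the authors had in mind). The function name `pool_log_risk_ratios` and the five log risk ratios and variances are hypothetical placeholders, not data from the historical warfarin trials.

```python
import math


def pool_log_risk_ratios(log_rr, variances):
    """Pool per-trial log risk ratios with fixed- and random-effects weights."""
    k = len(log_rr)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effects) weights
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # DerSimonian-Laird between-trial variance

    # Random-effects weights add tau2 to every trial's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * y for wi, y in zip(w_star, log_rr)) / sum(w_star)

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum(w)),
        "random": random,
        "se_random": math.sqrt(1.0 / sum(w_star)),
        "tau2": tau2,
    }


# Hypothetical inputs for five trials (illustrative only).
example_log_rr = [-1.2, -0.3, -0.9, -1.5, -0.4]
example_var = [0.10, 0.08, 0.12, 0.20, 0.09]

result = pool_log_risk_ratios(example_log_rr, example_var)
print({name: round(value, 3) for name, value in result.items()})
```

When the trials disagree by more than chance would predict, the estimated between-trial variance is positive and the random-effects standard error is wider than the fixed-effects one, so a margin derived from the random-effects confidence bound tends to be more conservative.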

Third, the decision to employ absolute or relative difference as the basis for judgments regarding non-inferiority is arbitrary. In general, relative differences provide more conservative thresholds than absolute differences when event rates are changing and/or unpredictable (as in the SPORTIF trial, where the observed event rates were lower than the assumed rate) owing to differences in patient populations or new modalities of treatment (8,11). Accordingly, an analysis of the SPORTIF V trial on the basis of risk ratio is inconsistent with the "official" analysis on the basis of absolute difference, the observed upper bound of 2.1 being greater than the non-inferiority risk ratio margin of 1.65. Non-inferiority becomes even more difficult to establish with more conservative relative margins. In such cases, a judgment of non-inferiority would be more confident if analyses on the basis of absolute and relative difference were concordant.
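The arithmetic linking absolute and relative margins can be made explicit. The sketch below assumes the simplest conversion, in which a fixed absolute margin added to the control event rate is divided by that rate to give the implied risk-ratio margin. The 2% margin and the three warfarin rates are taken from the text; the function name `implied_relative_margin` is hypothetical.

```python
def implied_relative_margin(control_rate, absolute_margin):
    """Risk-ratio margin implied by a fixed absolute margin at a given control rate."""
    return (control_rate + absolute_margin) / control_rate


ABSOLUTE_MARGIN = 2.0  # percentage points per year, as in the SPORTIF trials

# Warfarin event rates (%/year) quoted in the text.
warfarin_rates = {
    "assumed at design": 3.1,
    "pooled historical": 1.9,
    "observed in SPORTIF V": 1.2,
}

for label, rate in warfarin_rates.items():
    margin = implied_relative_margin(rate, ABSOLUTE_MARGIN)
    print(f"{label}: {rate}%/year -> implied risk-ratio margin {margin:.2f}")
```

At the assumed rate of 3.1%/year the implied margin is close to the risk-ratio margin of 1.65 cited above, whereas at the observed rate of 1.2%/year the same absolute margin tolerates a much larger relative increase in risk. This is one way to see why a fixed absolute margin becomes more liberal as the control event rate falls.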

Finally, the impact of active control event rate and non-inferiority margin on sample size is quite substantial. For a relative risk margin of 1.65, the total sample size required to ensure 90% power increases from 3,156 in the SPORTIF V trial at an expected warfarin event rate of 3.1%/year to 4,875 at the pooled historical warfarin rate of 1.9%/year and to 8,190 at the actually observed warfarin rate of 1.2%/year. More conservative margins would also require greater sample sizes at any given warfarin rate, with the sample size increasing as the reciprocal of the square of the margin (13,14). Thus, both SPORTIF trials were arguably underpowered (resulting in a high "false-negative" type II error rate) to determine the relative efficacy of ximelagatran versus warfarin, given lower than expected warfarin event rates.
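The dependence of sample size on event rate and margin can be sketched with the textbook normal-approximation formula for a non-inferiority comparison of two proportions. This is an assumption, not a formula stated in the article: it takes equal true event rates in both arms, a one-sided alpha of 0.025, and annual rates treated as simple proportions. The function name `total_sample_size` is hypothetical.

```python
from statistics import NormalDist


def total_sample_size(control_rate, absolute_margin, alpha=0.025, power=0.90):
    """Total patients (both arms) for a non-inferiority test on a risk difference.

    Assumes equal true event rates in the two arms and a one-sided test.
    """
    z_sum = NormalDist().inv_cdf(1.0 - alpha) + NormalDist().inv_cdf(power)
    variance = 2.0 * control_rate * (1.0 - control_rate)
    per_group = z_sum ** 2 * variance / absolute_margin ** 2
    return 2.0 * per_group


# Design case: 3.1%/year warfarin rate with the 2% absolute margin.
print(round(total_sample_size(0.031, 0.02)))

# Relative margin of 1.65 expressed as an absolute margin at lower rates.
for rate in (0.019, 0.012):
    print(rate, round(total_sample_size(rate, (1.65 - 1.0) * rate)))

# Halving the margin quadruples the required sample size.
ratio = total_sample_size(0.031, 0.01) / total_sample_size(0.031, 0.02)
print(ratio)
```

Under these assumptions the sketch lands close to the totals quoted above for the 3.1% and 1.2% rates. It does not land exactly on the quoted total for the 1.9% rate, which suggests that the article's calculation used inputs that differ slightly from this sketch. The last line illustrates the reciprocal-square relationship between margin and sample size.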

Although the size of the margin is determined by trial logistics (the larger the margin, the smaller the trial), a potentially serious consequence of choosing liberal margins is "biocreep," a well recognized phenomenon that can occur when a slightly inferior treatment becomes the active control for future non-inferiority trials and so on until the active control becomes no better than a placebo (13,14). Ideally, stringent margins on the basis of the best comparator should be used to enhance the strength and credibility of non-inferiority trials; however, such stringent margins often result in large sample sizes that render the trials impractical. Reconciling these two important considerations of feasibility and stringency poses a substantial challenge. In this article, we have shown how a sensitivity analysis across a range of margins, from liberal to clinically relevant to conservative (reflecting the core philosophies of the sponsor, practitioner, and regulator, respectively), might provide useful insights.
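A sensitivity analysis of this kind amounts to comparing one confidence bound against several candidate margins. The sketch below uses the risk-ratio upper bound of 2.1 and the margin of 1.65 quoted earlier for the SPORTIF V trial. The other two margins are assumptions added for illustration: the first is simply the ratio implied by a 2% absolute margin at a 1.2%/year control rate, and the last is a hypothetical stricter value. The function name `sensitivity_table` is also hypothetical.

```python
def sensitivity_table(upper_bound, margins):
    """Map each margin label to True when the CI upper bound falls below it."""
    return {label: upper_bound < margin for label, margin in margins.items()}


UPPER_BOUND_RR = 2.1  # observed upper confidence bound of the risk ratio

candidate_margins = {
    "implied by 2% absolute at 1.2%/year": (1.2 + 2.0) / 1.2,  # illustrative
    "risk-ratio margin of 1.65": 1.65,
    "hypothetical stricter margin": 1.30,  # assumed value, not from the article
}

for label, met in sensitivity_table(UPPER_BOUND_RR, candidate_margins).items():
    verdict = "non-inferiority met" if met else "non-inferiority not met"
    print(f"{label}: {verdict}")
```

Laying the verdicts side by side shows how far the conclusion depends on the margin chosen rather than on the data alone.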

Another key aspect of the non-inferiority inference is its reliance on two critical assumptions: assay sensitivity defined as the ability to detect differences between treatments if such differences exist; and constancy, which assumes that the historical difference between the active control and placebo will be constant in the setting of the current active control trial if a placebo control had been used (8,10–15). The validity of these two key assumptions, however, cannot be verified directly. Assay sensitivity is affected by poor trial design and conduct that does not ensure maximal compliance, minimization of protocol deviations, and outcome misclassifications (Table 2). The constancy assumption cannot be plausibly demonstrated because of differences with respect to patient characteristics, concomitant medications, intensity of treatment, and other key design features (12–15). Given this limitation, it seems reasonable to raise the standard of evidence required for the establishment of non-inferiority.

Both marginal and fractional analyses are considered to be forms of "discounting" to raise the standard of evidence (15). The "50% rule" endorsed by the FDA represents a form of "double discounting," in which preservation of a fraction of the active control effect is applied to the non-inferiority margin to make it "suitably conservative" (16). The fractional approach addresses the issue of constancy by discounting the historical data when the event rates are dissimilar in the current and historical trials (15). In the SPORTIF V trial, the observed warfarin rate of 1.2% was nearly 50% lower than the historical rate of 1.9%. Thus, a proper discounting via fractional analysis would have minimized the type I error and led the SPORTIF investigators away from an erroneous conclusion of non-inferiority.
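The fractional approach can be illustrated on the absolute scale. The sketch below assumes that the fraction of the active control effect retained equals one minus the ratio of the new-treatment-minus-control difference to the control-versus-placebo benefit. The two differences (a 0.13% reduction and a 1% increase) are the best and worst cases quoted earlier in the Discussion. The warfarin benefit of 1.25 percentage points per year is a hypothetical value chosen only to keep the arithmetic easy to follow; the article's own input is not stated in this section. Function names are hypothetical.

```python
def fraction_retained(new_minus_control, control_benefit):
    """Fraction of the control-versus-placebo benefit preserved by the new drug.

    Both arguments are absolute differences in event rate (percentage points);
    a positive new_minus_control means more events on the new drug.
    """
    return 1.0 - new_minus_control / control_benefit


def margin_for_retention(control_benefit, fraction_to_preserve=0.5):
    """Absolute margin that still preserves the stated fraction of benefit."""
    return (1.0 - fraction_to_preserve) * control_benefit


ASSUMED_WARFARIN_BENEFIT = 1.25  # hypothetical, percentage points per year

for label, diff in [("best case", -0.13), ("worst case", 1.0)]:
    kept = fraction_retained(diff, ASSUMED_WARFARIN_BENEFIT)
    print(f"{label}: {kept:.0%} of the assumed warfarin benefit retained")

print(margin_for_retention(ASSUMED_WARFARIN_BENEFIT))
```

With this assumed benefit, a margin that preserves half of the effect is far smaller than the 2% absolute margin used in the trials, which is the sense in which fractional discounting raises the standard of evidence.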

Implications.   One’s choice of the statistical approach to inference has important implications in the interpretation of non-inferiority trials. In this context, the Bayesian approach offers a number of advantages over the conventional frequentist approach (23–25). Chief among them is the integration of prior information with the empirical data to upgrade the evidence. Any degree of heterogeneity between the current and prior trial can be corrected by varying the proportion or weight assigned to the prior information (23). For example, there is substantial heterogeneity in the primary outcome between the SPORTIF III and SPORTIF V trials (p = 0.03) (5) that might be related to bias due to lack of blinding in the SPORTIF III trial or to other confounding factors such as significantly greater degree of concomitant aspirin use in the ximelagatran group in the SPORTIF III trial (4,5). Bayesian analysis supports the appropriateness of lowering the marginal threshold for non-inferiority via incorporation of prior information, thereby strengthening the evidence in favor of non-inferiority. It is exactly in this way—by taking optimum advantage of the available prior information—that the Bayesian approach offers a major advantage over the frequentist approach. Ideally, robust conclusions regarding non-inferiority should be on the basis of concordant analyses with both approaches.
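The idea of varying the weight assigned to prior information can be sketched with a normal approximation on the log risk-ratio scale. This is a minimal illustration under stated assumptions, not the authors' analysis: the prior and the current data are each summarized by a mean and a variance, and a weight between 0 and 1 scales the precision of the prior before the usual conjugate update. All numerical inputs other than the margin of 1.65 are hypothetical, as are the function names.

```python
from math import log, sqrt
from statistics import NormalDist


def discounted_posterior(prior_mean, prior_var, data_mean, data_var, prior_weight=1.0):
    """Normal-normal update in which prior_weight scales the prior precision."""
    if not 0.0 < prior_weight <= 1.0:
        raise ValueError("prior_weight must be in (0, 1]")
    prior_precision = prior_weight / prior_var
    data_precision = 1.0 / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var


def probability_below_margin(post_mean, post_var, margin_rr):
    """Posterior probability that the risk ratio lies below the margin."""
    return NormalDist(post_mean, sqrt(post_var)).cdf(log(margin_rr))


# Hypothetical summaries on the log risk-ratio scale (illustrative only).
PRIOR_MEAN, PRIOR_VAR = log(0.7), 0.06  # earlier trial, favorable to the new drug
DATA_MEAN, DATA_VAR = log(1.4), 0.07    # current trial, unfavorable

for weight in (1.0, 0.5, 0.1):
    mean, var = discounted_posterior(PRIOR_MEAN, PRIOR_VAR, DATA_MEAN, DATA_VAR, weight)
    prob = probability_below_margin(mean, var, 1.65)
    print(f"prior weight {weight}: P(RR < 1.65) = {prob:.2f}")
```

Lowering the weight moves the posterior toward the current trial alone, so the same code shows how much a conclusion of non-inferiority owes to the prior and how much to the new data.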

In conclusion, a variety of subtle assumptions challenge the design, analysis, and interpretation of non-inferiority trials. Among these are the arbitrary thresholds employed for the characterization of "non-inferiority" and the use of historical controls to derive the effect of the new treatment relative to a hypothetical putative placebo. In the extreme, this trial design might result in a "regression toward mediocrity" whereby any treatment becomes non-inferior to another by suitable choice of the underlying assumptions. In general, if such trials are to be applied to clinical and regulatory decisions regarding the marketing and use of new treatments, the underlying assumptions must be made explicit and their influence on the resultant conclusions must be assessed rigorously via sensitivity analyses. Thus, when these sensitivity analyses were applied to each of the key assumptions underlying the recently reported SPORTIF trials, they materially undermined the authors’ conclusion regarding the non-inferiority of ximelagatran relative to warfarin in the management of patients with AF.


    References
1. Singer DE, Albers GW, Dalen JE, Go AS, Halperin JL, Manning WJ. Antithrombotic therapy in atrial fibrillation. The Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest 2004;126:429S-456S.

2. Atrial Fibrillation Investigators. Risk factors for stroke and efficacy of antithrombotic therapy in atrial fibrillation: analysis of pooled data from five randomized controlled trials. Arch Intern Med 1994;154:1449-1457.

3. Fang MC, Stafford RS, Ruskin JN, Singer DE. National trends in antiarrhythmic and antithrombotic medication use in atrial fibrillation. Arch Intern Med 2004;164:55-60.

4. Olsson SB. Stroke prevention with the oral direct thrombin inhibitor ximelagatran compared with warfarin in patients with non-valvular atrial fibrillation (SPORTIF III): randomized controlled trial. Lancet 2003;362:1691-1698.

5. Halperin JL. Ximelagatran: oral direct thrombin inhibition as anticoagulant therapy in atrial fibrillation. J Am Coll Cardiol 2005;45:1-9.

6. SPORTIF Executive Steering Committee for the SPORTIF V Investigators. Ximelagatran versus warfarin for stroke prevention in patients with nonvalvular atrial fibrillation. A randomized trial. JAMA 2005;293:690-698.

7. Lawrence J, Hung J, Mahjoob K, reviewer. Statistical review and evaluation, clinical studies, NDA 21-686 (2004). FDA web site. Available at: http://www.fda.gov/ohrms/dockets/ac/04/briefing/2004-4069B1_07_FDA Backgrounder-C-R-stat%20Review.pdf. Accessed October 10, 2004.

8. Ellenberg SS, Temple R. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000;133:455-463.

9. Blackwelder W. Proving the null hypothesis in clinical trials. Control Clin Trials 1982;3:345-353.

10. Siegel JP. Equivalence and non-inferiority trials. Am Heart J 2000;139:S166-S170.

11. Gould A. Another view of active-controlled trials. Control Clin Trials 1991;12:474-485.

12. Hasselblad V, Kong DF. Statistical methods for comparison to placebo in active-control trials. Drug Inf J 2001;35:435-449.

13. Hung HMJ, Wang S-J, Tsong Y, et al. Some fundamental issues with non-inferiority testing in active controlled trials. Stat Med 2003;22:213-225.

14. D’Agostino Sr. RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues – the encounters of academic consultants in statistics. Stat Med 2003;22:169-186.

15. Snapinn SM. Alternatives for discounting in the analysis of noninferiority trials. J Biopharm Stat 2004;14:263-273.

16. International Conference on Harmonisation. Statistical principles for clinical trials (ICH E 9) (1998); International Conference on Harmonisation. Guidance on choice of control group and related design and conduct issues in clinical trials (ICH E 10) (2000). Food and Drug Administration, Department of Health and Human Services. Available at: http://www.fda.gov/cder/guidance/index.htm. Accessed October 17, 2005.

17. Connolly SJ, Laupacis A, Gent M, Roberts RS, Cairns JA, Joyner C. Canadian Atrial Fibrillation Anticoagulation (CAFA) study. J Am Coll Cardiol 1991;18:349-355.

18. Ezekowitz MD, Bridgers SL, James KE, et al. Warfarin in the prevention of stroke associated with nonrheumatic atrial fibrillation. N Engl J Med 1992;327:1406-1412.

19. Stroke Prevention in Atrial Fibrillation Investigators. Stroke Prevention in Atrial Fibrillation study: final results. Circulation 1991;84:527-539.

20. The Boston Area Anticoagulation Trial for Atrial Fibrillation Investigators. The effect of low-dose warfarin on the risk of stroke in patients with nonrheumatic atrial fibrillation. N Engl J Med 1990;323:1505-1511.

21. Petersen P, Boysen G, Godtfredsen J, Andersen ED, Andersen B. Placebo-controlled, randomised trial of warfarin and aspirin for prevention of thromboembolic complications in chronic atrial fibrillation. Lancet 1989;1:175-179.

22. European Atrial Fibrillation Trial (EAFT) Study Group. Secondary prevention in non-rheumatic atrial fibrillation after transient ischaemic attack or minor stroke. Lancet 1993;342:1255-1262.

23. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. J Royal Stat Soc Series A 1994;157:357-416.

24. Simon R. Bayesian design and analysis of active control clinical trials. Biometrics 1999;55:484-487.

25. Diamond GA, Kaul S. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol 2004;43:1929-1939.




