J Am Coll Cardiol, 2004; 44:2285-2292, doi:10.1016/j.jacc.2004.07.059 © 2004 by the American College of Cardiology Foundation
Division of Cardiovascular Pathophysiology and The Howard Gilman Institute for Valvular Heart Diseases, Weill Medical College of Cornell University, New York, New York
Manuscript received May 21, 2004; accepted July 24, 2004.
* Reprint requests and correspondence: Dr. Jeffrey S. Borer, New York-Presbyterian Hospital Weill Cornell Medical Center, 525 East 68th Street, New York, New York 10021 (Email: ero2002{at}med.cornell.edu).
| Abstract |
|---|
The law requires that, before therapeutic availability by prescription or "over the counter" without prescription, the FDA must apply a benefit/risk standard to approve any new drug (in this context, a product [usually but not necessarily synthetic] claiming health benefits from diagnosis or treatment based on direct or indirect interaction with intermediary metabolism of the beneficiary), any device (a synthetic product claiming similar benefit from physical interaction with the beneficiary), or any biological (a derivative or product of a living organism other than its donor or, if autologous, undergoing substantial processing before administration). The FDA's approval is based on evaluation of a New Drug Application, Pre-Marketing Approval, or Biologics Licensing Application from the sponsor (producer) supporting its use. The application includes a proposed label describing the product's characteristics and directions for use. This must also be approved by the FDA. Before any new therapeutic can be studied in people, the preclinical pharmacology/biology/actions, the general outline of the proposed human studies, and the development plan must be described in an Investigational New Drug application for human studies (drugs, biologicals) or an Investigational Device Exemption for devices. The FDA can prevent any or all of the proposed human studies if it perceives safety concerns. During development and after approval, the FDA continually reviews and, if necessary, stops studies or withdraws approval on the basis of adverse events (AEs), focusing especially on AEs meeting an operational definition of seriousness (serious [S]AEs). Allowable AE reporting intervals are strictly defined and enforced.
Riding herd on this effort is a Herculean task, in part because of the rigorous scope of FDA reviews: all data collected by the manufacturer must be tabulated, submitted to, and evaluated by the FDA. Although it is not necessary for the FDA to review each source document (raw data), these must be available for audit/sampling. Evidence of complete and appropriate data collection must be provided from properly documented audits; "Good Clinical Practice," "Good Laboratory Practice," and "Good Manufacturing Practice" standards must be employed in all research upon which FDA approval is based. By federal regulation, approval requires evidence of efficacy from adequate and well-controlled clinical studies, as well as safety acceptable for the intended use defined from adequate exposure. Until 1997, the plural "studies" was generally interpreted as requiring at least two independent clinical trials. However, by that time, progressively larger clinical trials had emerged, some focused on major adverse outcomes (death, stroke) that rendered study repetition ethically tenuous and/or impractical. Although the FDA accepted single persuasive trials as a basis for approval in such cases, the 1997 law explicitly provides FDA authority in certain circumstances to approve a therapy based on a single trial if adequate "confirmatory evidence" is available; the nature of the confirmatory evidence is not legally specified. The law also requires performance of all tests of safety that are "reasonably applicable."
This summary of approval requirements masks far more complex FDA judgments based on principles often very different from those that influence unfettered academic research or clinical practice. The scientific method underlies problem solving in each area, but the latitude permitted in interpreting the results of scientific inquiry differs widely in the three circumstances and is most stringent in the area of therapeutics regulation.
This article describes my perception of some of these limitations on interpretation. In so doing, it aims to elucidate the process and principles by which cardiovascular therapeutics and, most specifically, cardiovascular drugs are approved. The same overarching principles apply to approval of devices and biologicals that are increasingly important in cardiovascular therapy, although characteristics specific to these other modalities impose unique limitations, which I will briefly review.
| Background |
|---|
Personnel: Officials, advisors, and consultants. The FDA comprises full-time experts in basic and preclinical pharmacology, clinical pharmacology, toxicology, statistics, and more. Most have previous clinical and/or research experience. After several years of reviewing Investigational New Drug/New Drug Applications, Pre-Marketing Approval, or Biologics Licensing Applications and the associated literature, these officials are highly knowledgeable regarding the therapeutic areas under their purview (including, for the Cardio-Renal Drugs Division, hypertension, ischemic heart disease, heart failure, arrhythmia, renal disease, and pulmonary hypertension). When appropriate, they can call on the expertise of other FDA Divisions and Centers. What, then, is the need for Advisors and Consultants? Primarily, to provide perspective gained from ongoing contemporary clinical and consumer experience, generally not available from full-time regulators because of their workload. Also, although FDA officials review and analyze massive amounts of data, they do not produce the data; value is gained from the perspective of Advisors involved in research like that under review.
The FDA and clinical practice. The FDA clearly plays a fundamental role in defining medical practice. However, although FDA decisions determine the pharmaceutical, device, and biological therapies (and many of the diagnostic tools) available for patient care, these decisions refer only to uses the sponsor proposes for marketing. In this sense, the FDA is a reactive body. Although the FDA is increasingly involved with the details of developing therapeutics and often influences decisions about specific components of trial design such as outcome measures, it does not initiate therapeutics development. More importantly, the FDA is not charged with defining practice standards more broadly. Therapeutics approval and labeling define the benefit (the "indication") that can generally be expected from therapy, the characteristics of patients who can generally be expected to benefit, the administration regimen(s) likely to provide the benefit with an acceptable relation to risk, and the adverse experiences that may be associated with using the therapy. Approvals are generally conservative: they focus only on one or more very carefully defined indication(s) supported by well-designed clinical trials that provide statistically persuasive evidence. Thus, evidentiary standards for FDA approval imply a certain degree of scientific and statistical rigor. However, except when safety considerations result in the recommendation for use only if safer therapies are intolerable or inadequate to provide the desired benefit, or when study designs and results clearly support a sponsor-submitted claim of superiority of one therapy over another, the FDA does not prioritize among approved treatments for a given indication.
The FDA does not recommend the use of any specific therapy, does not comment on the relative cost of different treatments, and does not sanction the use of treatments outside the labeled indications or doses, but it does sanction sponsors who advertise outside the approved label. Recommendations for treatment and the prioritization of therapies are the purview of professional societies or "guidelines" panels, generally comprising small groups of experienced clinicians/researchers with expertise in a specific disease area, whose consensus recommendations are usually based on personal experience and published literature rather than on the far more complete data available to the FDA, and who are not held to any predetermined standard of statistical or scientific rigor. Because patients must receive the best possible care despite recognized deficiencies in supporting data, consensus panels function in part to provide reasonable recommendations when rigorously defined supporting data are not available. Thus, their function is fundamentally different from that of the FDA. It is entirely conceivable (and regularly true) that "guidelines" include recommendations for use of drugs in settings and for indications not considered and sometimes even specifically rejected by the FDA. Thus, the occasional argument by sponsors seeking new indications for already approved drugs that "the horse is out of the barn," meaning that clinical practice has moved beyond the FDA, generally has little impact on approval decisions. Because of the FDA's legal mandate and the legal implications of its decisions, the FDA must require well-established evidentiary standards for its imprimatur. 
Sanctions on physicians are not the responsibility of either the FDA or consensus panels, but rather of the courts and, increasingly, of third-party payers; sometimes such sanctions are based on the use of drugs or other treatments for "off-label" indications (i.e., to achieve benefits not confirmed by the FDA).
| Fundamental principles |
|---|
Surrogates. Surrogate end points are laboratory tests or clinical measures, such as blood pressure, that are believed to relate invariably to clinical outcome but are not themselves clinical benefits (4). To be an acceptable basis for drug approval, the relation between the surrogate and clinical outcome must be constant. Surrogates and interventions must not interact (i.e., surrogate and clinical outcome both must change similarly irrespective of the form of intervention); improvement in the surrogate invariably must lead to clinical benefit. Historically, variations in blood pressure have been closely related to variations in the risk of stroke, with less clear effects on myocardial infarction and heart failure. Also, despite possible differences in some effects of interventions noted in the recent Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) (9), all interventions that reduce blood pressure, irrespective of pharmacologic class, have had directionally similar effects on events. Consequently, blood pressure reduction has been accepted as a surrogate for clinical benefit, and antihypertensive drugs are approved if blood pressure reduction is demonstrated. An antihypertensive drug could have additional benefits (e.g., slowing of the deterioration of renal function in patients with type 2 diabetes by losartan and irbesartan but not by an equi-effective antihypertensive dose of amlodipine, or improvement in some heart failure outcomes by certain antihypertensives), but these benefits must be demonstrated directly for approval to be granted for such an indication.
In contrast to blood pressure, left ventricular ejection fraction for heart failure outcomes, ST-segment depression on the exercise electrocardiogram (ECG) or on the 24-h ambulatory ECG, or radionuclide-based measures of ischemia for ischemic event risk, among other measures, have been proposed as surrogates, but have either failed to manifest an invariant relation between the effect of the intervention on the surrogate and on the outcome or have not been studied with sufficient interventions to allow confidence that such an invariant relation exists. The search for surrogates continues, driven by the economics of drug development and the realities of trial conduct. The success of multi-drug, multi-modality therapy in reducing the risks from major cardiovascular diseases requires conducting progressively larger (and costlier) clinical trials to demonstrate relatively small but potentially clinically important incremental benefits. Resulting costs increasingly threaten further therapeutics development.
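To make concrete why therapeutic success drives trial size, the following Python sketch applies the standard normal-approximation sample-size formula for comparing two event proportions. The control-arm event rates and the 20% relative risk reduction are hypothetical inputs chosen only for illustration, not values from any trial discussed here; a two-sided α of 0.05 and 90% power are assumed defaults.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(control_rate: float, relative_risk_reduction: float,
              alpha: float = 0.05, power: float = 0.90) -> int:
    """Patients needed per arm to detect a relative risk reduction in a binary
    outcome (two-sided test of two proportions, normal approximation)."""
    treated_rate = control_rate * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90% power
    p_bar = (control_rate + treated_rate) / 2      # pooled rate under the null
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(control_rate * (1 - control_rate)
                              + treated_rate * (1 - treated_rate)))
    return ceil(spread ** 2 / (control_rate - treated_rate) ** 2)


# Hypothetical control-arm event rates falling as background therapy improves;
# the relative benefit sought from the new drug stays fixed at 20%.
for rate in (0.20, 0.10, 0.05):
    print(f"control event rate {rate:.0%}: {n_per_arm(rate, 0.20):,} patients per arm")
```

Holding the relative benefit constant, halving the control-arm event rate roughly doubles the required enrollment, which is the arithmetic behind progressively larger and costlier outcome trials.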
Study design: Efficacy assessment. Evidence of efficacy requires a rigorous comparison between a new therapy and either no therapy, treatment with an agent already approved for a similar purpose, or treatment with different doses of the new agent. Regulations do not specify the form of such comparisons and allow for nonrandom and even "historically controlled" trials. However, because of the potential for unintentional bias and confounding, cardiovascular drug approvals are currently based solely on contemporaneous comparisons using randomized, usually double-blinded study designs. In theory, a clinical entity might have outcomes sufficiently predictable so that historical controls, alone, could be used (e.g., anuric renal failure, inevitably rapidly fatal without dialysis); FDA approval of cardiovascular therapies has not been sought for such entities. However, although "non-inferiority" trials (discussed subsequently) are based on contemporaneous comparison of a new treatment with an active control, historical information must be employed in defining the extent of non-inferiority that must be precluded (it cannot be larger than the effect of the control).
Several study designs may be appropriate for comparisons; selection is based on multiple factors, including the benefit expected. For example, if symptom reduction is sought, parallel-arm or cross-over studies might be acceptable; the parallel comparison might even come at the end of a period in which all patients received the new treatment ("randomized withdrawal"), a very attractive strategy in certain situations (10). If natural history alteration is anticipated, only parallel-arm studies may be appropriate.
Data interpretation is least ambiguous when a drug is compared with placebo without any background therapy. However, in the present era, to achieve requisite population sizes, it may not be possible to maintain placebo therapy alone long enough to study the drug's effects on certain end points with enough statistical power to detect a drug effect that truly exists. Therefore, comparisons with placebo are often carried out with background therapy, sometimes involving several drugs and varying among subsets within the population. This may confound data interpretation but, given the proliferation of treatments for many conditions and the ethical and/or practical necessity of applying at least some of these, the problem is unavoidable. The potential impact of background therapy can be evaluated statistically, albeit imperfectly unless background treatment is uniform or study designs include complicated (and often impractical) stratification and balancing schemes.
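One of the balancing schemes mentioned above is stratified permuted-block randomization: patients are randomized in small shuffled blocks kept separately within each stratum (here, a hypothetical background-therapy category), so drug and placebo assignments stay balanced inside every stratum. The sketch below assumes 1:1 allocation, a block size of 4, and a fixed seed purely for reproducibility; actual trials typically use central randomization systems and may vary block sizes to limit predictability.

```python
import random
from collections import defaultdict


def make_stratified_randomizer(block_size: int = 4, seed: int = 2004):
    """Return assign(stratum) -> 'drug' or 'placebo' using permuted blocks kept
    separately per stratum, so the arms balance within each stratum."""
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)      # fixed seed only to make the sketch reproducible
    pending = defaultdict(list)    # stratum -> assignments left in its current block

    def assign(stratum: str) -> str:
        if not pending[stratum]:   # start a fresh shuffled block for this stratum
            block = ["drug", "placebo"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


# Hypothetical strata defined by background therapy at enrollment.
assign = make_stratified_randomizer()
on_beta_blocker = [assign("beta-blocker") for _ in range(8)]
no_background = [assign("none") for _ in range(6)]
print(on_beta_blocker.count("drug"), no_background.count("drug"))
```

Because each completed block contains equal numbers of each assignment, imbalance within a stratum can never exceed half a block, regardless of how unevenly patients are distributed across strata.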
Comparison can be undertaken with a labeled dose of an approved drug, aimed at demonstrating superiority of the new agent. Although sometimes successful (and providing unambiguous evidence of effectiveness when it is), this strategy reduces the likelihood of demonstrating a benefit from the new therapy (11). Therefore, trials can instead be designed to demonstrate "non-inferiority" of one regimen to another (i.e., to show that a defined amount of the effect of the control agent is retained by the new therapy). The effect of the control agent in the non-inferiority trial is inferred from the results of earlier trials that compared the control agent with placebo.
The FDA has suggested that the acceptable difference between a new drug and active comparator should be less than the difference between the effect of the comparator and the upper boundary of the confidence interval of the placebo effect in the historical comparison; however, other regulatory authorities (and the FDA) may allow greater flexibility in defining "non-inferiority" standards. For optimal application, this approach requires an extensive comparison of the active control and placebo, so that the incremental effect of the active control is reasonably well defined. Unfortunately, for most approved treatments, the point estimate of the effect versus placebo has fairly broad confidence intervals. Thus, with few exceptions, assumptions underlying non-inferiority trials involve considerable uncertainty. In the U.S., an efficacy claim supported solely by non-inferiority trials would require additional support of some sort but, as in all other cases, decisions would be importantly influenced by situation-specific data, including confidence in the anticipated effect of the control versus placebo. Approval for clopidogrel was based on a single trial in which non-inferiority was persuasively demonstrated versus aspirin (for which the effect versus placebo was relatively well defined historically). In this trial, clopidogrel actually demonstrated nominal (p < 0.05) superiority to aspirin (12). Generally, this would be considered insufficient evidence to support approval from a single trial (see subsequent Statistics section), but because of the known effect of aspirin, clopidogrel was judged to be clearly effective (i.e., superior to placebo).
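The margin logic can be sketched on the log hazard-ratio scale using one commonly described "fixed-margin" formalization: take the least favorable end of the 95% confidence interval for the control's historical effect versus placebo, require the new therapy to preserve a stated fraction of that effect, and declare non-inferiority only if the upper 95% bound for new versus control falls below the resulting margin. The 50% retained fraction and all hazard ratios in the example are assumptions for illustration; this is not presented as the FDA's exact procedure.

```python
from math import exp, log
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def ni_margin(log_hr_control_vs_placebo: float, se: float,
              fraction_retained: float = 0.5) -> float:
    """Fixed-margin sketch on the log hazard-ratio scale (HR below 1 means the
    control has fewer events than placebo).

    The control's benefit is taken as the 95% CI bound closest to 'no effect';
    the new drug may give up only the part not required to be retained."""
    least_favorable = log_hr_control_vs_placebo + Z95 * se
    if least_favorable >= 0:
        raise ValueError("historical data do not establish a benefit of the control")
    conservative_effect = -least_favorable
    return (1 - fraction_retained) * conservative_effect


def is_non_inferior(log_hr_new_vs_control: float, se: float, margin: float) -> bool:
    """Non-inferior if the upper 95% bound of log HR(new / control) is below the margin."""
    return log_hr_new_vs_control + Z95 * se < margin


# Hypothetical historical result: control vs placebo HR 0.75 (SE of log HR 0.05).
margin = ni_margin(log(0.75), 0.05)
print(f"margin on HR scale: {exp(margin):.3f}")
print(is_non_inferior(log(0.97), 0.04, margin))  # hypothetical new-drug trial
```

The sketch makes the text's point visible: a wider historical confidence interval (larger `se` in `ni_margin`) shrinks the margin, and if the interval reaches "no effect" no margin can be defined at all.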
A comparison of the new agent with itself at different doses (i.e., demonstration of a dose-response relation) indicates drug efficacy if, within the proposed range of administration, higher doses cause progressively greater beneficial effects.
As inferable from the 1997 law, support for the efficacy of a new treatment might result from experience with other approved drugs of the same class or with some similar properties, perhaps allowing approval based on a single study without a particularly high level of statistical significance. During the past three years, the Cardio-Renal Advisory Committee has suggested on multiple occasions that, in most instances, such support is at best modest, primarily because of the previously noted arguments favoring direct testing of clinical efficacy rather than extrapolating clinical effects from pharmacologic effects. However, depending on the specific characteristics of the parallel treatments and of the relevant data, outcomes with similar therapies might affect the strength of evidence expected from the primary (pivotal) trials of a new therapy or might help justify acceptance of a large single trial demonstrating effects considered particularly beneficial.
Study design: Efficacy persistence. Usually it is assumed that drug regimens for long-term use (though not necessarily those intended for single, short-duration or repeated transient use) will maintain their efficacy indefinitely, although this presumption has never been tested. To provide evidence of reasonable effect persistence, drugs aimed at chronic symptom relief generally must demonstrate efficacy at least three to six months after treatment initiation (interval depending on the disease); this duration is based largely on empirical observations on the predictive value of effect persistence over this interval and is subject to change with new empirical data. The three- to six-month standard also is driven by the observation that drugs for long-term use sometimes do not achieve a maximal effect until many weeks after initial administration and by the need to allow reasonable exposure for detection of certain types of AEs.
Assessment of efficacy persistence for symptom reduction (and concomitant evaluation for "rebound phenomena," discussed later) is best achieved with randomized withdrawal from treatment at the conclusion of an appropriate interval. This study design feature can be applied after prolonged open-label active therapy, thus minimizing the practical difficulty and costs potentially associated with randomized, double-blinded comparisons of similar duration, and can provide persuasive evidence of effectiveness as well. For example, for the If current inhibitor, ivabradine, efficacy in angina prevention was confirmed, and a lack of pharmacologic tolerance or rebound was demonstrated by randomized withdrawal at the conclusion of a trial that began with a two-week, placebo-controlled, parallel-arm comparison of multiple ivabradine doses, followed by two to three months of open-label treatment of all patients receiving the highest ivabradine dose (6).
For treatments intended to alter survival and/or to minimize major morbid events, no specific standard exists for effect persistence. The basis for approvability would depend on the expected natural history of the specific disease, on whether the treatment is intended for application during acute events or for a chronic condition, and on the specific benefits and risks expected. In general, for most chronic conditions, evidence of survival improvement and/or major morbid event reduction for at least one year is presented, often including information on many patients studied for several years. Generally, there should be no substantial narrowing of the gap between treatment and comparator during the interval of observation, but this expectation probably reflects the specific nature of the claims that have been sought and the expectations of the scientific community and Advisory Committees; it is not mandated by the FDA.
A related concern is that of rebound phenomena. Unfortunate experience 30 years ago revealed that sudden cessation of short-acting beta-blockers in patients with ischemic heart disease is associated with a modest risk of myocardial infarction or sudden death. Now, some assessment of the effect of stopping a drug is expected, specifically to detect rebound.
Safety assessment. Unless special circumstances suggest the prudence of more intensive scrutiny (concerns based on fundamental pharmacologic properties, adverse findings from animal studies, the likelihood of co-administration with other drugs that might interact adversely with the new agent, all of which also might lead to requests for studies in special populations), in accordance with recent international regulatory harmonization agreements, the Cardio-Renal Division expects exposure of at least 1,500 to 2,000 patients to a new drug product before approval, with 300 to 600 exposed for six months or more and at least 100 patients exposed for one year or more. Past experience with certain types of treatments may suggest the need for treatment-specific safety data for certain outcomes; sometimes these requirements encompass all drug classes. On the basis of extensive empirical evidence linking ECG QT prolongation with sudden death and torsade de pointes arrhythmias, the FDA now requires evaluation of potential QT effects of all new drugs.
After approval, voluntary physician reports of AEs are regularly evaluated by the FDA. It is clear that relatively rare AEs, even if catastrophic, could easily be missed with the required pre-marketing exposure. However, the exposure standard is justified by the perceived balance between the risk of harm to individuals from missing a serious AE and the harm to the public from the likelihood that requiring substantially larger and longer pre-marketing exposure might suppress development of generally beneficial new therapies. Empirically, this system is acceptable: relatively few drugs have required post-approval withdrawal because of AEs first recognized after approval.
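The arithmetic behind this trade-off is simple if patients are assumed to experience an AE independently with a fixed incidence. The sketch below computes the chance of seeing at least one case in a safety database of a given size, and the exact upper 95% confidence bound on the incidence when no case is seen (close to 3/N, the "rule of three"). The 1-in-10,000 incidence in the example is hypothetical.

```python
def prob_at_least_one_case(incidence: float, n_exposed: int) -> float:
    """Chance that an AE with true per-patient incidence `incidence` appears at
    least once among `n_exposed` patients, assuming independent patients."""
    return 1 - (1 - incidence) ** n_exposed


def upper_bound_if_no_cases(n_exposed: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the incidence when no case is
    seen in `n_exposed` patients; close to 3 / n_exposed at 95% ('rule of three')."""
    return 1 - (1 - confidence) ** (1 / n_exposed)


# Hypothetical AE affecting 1 in 10,000 treated patients.
for n in (100, 1_500, 2_000):
    print(n, round(prob_at_least_one_case(1e-4, n), 3), round(upper_bound_if_no_cases(n), 4))
```

With the 1,500-patient minimum exposure described above, such an AE would be observed even once only about 14% of the time, and a database containing no cases still cannot exclude an incidence of roughly 1 in 500.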
Statistics. Statistical evidence of a drug-mediated benefit is expected to be strong, in accordance with the law. The usual standard for each study is p < 0.05: if the drug were in fact ineffective, a result at least as favorable as the one observed would be expected in fewer than 5 of 100 repetitions of the study. On the basis of this standard, in the usual case, if two studies are positive, the likelihood is very low that an ineffective drug will be approved. The FDA has described evidence that may be considered in judging the persuasiveness of a single trial but, ultimately, such decisions are largely based on intuitive judgments. The Cardio-Renal Division has suggested that equivalence to two trials might be inferable if a single trial achieves a p value approximating that of p < 0.05 in two trials, roughly equivalent to p < (0.05)²/2 = 0.00125, but several drugs have been approved on the basis of single trials without achieving quite such a stringent standard when Advisors and the FDA found the trials and supporting evidence otherwise compelling. Safety needs to be demonstrated by the sponsor; however, no rigorous statistical standard exists to define a lack of safety. Alarming clustering of SAEs, even without statistical significance, can result in withholding approval.
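The 0.00125 threshold follows from asking a single trial to carry the same false-positive risk as two independent trials that are each significant at two-sided p < 0.05 with effects in the favorable direction: each trial alone has a one-sided 0.025 chance of a falsely favorable result, both together (0.025)² = 0.000625, and doubling that to a two-sided level gives 0.00125. The sketch below reproduces the calculation and the corresponding z threshold, assuming the two trials are independent.

```python
from statistics import NormalDist


def two_trial_equivalent_alpha(alpha_per_trial: float = 0.05) -> float:
    """Two-sided level for a single trial matching the false-positive risk of two
    independent trials, each significant at `alpha_per_trial` (two-sided) in the
    favorable direction."""
    one_sided = alpha_per_trial / 2  # chance one trial of an ineffective drug is falsely favorable
    both = one_sided ** 2            # both independent trials falsely favorable
    return 2 * both                  # re-expressed as a two-sided level


alpha = two_trial_equivalent_alpha()
z_needed = NormalDist().inv_cdf(1 - alpha / 2)
print(f"single-trial threshold: p < {alpha:.5f} (|z| > {z_needed:.2f})")
```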
The label. About 20 years ago, a labeling amendment was requested for short-acting nifedipine that included an indication for treatment of "hypertensive urgencies." Three years earlier, the drug had been approved for prevention of angina due to vasospasm, as well as for typical effort-induced angina. Among other reasons, the Advisory Committee recommended denial of the amendment (the FDA concurred) because directions for use were inadequate: neither the proposed label nor the supporting evidence defined "hypertensive urgency," the condition for which the drug would be applied.

More recently, a labeling amendment was sought for aspirin, already approved for prevention of ischemic events in patients with established ischemic heart disease (secondary prevention); the sponsor proposed the additional indication of event prevention in patients without clinically evident ischemic disease but with a clinical profile suggesting ≥20% event risk over the succeeding 10 years (primary prevention). Supporting data included multiple placebo-controlled trials (with differing risk profiles) involving more than 55,000 patients. No individual trial prespecified the risk profile envisioned in the new label, and not all achieved statistical significance for event reduction. However, the data suggested that: 1) aspirin reduces ischemic events across populations of various risk profiles, probably including persons who would form the newly targeted population; and 2) major bleeding events, including transfusion-requiring hemorrhage and hemorrhagic stroke, occur with sufficient frequency that a benefit might be judged to exceed the risk only if the pre-therapy ischemic event risk is ≥2%/year. Advisory Committee concerns included the difficulty of creating an equation that weighs disparate benefits and risks appropriately. More importantly, the target population was defined by a Framingham Study event risk algorithm that, although intrinsically reasonable, had not been applied, even retrospectively, to the populations that produced the benefit/risk data presented to the Committee.

The aspirin labeling amendment proposal exemplifies a fundamental regulatory concern that arises whenever treatments are considered for prevention of events in asymptomatic patients. In part, such claims are based on the assumption that, on average, the long-term benefit/risk relation will be better if treatment is begun before evidence of disease develops than if applied immediately after non-lethal evidence emerges, because with the latter strategy, some patients will die before treatment application. This assumption may be correct but was not tested in any of the trials in which aspirin was studied and seldom is formally assessed in other parallel situations. (Conversely, it is also possible that treatment primarily prevents early events, in which case a therapeutic benefit could be missed if treatment began only after such events had occurred.) The FDA's approval for a preventive indication has potential legal as well as public health ramifications based specifically on the population targeted for drug use in the label. Therefore, considerable rigor should be applied in designing the supporting studies justifying the targeting of a population for labeling.
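The benefit/risk "equation" that the Committee found difficult can be sketched under strong simplifying assumptions: a constant annual hazard (to convert a 10-year risk into an annual one), a relative risk reduction that applies equally at every baseline risk, and a fixed weight expressing how many prevented ischemic events one treatment-caused major bleed is worth. The relative reduction and bleeding excess in the example are hypothetical placeholders, not estimates from the aspirin trials; `harm_weight` stands for precisely the value judgment the text describes as hard to set.

```python
def annual_risk(cumulative_risk: float, years: float) -> float:
    """Convert a cumulative risk over `years` to an annual risk, assuming a
    constant yearly hazard."""
    return 1 - (1 - cumulative_risk) ** (1 / years)


def break_even_annual_risk(relative_reduction: float, excess_harm_per_year: float,
                           harm_weight: float = 1.0) -> float:
    """Baseline annual event risk above which events prevented exceed weighted harms.

    Benefit per patient-year = baseline_risk * relative_reduction; harm per
    patient-year = excess_harm_per_year * harm_weight, where harm_weight says how
    many prevented ischemic events one treatment-caused major bleed counts as."""
    return harm_weight * excess_harm_per_year / relative_reduction


print(f"20% over 10 years is about {annual_risk(0.20, 10):.2%} per year")
# Hypothetical inputs, NOT aspirin trial estimates: 25% relative reduction,
# 0.4% per year excess major bleeding.
for weight in (1.0, 2.0):
    print(weight, f"{break_even_annual_risk(0.25, 0.004, weight):.1%}")
```

Under the constant-hazard assumption, a 20% ten-year risk corresponds to about 2.2% per year. Doubling the harm weight doubles the break-even baseline risk, which shows how strongly the targeted population depends on that judgment.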
| Devices and biologicals |
|---|
Conclusions. The foregoing has summarized some of the principles underlying regulatory evaluation of therapeutics. Many important issues have not been discussed, including considerations relating to drug or drug-device combinations, selection of end points (particularly if composite), statistical concerns relating to "splitting alpha" for multiple primary end points, the potential for combining superiority and non-inferiority analyses in the same trial, the role of FDA guidance documents, the importance of defining a dose-response relation from minimally effective to maximally tolerated doses, the degree of processing that renders autologous biologicals subject to FDA approval, and others. However, most importantly, I hope this account has indicated the intensive effort to achieve scientific rigor, logic, and fairness that is central to the process of identifying effective and safe therapeutics to guard the public health. For this, all of us owe the FDA an extraordinary debt.
| References |
|---|