A Medical Writer's Guide to Reporting on Diagnostic Tests

            When diagnostic tests are evaluated, the critical question is whether the test can provide useful information to patients or clinicians. A highly accurate test can detect disease in an early, treatable stage; determine the correct therapeutic approach; or reassure someone that they do not have a disease. In contrast, a test that is inaccurate is not only uninformative but also potentially harmful. Therefore, the evaluation of diagnostic tests focuses on various measures of accuracy.

            The accuracy of new or improved diagnostics must be assessed against a reference or gold standard. This standard should be the best currently available method for diagnosing patients and should be an established practice within the medical community. Measures of accuracy assess the agreement of the new test and the reference standard.

[Table 1. A 2 x 2 table comparing new test results with the reference standard]

            For a qualitative test, we can start to assess accuracy by constructing a simple 2 x 2 table (Table 1). The patients testing positive and negative with the reference standard are subdivided by the results of the new test. A true positive is someone with the condition (as determined by the reference standard) who tests positive with the new test. A true negative is someone who does not have the condition and tests negative with the new test. In other words, those are the people who were correctly categorized by the new test. When cases are inaccurately assessed by the new test, they are called false positives or false negatives.

Sensitivity and Specificity

            Sensitivity and specificity are the most commonly reported measures of diagnostic accuracy. Sensitivity is the ability of the test to correctly identify people who have the condition. Table 2 is a sample 2 x 2 table with mock data. Only the people who have the condition per the reference standard are included in the calculation of sensitivity (purple cells). Eighty-eight people have the condition in this example (60 + 28), and 60 were correctly identified by the new test. Therefore, the sensitivity of the new test is 60/88 = 0.68 (or 68%).

[Table 2. A sample 2 x 2 table with mock data]

            Specificity is the ability of the test to correctly identify those who do not have the disease or condition. Only the people who do not have the condition per the reference standard are included in the calculation (Table 2, green cells). Seven hundred and fifty people did not have the condition in this case (159 + 591), and 591 were correctly identified by the new test. Therefore, the specificity of the new test is 591/750 = 0.79 (or 79%).
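These two calculations can be sketched in a few lines of Python, using the mock counts from Table 2:

```python
# Mock data from the article's Table 2.
tp, fn = 60, 28    # people with the condition per the reference standard
fp, tn = 159, 591  # people without the condition per the reference standard

sensitivity = tp / (tp + fn)  # true positives / all with the condition
specificity = tn / (tn + fp)  # true negatives / all without the condition

print(f"Sensitivity: {tp}/{tp + fn} = {sensitivity:.2f}")  # 60/88 = 0.68
print(f"Specificity: {tn}/{tn + fp} = {specificity:.2f}")  # 591/750 = 0.79
```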

            Sensitivity and specificity should never be presented as a proportion or percent only: Confidence intervals must always be provided to help the reader interpret the results. In our example case, the sensitivity is 0.68 (95% CI: 0.57–0.77), and the specificity is 0.79 (95% CI: 0.76–0.82). Note that the interval is narrower for specificity because the number of patients without the condition is much higher. In addition, sensitivity and specificity should be reported both as percentages (e.g., 68%) and fractions (e.g., 60/88).
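The article does not state which interval method was used. One common choice for a binomial proportion is the Wilson score interval, sketched below; exact bounds vary slightly by method (Wilson, Clopper-Pearson, normal approximation), so the values printed here are close to, but not necessarily identical with, those quoted above.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

sens_lo, sens_hi = wilson_ci(60, 88)    # roughly 0.58 to 0.77
spec_lo, spec_hi = wilson_ci(591, 750)  # roughly 0.76 to 0.82
```

Note that the specificity interval (computed from 750 people) comes out much narrower than the sensitivity interval (computed from 88), just as described above.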

Positive and Negative Predictive Values

            Calculating the positive predictive value or negative predictive value of a new test can help us understand how informative the test will be in a clinical situation. Positive predictive value (PPV) is the probability that a person testing positive for the condition with the new test actually has the condition. In other words, how often do women with suspicious findings on a mammogram actually have breast cancer? PPV is calculated using only the subset of people testing positive with the new test (Table 3, yellow cells). In our example, 219 people tested positive with the new test, and 60 of these actually have the condition. Therefore, the PPV is 60/219 = 0.27 (95% CI: 0.22–0.34). Again, this value must be presented with confidence intervals.

[Table 3. The sample 2 x 2 table highlighting the cells used to calculate PPV and NPV]

            Negative predictive value (NPV) is the probability that a person testing negative for the condition with the new test really does not have the condition. In other words, how often do women with normal mammograms really not have breast cancer? NPV is calculated using only the subset of people testing negative with the new test (Table 3, pink cells). In our example, 619 people tested negative, and 591 of these did not have the condition. Therefore, the NPV is 591/619 = 0.95 (95% CI: 0.93–0.97).
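Using the same mock counts, PPV and NPV can be computed directly from the other margin of the 2 x 2 table:

```python
# Mock data from the article's Table 3.
tp, fn = 60, 28
fp, tn = 159, 591

ppv = tp / (tp + fp)  # of all positive new-test results, fraction truly positive
npv = tn / (tn + fn)  # of all negative new-test results, fraction truly negative

print(f"PPV: {tp}/{tp + fp} = {ppv:.2f}")  # 60/219 = 0.27
print(f"NPV: {tn}/{tn + fn} = {npv:.2f}")  # 591/619 = 0.95
```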

            In our sample test, the PPV (0.27) may seem low and the NPV (0.95) may seem high. But remember the application. If the test is applied to the general population, an NPV of 0.95 will mean that many people with the condition will be missed. Depending on how quickly the disease develops, what alternative tests are available, and how the disease is treated, a test with this performance may or may not be acceptable.

            PPV and NPV are influenced by prevalence of the disease in the tested population. If a study enrolls 100 people with a rare disease and 100 controls, those who test positive for the rare disease are far more likely to have the disease than a random sample from the general population. Therefore, the PPV and NPV for the study will not be the same as for the general population. In our example in Table 3, the NPV is 0.95. If the sensitivity and specificity are kept the same, but twice as many people with the disease are enrolled, the NPV drops to 0.91. In some studies, information on the prevalence of the condition in the general population is used to calculate an adjusted NPV and PPV.
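The dependence on prevalence can be checked with Bayes' theorem: for a fixed sensitivity and specificity, PPV and NPV are functions of prevalence alone. A short sketch that reproduces the numbers above:

```python
def predictive_values(sens: float, spec: float, prevalence: float):
    """PPV and NPV at a given disease prevalence (Bayes' theorem)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

sens, spec = 60 / 88, 591 / 750

# Study prevalence (88 diseased of 838 total) reproduces Table 3: PPV 0.27, NPV 0.95.
ppv, npv = predictive_values(sens, spec, 88 / 838)

# Enroll twice as many diseased participants (176 of 926): NPV drops to about 0.91.
ppv2, npv2 = predictive_values(sens, spec, 176 / 926)
```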

Likelihood Ratios

            Like negative and positive predictive values, likelihood ratio pairs can be used to demonstrate the value of a new diagnostic test. The positive likelihood ratio is the probability that a person with the disease will test positive for the disease, divided by the probability that a person without the disease will test positive. In other words, it is the rate of true positives divided by the rate of false positives. The positive likelihood ratio will be greater than one for an effective test, with more effective tests having higher likelihood ratios.

            In combination with the pretest probability of disease (primarily influenced by prevalence), the positive likelihood ratio can provide a clinician with the odds that the patient does have the disease. In our example (Table 2), the positive likelihood ratio is 3.2 (95% CI: 2.6–3.9), so the odds that an individual testing positive with the test actually has the condition are increased 3.2-fold over his or her pretest odds. This would have a small effect on the posttest probability of disease (Table 4).

            The negative likelihood ratio is the rate of false negatives divided by the rate of true negatives. An effective test will have a negative likelihood ratio less than one. In our example, the negative likelihood ratio is 0.40 (95% CI: 0.30–0.55) and would moderately decrease the posttest probability of the disease (Table 4). For comparison, a CT scan for diagnosing a condition such as appendicitis would have a negative likelihood ratio of about 0.05.
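Both likelihood ratios, and the conversion from pretest to posttest probability by way of odds, can be sketched as follows (the pretest probability of 0.105 is simply the study prevalence from our example, used here for illustration):

```python
sens, spec = 60 / 88, 591 / 750

lr_pos = sens / (1 - spec)  # true positive rate / false positive rate, about 3.2
lr_neg = (1 - sens) / spec  # false negative rate / true negative rate, about 0.40

# Posttest odds = pretest odds x likelihood ratio.
pretest_p = 0.105                           # illustrative pretest probability
pretest_odds = pretest_p / (1 - pretest_p)  # convert probability to odds
posttest_odds = pretest_odds * lr_pos
posttest_p = posttest_odds / (1 + posttest_odds)  # convert odds back to probability
```

Reassuringly, when the pretest probability is the study prevalence, the posttest probability after a positive result (about 0.27) matches the PPV calculated earlier.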

Receiver Operating Characteristic Curves

            So far we have assumed that the new test results are either positive or negative. But what if the new test is quantitative? If the test is to be used to indicate the presence of the disease or to select patients for follow-up tests, a cut-off value is often used to divide patients into positive and negative groups. The cut-off point must be optimized in one study, and then validated in one or more additional studies.

[Figure 1. ROC curves for tests with and without diagnostic value]

            As the cut-off value for a new test is increased, fewer test results are called positive, so false positives decrease but false negatives increase. Lower cut-off values result in more false positives and fewer false negatives. A receiver operating characteristic (ROC) curve illustrates this trade-off between sensitivity and specificity. The true positive rate (or sensitivity) is plotted against the false positive rate (or 1 − specificity) for all possible cut-off values (Figure 1). A test that has no diagnostic value will have equivalent true and false positive rates at all cut-off values and will therefore have an ROC curve that falls along the green diagonal line. More effective tests will curve above the diagonal line (red and blue lines).

            The area under the ROC curve (AUC) can be quantified to compare the performance of two different tests. An AUC of 0.5 indicates that the test has no diagnostic value. As test performance increases, the AUC will approach one. The test depicted with the red line in Figure 1 has an AUC of 0.87. The test depicted with the blue line has an AUC of 0.68.
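For a quantitative test, the ROC curve and its AUC can be built by sweeping across all possible cut-off values. A minimal sketch with made-up scores (real analyses would typically use a statistics package such as R or scikit-learn):

```python
def roc_points(scores_pos, scores_neg):
    """(FPR, TPR) pairs for every possible cut-off; score >= cutoff means 'positive'."""
    cutoffs = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(s >= c for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= c for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up quantitative results for diseased and healthy groups.
diseased = [8.2, 7.9, 6.5, 6.1, 5.8, 5.0, 4.4, 3.9]
healthy = [5.2, 4.8, 4.1, 3.6, 3.3, 2.9, 2.5, 2.1]
print(f"AUC: {auc(roc_points(diseased, healthy)):.2f}")  # AUC: 0.91
```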

[Figure 2. ROC curves for two tests with the same AUC]

            ROC curves can be a visual tool for finding the optimal cut-off point for a new test. Two tests with the same AUC are graphed in Figure 2. The test depicted with a purple line might work well at the cut-off point that results in 70% sensitivity and 95% specificity (red arrow). The test depicted with an orange line might work well at the cut-off point that results in 85% sensitivity and 80% specificity (blue arrow). However, when setting a cut-off point, the clinical situation in which the test will be used must always be kept in mind. A test with 80% specificity might be unacceptable for screening the general population, and test developers might choose to alter the cut-off point to increase the specificity and reduce the sensitivity.

Reporting on Diagnostic Tests

            When we present a new diagnostic test to regulatory agencies or journals, we must include appropriate measures of accuracy (sensitivity, specificity, NPV, PPV, likelihood ratios) and the ranges of test values. If the test is quantitative, we should consider presenting an ROC curve (with AUC) and a histogram of test results in patients with and without the condition. All subjects and test results should be accounted for, so that the reader knows how many subjects were enrolled, tested, and included in the analysis. Wherever possible, confidence intervals should be included. The study design should be presented clearly so that all sources of bias and efforts to reduce bias are explicitly stated.

Five reasons to start working with a medical writer early in your project

Hiring a medical writer can be a huge time-saver. One way to make sure the arrangement is as productive and cost-effective as possible is to begin to work with the medical writer early in the project. Here is why:

  1. For clinical development in particular, later documents build off early documents. If those early documents are done well, with well-presented rationale and accurate details, the later documents will be much more straightforward.

  2. Good medical writers have an eye for detail. We know how to carefully keep track of all pieces of data so that no errors are introduced during the writing process. If we are called on to edit a late-stage draft of the document, it is too late to prevent or catch those errors.

  3. The best way to figure out if you have the right freelance writer is to work with them on a project. If you bring one in early in the process, you have a chance, before crunch time, to make sure you have the right writer for the job and to work out a good system for working together.

  4. Experienced medical writers have a thorough knowledge of the final product and can use this knowledge early in the process to help you identify key deliverables, plan and manage timelines, and even design the study. Some researchers conduct clinical trials with the goal of publication in a high impact factor journal, but after completing the trial are disappointed to find that their trial design or their poorly written protocol precludes them from their target journal.

  5. Lastly, good medical writers are busy. If we have a long-term relationship with a client, we are aware of your timelines and can ensure that we have time in our schedules to prioritize your project.

Comparisons

If there is one simple writing skill all graduate students should be taught, it is how to write comparisons correctly. There are several ways to mess up comparisons, but there is one I see very frequently, from native speakers and non-native speakers alike: compared to is used instead of the word than.

Here are some examples paraphrased from works I have edited:

“The sodium concentrations were higher in sample A compared to sample B.”

“Spring had a much higher percentage of readings above 100 μg/m3 compared to the fall.”

In both examples, compared to should simply be replaced with than:

“The sodium concentrations were higher in sample A than sample B.”

“Spring had a much higher percentage of readings above 100 μg/m3 than the fall.”

In some cases, a little more work is necessary to correct these second-degree comparisons. Here is an example:

“SOD1 mRNA was significantly elevated in tumors as compared to normal tissue.”

We need to change the word elevated to a comparative adjective and then replace as compared to with than:

“SOD1 mRNA was significantly higher in tumors than in normal tissue.”

Unfortunately, these errors are so common in scientific writing and everyday speech that they can be hard to catch. But pay attention to those comparative statements, see what happens if you replace compared to with than, and your scientific writing will be clearer and more concise.

Liquid Biopsies

The number of publications identified in a PubMed search for "liquid biopsy" has increased more than ten-fold in just three years (see chart). Dozens of companies have formed in the past five years to develop liquid biopsy tests for cancer. In recent reports, analysts have estimated the liquid biopsy market to be tens of millions of dollars. So why are liquid biopsies now the focus of so much attention?

[Chart: PubMed publications matching "liquid biopsy" by year]

For years research has focused on gathering the knowledge we need to prevent cancer as much as possible and treat it effectively when it does arise. Now, through all that research, we have gained a deep understanding of the process of carcinogenesis and the associated molecular events, and we are even starting to understand how those molecular events influence the response to various treatments.

But how can we take advantage of all of that knowledge in a clinical setting? That's where liquid biopsies come in. With a liquid biopsy, blood is drawn at the time a clinical decision needs to be made, and the circulating tumor DNA or cells are analyzed. For many patients, a recent tumor tissue biopsy is just not available, but a liquid biopsy can be taken at any time a blood draw is possible.

In the late 1970s, long before liquid biopsy became a buzzword, researchers were beginning to hypothesize that nucleic acids were released from tumor cells, either through an active process or through cell death. Evidence began to grow to suggest that tumor-derived material could indeed be found in the plasma of patients with cancer, and through the 1980s and 1990s, numerous reports were published demonstrating that the same molecular aberrations found in tumors could be found in circulating DNA.

Scientists understood the potential value of this material, but numerous technical hurdles prevented this from being easily translated to the clinic. Circulating tumor DNA is only present in small amounts and is quite fragmented. Furthermore, tumor-derived DNA is only a small portion of the total free DNA in plasma. Small changes in blood collection procedures can have a large effect on the amount of free DNA from healthy cells. So any assay targeting tumor-derived DNA must be sensitive and specific.

In the past ten years, knowledge of the molecular alterations in cancer and development of appropriate technology have both advanced enough to make liquid biopsy assays a reality. Dozens of companies have started to work hard developing liquid biopsy assays, and there has certainly been a lot of hype. The Illumina spin-off Grail drew attention when it was backed by more than $100 million from Bill Gates, Jeff Bezos, Google, and others. Pathway Genomics' assay was featured on the TV show "Keeping Up with the Kardashians."

But is the hype justified? Many companies are already offering liquid biopsy assays in their own CLIA labs, and data to validate the assays are rapidly being generated. Last spring, Guardant Health presented the results of their assay in 15,000 patients. Furthermore, in April, Epigenomics' Epi proColon became the first circulating tumor DNA-based assay to gain FDA approval. Thus, liquid biopsies will likely become standard of care in the coming years.

Biomarkers in Early Clinical Trials

            Carefully chosen biomarkers incorporated into clinical trials can provide critical information for drug developers. With biomarker data, researchers can more accurately determine which programs should advance, at what doses, and in which patients. More and more pharmaceutical companies are emphasizing broad biomarker plans for each drug in development in order to reap these benefits.

            Pharmacodynamic markers, which indicate that the drug is having the desired effect, can be either proximal (drug target-related) or distal (disease-related). A proximal marker, such as phosphorylation of a kinase substrate, indicates whether the drug is active as expected. A distal marker, such as circulating tumor cells, provides exploratory evidence that the drug is having an effect on the disease.

            Pharmacodynamic markers are especially important in early clinical trials. In first-in-human oncology trials, for instance, there is not much opportunity to see clinical efficacy, but a pharmacodynamic marker can reassure study sponsors that the drug is engaging its target and is active in vivo. Furthermore, exploring the relationship between drug pharmacokinetics and pharmacodynamics is helpful for optimizing the dose levels in future trials. In some cases, there might not be a dose level that is both well-tolerated and pharmacodynamically active.

            Although pharmacodynamic biomarkers can potentially provide critical information, they must be carefully chosen and implemented. Many times, the biomarkers must be assessed in a surrogate tissue, such as blood, if the target tissue is inaccessible or repeat biopsies are not feasible. Furthermore, early clinical trials are often small, and if biomarker sample collection is not prioritized or biomarker assays are not carefully qualified, then the data might not be interpretable.

            A report released earlier this year by the Biotechnology Innovation Organization (BIO) called "Clinical Development Success Rates 2006-2015" states that programs that use biomarkers to select patients are three times more likely to be approved than those that do not use selection biomarkers. Knowing the group of patients that is most likely to respond is clearly an advantage. Patient selection or stratification markers ideally begin to emerge during preclinical research, but many are still exploratory in early clinical trials and cannot be truly tested until the bigger phase II or III trials. However, early trials can provide some data, perhaps through correlation with pharmacodynamic response, and they offer an excellent opportunity to fine-tune biomarker assays and sample collection protocols.

            Biomarker programs can certainly add to the cost of a clinical trial, but more and more companies and organizations are realizing that a carefully designed and implemented biomarker program can help them make the right decisions.

A New Option for Colorectal Cancer Screening

            Although colorectal cancer is a leading cause of death in the United States and is treatable when detected early, about one-third of eligible adults have not been screened for the disease (1,2). Several screening options exist, including the gold standard colonoscopy. But many people are afraid of discomfort during colonoscopy or anticipate being inconvenienced by a stool-based test, and as a result, they decide not to undergo screening.

            A good screening test must not only be technically capable of finding cancer accurately, but it also must be acceptable to patients and doctors. Furthermore, offering two or more screening test options can increase screening compliance (3). With this in mind, researchers at a company called Epigenomics set out over 15 years ago to develop a blood-based test for colorectal cancer.

            Scientists at Epigenomics were aware of two critical developments. First, an abundance of reports were being published on hypermethylation of specific tumor-growth-related genes in various cancers. Second, there was increasing evidence that tumor-derived DNA could be found circulating freely in blood. Epigenomics identified a gene frequently hypermethylated in colorectal tumors (Septin 9), developed a highly sensitive assay for the gene, and optimized protocols for obtaining tumor-derived DNA from plasma.

            The resulting test, called Epi proColon, has been used to analyze plasma from thousands of average-risk adults of screening age. The test detects about 68% of cancer cases (with 80% specificity), and was found to be non-inferior to another FDA-approved screening method, fecal immunochemical test (FIT). In April of 2016, the Epi proColon test was approved by the FDA for use in patients who choose not to undergo screening by other recommended methods. In other words, the one-third of eligible Americans who are not currently screened for colorectal cancer now have another option to consider, one they might find more acceptable.

            The approval of the Epi proColon test is a huge success story for those who have advocated for early detection of cancer, for those who have worked hard on the promise of molecular diagnostics, and of course, for the patients whose cancer will be detected early. But hurdles remain for the test. Medical professionals must be made aware of the new option, and Medicare and private insurance companies must reimburse for the test. Recently, legislation was introduced by Representative Donald M. Payne that would provide Medicare coverage for all FDA-approved blood-based colorectal cancer screening tests, including Epi proColon.

1. Joseph DA, King JB, Miller JW, Richardson LC. Prevalence of colorectal cancer screening among adults – behavioral risk factor surveillance system, United States, 2010. MMWR. 2012;61 Suppl: 51-56.

2. Klabunde CN, Joseph DA, King JB, White A, Plescia M. Vital Signs: Colorectal Cancer Screening Test Use - United States, 2012. MMWR. 2013;62(44): 881-888.

3. Inadomi J, Vijan S, Janz NK, Fagerlin A, Thomas JP, et al. Adherence to Colorectal Cancer Screening: A Randomized Clinical Trial of Competing Strategies. Arch Intern Med. 2012;172(7): 575-582.