A window on data can be a window on discovery
The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka” but “That’s funny...”
Isaac Asimov (1920–1992)
The Princeton polymath John Tukey (1915–2000) observed that “the greatest value of a picture is when it forces us to notice what we never expected to see.” A graphic display can only develop the sort of forceful personality Tukey suggested when it is prepared carefully. As we shall see, when interesting data and a clever display are properly aligned, remarkable outcomes can result.
Historically, bacteria have been classified by a variety of characteristics: their physical features when viewed through a microscope, their response to various stains, their interactions with chemical agents and so on. In the mid-20th century a new set of criteria began to be added: how they are affected by specific antibiotics.
In 1951, the famous graphic designer Will Burtin (1908–1972) published a graphic display that was admired for the clarity and economy with which it showed the efficacy of three antibiotics on 16 different kinds of bacteria. The dependent variable was the minimum concentration of the drug required to prevent the growth of the bacteria in vitro, known as the minimum inhibitory concentration (MIC). The three drugs were penicillin, neomycin and streptomycin, and their efficacy varied over six orders of magnitude. The original data and the representation of the data devised by Burtin are shown on the facing page (top). The scale varies from 1,000 micrograms per milliliter on the innermost ring to 0.001 micrograms per milliliter on the outermost; the longer the bar, the greater the efficacy of the antibiotic.
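The scale described above can be sketched in a few lines. This is a minimal illustration, assuming the rings are evenly spaced in log10 units between the two endpoints given in the text; the function name `bar_fraction` and the clipping behavior are our own assumptions, not part of Burtin's design.

```python
import math

MIC_MAX = 1000.0  # innermost ring, micrograms per milliliter (from the text)
MIC_MIN = 0.001   # outermost ring, micrograms per milliliter (from the text)

def bar_fraction(mic):
    """Return the fraction of the full scale covered by a bar (0 to 1).

    Assumes a log10 scale; a lower MIC (a more effective drug) gives a longer bar.
    Values outside the plotted range are clipped to the nearest endpoint.
    """
    span = math.log10(MIC_MAX) - math.log10(MIC_MIN)  # six orders of magnitude
    clipped = min(max(mic, MIC_MIN), MIC_MAX)
    return (math.log10(MIC_MAX) - math.log10(clipped)) / span
```

On this assumed scale, a drug with an MIC of 1 microgram per milliliter sits three orders of magnitude from each end, so its bar would reach the midpoint.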
The display is divided into Gram positive or negative, depending on whether the bacteria in question take up Gram stain or not. (The stain is named after its inventor, Hans Christian Gram.) Burtin’s display focuses on the efficacy of the antibiotics. Careful study of it reveals that if a bacterium is Gram positive, it can be efficaciously treated with a combination of penicillin and neomycin; if it is Gram negative, neomycin is the treatment of choice. But the display does not allow us to easily compare the profiles of bacterial response to the three antibiotics. Choosing which data trends get emphasis is a fundamental problem of graphic display.
We can think of the component data as a matrix of 16 bacteria and three antibiotics. The two kinds of questions that naturally arise are:
1. How do the drugs compare?
2. How do the bacteria group together?
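The two questions correspond to reading the same matrix by column or by row. The sketch below illustrates this with a small dictionary; the bacterium names and MIC values are placeholders invented for the example, not Burtin's data, and the helper names `by_drug` and `profiles` are hypothetical.

```python
import math

# Placeholder MIC values in micrograms per milliliter (NOT Burtin's data).
mic = {
    "bacterium A": {"penicillin": 0.005, "streptomycin": 10.0, "neomycin": 10.0},
    "bacterium B": {"penicillin": 800.0, "streptomycin": 2.0, "neomycin": 0.5},
    "bacterium C": {"penicillin": 0.01, "streptomycin": 0.1, "neomycin": 0.001},
}
DRUGS = ("penicillin", "streptomycin", "neomycin")

def by_drug(matrix):
    """Column view (question 1): for each drug, the MIC of every bacterium."""
    columns = {}
    for bacterium, row in matrix.items():
        for drug, value in row.items():
            columns.setdefault(drug, {})[bacterium] = value
    return columns

def profiles(matrix, drugs=DRUGS):
    """Row view (question 2): each bacterium's log10 MIC profile, in a fixed drug order."""
    return {b: tuple(math.log10(row[d]) for d in drugs) for b, row in matrix.items()}
```

Bacteria whose profiles are close to one another are candidates for being grouped together; that comparison is hard to make when the display is organized around the columns.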
Will Burtin’s display aims at the former much more than the latter. It is easy to see why each of these two questions requires a different construction (although sometimes a very clever display can provide ingress to both). An example from Jacques Bertin (born 1918) has become a canonical illustration of this fact (see figure at right).
The map at the top of the display provides an answer to such questions as “What is produced in Nebraska?” A glance tells us corn. But if instead we were to ask the obverse question, “Where is corn produced?” this construction is of little use. To answer it we would need to look at every state and remember which ones produced corn. For questions of this latter type we would vastly prefer the compound construction at the bottom, in which there is a map for each product with producer states shaded.
This emphasizes the importance of deciding, before designing a display, what questions it will answer. Obviously Burtin’s choice was to emphasize the comparative effectiveness of the drugs, rather than to group the bacteria by their comparative susceptibility to the drugs. As we show next, this had at least one unfortunate consequence.
If we were to focus on the task of comparing bacteria, a different display might be more helpful. Consider the design shown at right. Each bacterium is given its own icon with vertical bars indicating the MIC of each drug for that organism. The horizontal line depicts what might be considered the maximum plausible dosage; thus, bars extending down from that line depict clinically efficacious drugs. The bacteria are arrayed based on their resistance to all three drugs, two of the three, one or none.
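The arrangement described above amounts to grouping bacteria by the set of drugs they resist. A minimal sketch follows, assuming a single dosage cutoff applies to all three drugs; the text gives no cutoff value, so `max_dose=1.0` is a hypothetical choice, and the bacterium names and MIC values are placeholders rather than Burtin's data.

```python
from collections import defaultdict

# Placeholder MIC values in micrograms per milliliter (NOT Burtin's data).
mic = {
    "bacterium A": {"penicillin": 0.005, "streptomycin": 10.0, "neomycin": 10.0},
    "bacterium B": {"penicillin": 800.0, "streptomycin": 2.0, "neomycin": 0.5},
    "bacterium C": {"penicillin": 0.01, "streptomycin": 0.1, "neomycin": 0.001},
    "bacterium D": {"penicillin": 0.002, "streptomycin": 12.0, "neomycin": 20.0},
}

def resistance_groups(matrix, max_dose):
    """Group bacteria by the set of drugs whose MIC exceeds max_dose."""
    groups = defaultdict(list)
    for bacterium, row in matrix.items():
        resisted = tuple(sorted(d for d, v in row.items() if v > max_dose))
        groups[resisted].append(bacterium)
    return dict(groups)

# max_dose=1.0 is a hypothetical cutoff chosen only for this sketch.
groups = resistance_groups(mic, max_dose=1.0)

# List the groups from most drugs resisted to fewest, as in the display.
for resisted, members in sorted(groups.items(), key=lambda kv: -len(kv[0])):
    print(len(resisted), resisted, members)
```

Bacteria that land in the same group share a resistance pattern, which is the kind of row-by-row comparison this display is built to invite.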
Looking at this display, specifically at the row of bacteria resistant to streptomycin and neomycin, we see something funny. The pattern of response to the antibiotics of all three bacteria is essentially identical, yet two of these bacteria are Streptococcus and one is not. That seems odd. What is Diplococcus pneumoniae doing there? And why does the third Strep bacterium, Streptococcus fecalis (in the next row up), appear to be so different? One would think that bacteria within a genus would be vulnerable to the same compounds.
Because these oddities were not easily visible in Burtin’s display, neither his nor anyone else’s curiosity was piqued. Had this odd pattern been detected, perhaps it would not have taken until 1974 for Diplococcus pneumoniae to be recognized as a Streptococcus and to be renamed Streptococcus pneumoniae.
And why is Streptococcus fecalis so different? It would seem that its credentials as a member of the Strep family are impeccable; as Sherman, Mauer and Stark described it in a 1937 issue of Journal of Bacteriology:
In some respects Streptococcus fecalis … might be considered one of the better established species of the streptococci, and certainly some of the rather unique characteristics of this organism, or the general group to which it belongs, are commonly known by bacteriologists.
Yet in 1984, its genus was changed and its name became Enterococcus faecalis. Perhaps had the Burtin data been plotted in a way that allowed us to more easily compare the profile of responses of these various bacteria to the antibiotics, the classification of Streptococcus fecalis would have come under scrutiny sooner.
The clustering of bacterial types and sensitivity to antibiotics becomes even more evident with a simple scatterplot in which we plot each bacterium’s MIC for both neomycin and penicillin. Streptomycin, not shown, has MIC values similar to neomycin. The penicillin response is quite different from the other two.
But these more specific plots were only generated after we knew what to look for, after the display shown at left allowed us to see what we hadn’t expected.
“Evidence-based science” is an ironic term, for what else could it be? Faith-based? The start of empiricism is usually credited to Aristotle (384–322 B.C.), but its pathway thereafter was not smooth, for once one commits to using evidence to make decisions, facts take precedence over opinion. And not all supporters of an empirical approach had Alexander the Great to watch their back. Hence it took almost 2,000 years before Francis Bacon (1561–1626) repopularized the formal use of evidence, which was subsequently expanded and amplified by the British empiricists John Locke (1632–1704) and George Berkeley (1685–1753) and the Scottish David Hume (1711–1776).
But having a formal epistemological basis for evidence-based science was not enough. Making the most of evidence required effective methods for presenting it. Language, developed long before science, was not an ideal match. Mathematics became the language of science but it was ill-suited for looking at evidence. In the 17th century large tables of data were compiled, but this was not an answer; indeed, two 19th-century economists emphasized the inadequacies of tabular presentation in their oft-quoted quip:
A heavy bank of figures is grievously wearisome to the eye, and the popular mind is as incapable of drawing any useful lessons from it as of extracting sunbeams from cucumbers.
Arthur Briggs Farquhar and Henry Farquhar, 1891
Instead, a graphical approach was developed and given a jump start by the Scot William Playfair (1759–1823) in his 1786 Commercial and Political Atlas. Playfair’s methods were picked up broadly, and by the 19th century there was a huge proliferation of scientific atlases, filled with pictures, that were so laconic that the words almost disappeared entirely. The graphical movement had a simple idea: it supported the atheoretical plotting of data points, which were then searched for suggestive patterns. This largely empirical approach reached a full consilience in 1977 with the publication of John Tukey’s influential Exploratory Data Analysis.
And yet despite the warning of the brothers Farquhar, and the evidence of more than a century of scientific investigation, the table remains by far the principal means of communicating scientific evidence. In the figure at right is a summary of the kinds of graphic forms used in the Journal of the American Medical Association in 2008. A similar picture would describe all of the other journals we have looked at. This tabular approach is a mistake. There is much that is being missed that might otherwise be found.
This investigation is meant only as one example of what could have been found if we had but taken the trouble to look. As we mentioned previously, the early days of bacterial taxonomy had bacteria being classified using whatever tools were available: how they looked under the microscope, how they reacted to stains and so on. In 1951, when the data used by Burtin were published, antibiotics were relatively new, so using a bacterium’s profile of reaction to antibiotics, while an obvious entry into its deep structure, was not yet a common tool. This demonstration shows what could have been found had the data been displayed optimally. The classification of bacteria by their reaction to antibiotics is now anachronistic for bacterial taxonomy. Modern classification defines the relationships within genera by a unique RNA sequence (16S RNA) that is part of the bacterial ribosome. This sequence is like a fingerprint for bacterial genera and species. But it is nice when the message of the 16S RNA confirms the phenotypic response and the tale told by the well-chosen diagram.