Correlation And Causation With Some Plausibility For Good Measure

One of the biggest problems in the battle against diseases is figuring out exactly what thing or things cause a disease. In the late 1970s and early 1980s, men and women – many of them gay – started to come down with opportunistic infections at an accelerated rate. The cause was not known, but epidemiologists did come to realize that those who developed the syndrome – later to be called AIDS – were more likely to have engaged in certain behaviors and to have certain sexual preferences. That is, the personal attributes and the disease correlated, but it would have been unscientific and wrong to say that one would get AIDS for the sole fact of being gay.

That sure didn’t stop the “moral majority” and others from stigmatizing an entire segment of the population. It wasn’t until HIV was isolated and identified as the causative agent – and some heterosexual celebrities acquired the infection – that the term GRID (Gay-Related Immune Deficiency) became AIDS.

Of course, AIDS is not the only example.


I told you just the other day how nationalities and ethnicities are associated with certain conditions. Lou Dobbs wrongfully claimed that immigrants brought more cases of Hansen’s Disease (Leprosy) to this country than naturally occur. A discussion on recent cases of measles in Milwaukee undoubtedly turned into an immigrant bashing that would have made the most liberal KKK members blush. And God help you if you’re from Africa and trying to donate blood.

All of these examples above are instances of correlation between a disease and a person’s (or people’s) origin. Biologically speaking, it doesn’t matter where you come from. You’re still fair game for infection.

But there are other examples where people have wrongly associated two things and then deduced that one caused the other – or vice versa. For example, two non-scientists writing for an anti-vaccine blog recently published a seven-part story associating arsenic in pesticides with polio. They eyeballed data from the last 120 years and decided that arsenate in pesticides must trigger polio outbreaks because more polio outbreaks have been detected since those pesticides came into use.

Sounds plausible, don’t it? Well, actually…

We all had a chuckle when a skeptical writer recently pointed out that cases of autism have been on the rise since microwaveable popcorn hit the mass market. If you plot the incidence of autism and the sales of microwaveable popcorn, the lines almost overlap. Again, that’s just “eyeballing” the data without much of a scientific investigation. And that’s where a lot of assumptions about causation go wrong.
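If you want to see just how easy it is to manufacture that kind of “overlap”, here is a minimal sketch (the numbers are made up for illustration, not the real autism or popcorn figures) showing that any two things that simply rise over time will correlate almost perfectly:

```python
# Two invented series that each simply rise over time will "correlate"
# strongly even though neither has anything to do with the other.
import numpy as np

years = np.arange(2000, 2012)  # a made-up 12-year window

# Hypothetical figures, invented purely for illustration:
autism_prevalence = np.array([6.7, 6.9, 7.6, 8.0, 9.0, 9.4,
                              10.3, 11.3, 11.8, 13.1, 14.6, 14.7])  # per 1,000 children
popcorn_sales = np.array([1.1, 1.2, 1.3, 1.5, 1.6, 1.8,
                          1.9, 2.1, 2.2, 2.4, 2.6, 2.7])            # billions of dollars

r = np.corrcoef(autism_prevalence, popcorn_sales)[0, 1]
print(f"Pearson r = {r:.3f}")  # very close to 1.0, yet nobody blames the popcorn
```

Time is doing all the work here; a sky-high correlation coefficient between two rising trends tells you exactly nothing about one causing the other.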

Even if a study is well-designed and carried out by reputable institutions, there can be mistakes. For example, some time ago, a study was performed to look at the association between coffee and pancreatic cancer. The study concluded – with really good statistical data – that people with pancreatic cancer were more likely to be coffee drinkers. The researchers left it at that and walked away from their study, letting the public decide on whether or not to drink coffee.

Well, astute epidemiologists the world over noticed a funny thing. They noticed that the data never took into account the coffee drinkers’ smoking habits. Once the smokers and non-smokers were placed into different categories, people with pancreatic cancer were more likely to be smokers AND coffee drinkers. People without pancreatic cancer were more likely to be coffee drinkers BUT NOT smokers. Yes, it was the smoking, stupid – or the smoking stupid. The coffee industry took a while to make a comeback after that.
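For those who like to see the arithmetic, here is a toy sketch of the same kind of trap, using invented counts rather than the actual study data, showing how a crude association can evaporate once you stratify by the confounder (in this case, smoking):

```python
# Invented 2x2 tables: (coffee & cancer, coffee & no cancer,
#                       no coffee & cancer, no coffee & no cancer)

def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Odds ratio for a simple 2x2 table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)

smokers = (80, 720, 20, 180)       # high cancer risk, and most smokers drink coffee
non_smokers = (3, 297, 7, 693)     # low cancer risk, and fewer drink coffee

# Crude table: lump everyone together and ignore smoking entirely
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print(f"Crude OR, coffee vs. cancer: {odds_ratio(*crude):.2f}")       # ~2.6, coffee looks guilty
print(f"OR among smokers only:       {odds_ratio(*smokers):.2f}")     # 1.0, no association
print(f"OR among non-smokers only:   {odds_ratio(*non_smokers):.2f}") # 1.0, no association
```

The crude numbers make coffee look like the culprit; once smoking is accounted for, the association disappears.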

This brings me to biological plausibility of the whole damn thing. In the coffee and cancer example, there was no known biological process by which coffee could trigger pancreatic cancer. On the other hand, there was a process by which smoking could trigger pancreatic – and other forms of – cancer. In the example with arsenate in pesticides, there is no known process by which pesticides somehow make the polio virus more virulent – capable of infection – or more pathogenic – capable of causing disease. And, in the case of HIV/AIDS, one could see where certain sexual behaviors could facilitate transmission of the virus, but there is no evidence whatsoever that one’s sexual predilections – who we’d like to shag – make any difference to the virus. It will still infect all who are susceptible, meaning the whole of humanity if we’re not careful in how we handle contaminated sharps, how we have sex with each other, and how we test blood donors (regardless of their national origin or sexual orientation).

Does the scientific data change, and do certain once-implausible events become plausible? Absolutely. But those cases are few and far between, and the scientists who once held them to be implausible will correct themselves and admit that there is now evidence of plausibility. And, for God’s sake, don’t just “eyeball” the data. Run it through something, anything… Even MS Excel will do in a pinch!

In the case of vaccines and autism, that kind of evidence still has not come forth, despite all sorts of attempts at finding it. In fact, we have been trying to close the book on the vaccine-autism “debate”, but the anti-vaccine people won’t let it go. Once they do, we will be able to move on and help autistic people not only understand why they are autistic but also get better treatment so they may live fuller lives.

I ended by talking about anti-vaccine advocates for a reason, by the way. I was asked recently if I thought anti-vaccine advocates presented an existential threat to the world or even just the country. It’s a question that will be asked of the protagonist in “The Poxes” twice. He will be asked about it on “Vaccination Day” and then again in “Five Years Since”. His answer will be much like my own five years ago and then again a few days ago. So look for that.

How Many Was That Again?

Have you ever noticed that reports of case counts from public health sources usually have the word “reported” included in them? You have, haven’t you? Well, have you ever wondered why that is so?


The reason for that is the inherent nature of epidemiological surveillance and the barriers to getting an exact case count for every single disease or condition out there. Some of these surveillance issues lead to an overestimation of the number of cases. Others lead to an underestimation. In all cases, it is highly unlikely that you are seeing the true number of cases in any report from public health.

Does that make these reports not useful or even – as some will claim – “manipulated” in any way? Not necessarily, and let me tell you why…

CASE DEFINITIONS
The first thing you need to understand in analyzing descriptive data presented to you from public health sources is the case definition being used in counting cases. A case definition is usually presented in terms of person, place, and time. For example, a case of Salmonella food poisoning may be defined as “anyone with a stool culture positive for Salmonella who ate avocados in Pittsburgh in the week of December 8 to 15”. That’s pretty specific, right?

Case definitions can also be very broad, like saying that a case of Salmonella food poisoning is “anyone with gastrointestinal disease with an onset of December 10 to 17”. This definition would surely capture many more cases than the previous, more stringent one. So you can see why you need to know exactly what defines a case.
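To make that concrete, here is a small sketch using an invented line list (the records, dates, and field names are all hypothetical) that shows how the very same data yield different counts under the strict and the broad definitions:

```python
# Apply a strict and a broad case definition to the same invented line list.
from datetime import date

line_list = [
    {"id": 1, "gi_illness": True, "culture_positive": True,  "ate_avocados": True,
     "city": "Pittsburgh", "onset": date(2012, 12, 10)},
    {"id": 2, "gi_illness": True, "culture_positive": False, "ate_avocados": True,
     "city": "Pittsburgh", "onset": date(2012, 12, 12)},
    {"id": 3, "gi_illness": True, "culture_positive": True,  "ate_avocados": False,
     "city": "Pittsburgh", "onset": date(2012, 12, 11)},
    {"id": 4, "gi_illness": True, "culture_positive": False, "ate_avocados": False,
     "city": "Cleveland", "onset": date(2012, 12, 14)},
]

def strict_case(r):
    # "stool culture positive for Salmonella, ate avocados in Pittsburgh, December 8 to 15"
    return (r["culture_positive"] and r["ate_avocados"] and r["city"] == "Pittsburgh"
            and date(2012, 12, 8) <= r["onset"] <= date(2012, 12, 15))

def broad_case(r):
    # "anyone with gastrointestinal disease with an onset of December 10 to 17"
    return r["gi_illness"] and date(2012, 12, 10) <= r["onset"] <= date(2012, 12, 17)

print("Strict definition:", sum(strict_case(r) for r in line_list), "case(s)")  # 1
print("Broad definition: ", sum(broad_case(r) for r in line_list), "case(s)")   # 4
```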

DIAGNOSTIC TOOLS
Likewise, you need to know what diagnostic tools are being used to define a case. In our example above, we used a stool culture for the first, specific case definition and a clinical description of “gastrointestinal disease” for the second, broader one. When being presented with data, make sure that you know what diagnostic tool – or tools – was (were) used. It makes a big difference.
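Here is a back-of-the-envelope sketch, with assumed sensitivity and specificity figures, of how much the choice of tool can move the count even when the true number of cases never changes:

```python
# Same invented outbreak, counted with two different diagnostic tools.
population = 10_000
true_cases = 100                # people who really have the infection
non_cases = population - true_cases

def counted_cases(cases, non_cases, sensitivity, specificity):
    """Counted cases = true positives + false positives."""
    true_pos = cases * sensitivity
    false_pos = non_cases * (1 - specificity)
    return true_pos + false_pos

# A symptom checklist: catches almost every real case, but also flags many non-cases
print("Clinical definition:", round(counted_cases(true_cases, non_cases, 0.95, 0.90)))   # ~1,085
# Lab confirmation: misses a few real cases, but almost never misfires
print("Lab confirmation:   ", round(counted_cases(true_cases, non_cases, 0.85, 0.999)))  # ~95
```

A sensitive but sloppy clinical definition inflates the count with false positives, while a highly specific lab test lands much closer to the true number.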

For example, in the late 1970s and early 1980s, we had very little in the way of technology to isolate the Human Immunodeficiency Virus (HIV). So an HIV infection had to progress to Acquired Immune Deficiency Syndrome (AIDS) – a collection of signs and symptoms of the deterioration of the immune system – in order to define a case of HIV infection. The definition of AIDS itself was very broad at first and was later refined. As more and more diagnostic tools have become available, the case definitions of HIV and AIDS have changed. Where the presence of an opportunistic infection was once enough to diagnose a person with AIDS, there are now lab tests that look at white blood cell counts and allow a diagnosis to be made earlier, so that intervention and treatment can start earlier.

AUTISM
The example with HIV/AIDS above is true of autism as well. It used to be that there was no uniform diagnosis for autism – or any of the conditions that fall within the autism spectrum. Children were either “hyper”, or “retarded”, or “slow”, or had some other condition. As medical science began to understand what it meant to be on the autism spectrum, the definition of someone with autism changed, leading to better recognition of cases and a subsequent rise in the prevalence – the proportion of people in a population who have the condition – that we see now.

Incidentally, the case definition for autism became more sensitive and specific – and thus more accurate – around the same time that vaccines became more abundant and more widely recommended. This led to the misperception that it was the vaccines, and not the better diagnostic tools, that were raising the rates of autism. But that is for a whole other discussion.

BETTER SURVEILLANCE
It goes without saying that an improvement in surveillance methods also leads to a change in the number of cases observed and counted. For example, infant mortality reporting has gotten better as more and more health care providers in the United States are able to report infant deaths electronically. Health departments at all levels of government are more active in their surveillance of cases by surveying hospitals, clinics, and even midwives on the survival numbers of infants. So you can see how this extra effort to count the deaths that were previously not reported has led to the belief that the infant mortality rate in the country has increased.

Other countries don’t have the same systems as we do in the United States. As a result, their infant mortality rates are different from – sometimes even lower than – those observed here. Is it true, then, that the US is failing at controlling infant mortality compared to countries with fewer resources? Nope. It’s all in how we’ve been counting the numbers. Apples to apples, the rates are much better in the United States, where expectant mothers have better access to prenatal care and children are – for the most part – born in medical facilities capable of caring for them if they are in trouble.
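Here is a simplified sketch, with all numbers invented, of how two different counting rules applied to the same set of births can produce two very different “infant mortality rates”:

```python
# Same invented births and deaths, two registration practices.
births = 100_000            # registered live births
infant_deaths = 400         # infant deaths among them

# Very premature, very low-birth-weight babies who die shortly after birth.
# Some systems register them as live births followed by infant deaths;
# others record them as fetal deaths and leave them out of both tallies.
fragile_births = 250
fragile_deaths = 200

rate_counting_all = (infant_deaths + fragile_deaths) / (births + fragile_births) * 1000
rate_excluding = infant_deaths / births * 1000

print(f"Counting every live birth:    {rate_counting_all:.1f} per 1,000 live births")  # ~6.0
print(f"Excluding the fragile births: {rate_excluding:.1f} per 1,000 live births")     # 4.0
```

Nobody's babies are faring any differently in this toy example; the "gap" comes entirely from what gets counted as a live birth in the first place.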

CONCLUSION
So here is what you do when you compare two rates of a single disease across time, across location, or across populations of people. You need to make sure that the case definitions of both datasets are comparable and as close to matching as possible. Otherwise, you really are comparing apples to oranges. You also need to look at the diagnostic methods used for each dataset. There is no use in comparing one dataset whose cases were diagnosed based on symptoms – a subjective way of diagnosing – with another whose cases were diagnosed by a lab – an objective way of diagnosing. Finally, you need to look at the surveillance systems that collected these data and make sure that the systems behind both sets of data are – yet again – comparable. If one relied on providers reporting cases while the other went out and actively looked for cases, then – yet again – you will find yourself comparing apples to oranges.