If you live in Kingston, you may have come across this headline:
Wonderful, you think – after all, Kingston does have that small-city charm, with lots of historical buildings, quaint little cafes and restaurants, and a location right on the water. Lots of romantic movie potential, where big-city Sandra Bullock moves to a small town only to fall for lovable country mouse Ryan Reynolds.
And then you read the article more closely, and determine how they measured the “romanticness” of a city:
The online retailer bases its list by comparing sales data of romance novels, sex and relationship books, romantic comedy DVDs and CDs by Canadian crooner Michael Buble since Jan. 1 on a per capita basis in cities with more than 80,000 residents.
When doing research, it’s imperative that you collect the right data to answer your research question. This is particularly apparent when conducting surveys. For example, when we ask youth about physical activity, what we often really want to know about is their cardiovascular conditioning, mental health or some other construct. Ideally, we’d love some more accurate measures of health: blood lipids, triglycerides or other biomarkers. But we can’t get those, so we’re going to get a proxy measure and run with that (pun intended).
When collecting data, we have several concerns. You could take a semester-long exposure measurement graduate course, as they do at McGill, but I’m going to keep this short and summarize what White et al. said in a paper. In short, deciding how to measure something is a balance between a number of factors.
First, you want to find the appropriate way to measure something: via blood samples, a questionnaire, an objective instrument (like a pedometer or light meter), or some other way. Second, you want to consider cost. While it would be great to take blood samples from everyone, that is expensive. Sometimes you have to take the hit and write a big CIHR grant, but sometimes you can work around it and use self-report. Another workaround is to measure only a subsample of people directly and use them for your analysis.
There are other, more technical issues you have to consider as well. One is the dose itself, i.e. at what point does the agent matter? We might be able to deal with a small amount of the agent, but is there a critical amount at which your risks skyrocket? Your tool has to capture this. Another is the relevant time window, i.e. when does exposure matter? For example, exposure to a chemical in utero may have effects that exposure to the same chemical afterwards wouldn’t have. When designing your data collection approach, you need to bear that in mind. There’s also measurement error, which relates to the accuracy of your instrument.
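To make the measurement error point concrete, here is a minimal simulation sketch. It is an illustration, not something taken from White et al.: it assumes classical (random, independent) error added to a normally distributed exposure, and the function names, sample size, true slope and error size are all made-up values chosen for the example. Under those assumptions, regressing the outcome on the noisy measurement is expected to shrink the estimated slope by a factor of var(exposure) / (var(exposure) + var(error)), which is 0.5 with the values used here.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, true_slope=2.0, error_sd=1.0, seed=1):
    """Compare the slope from the true exposure with the slope from a noisy one.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # True exposure: standard normal (variance 1).
    true_exposure = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome depends linearly on the true exposure, plus unrelated noise.
    outcome = [true_slope * x + rng.gauss(0, 1) for x in true_exposure]
    # What the instrument records: true exposure plus classical random error.
    measured = [x + rng.gauss(0, error_sd) for x in true_exposure]
    return slope(true_exposure, outcome), slope(measured, outcome)


slope_true, slope_measured = simulate()
print(f"slope using true exposure:     {slope_true:.2f}")
print(f"slope using measured exposure: {slope_measured:.2f}")
```

The second slope comes out noticeably smaller than the first, which is the sense in which an inaccurate instrument can hide a real effect. Other error structures (for example, error that depends on the outcome) can bias results in either direction and are not covered by this sketch.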
Now back to the Amazon.ca report above. Let’s consider their measurement of “romance.” Is it appropriate? Obviously not. First, you’re using material wealth to define romance. Second, you’re using a very narrow definition of romance. Neither of these fills you with confidence.
Point number two: Is this inexpensive? Absolutely. Amazon has these data, and running the stats is super easy (which is probably why they did it). However, there would be HUGE misclassification – not everyone who buys a Michael Buble CD is purchasing it with romantic intentions – some may simply enjoy his songs and some may be confusing him with Josh Groban. Similarly, romantic comedies are enjoyed by a range of people in a range of circumstances.
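One way to put a number on that misclassification is to treat "bought a Michael Buble CD" as a test for "has romantic intentions" and ask how often a positive test is right. The sketch below does that arithmetic. Every input is invented for illustration (Amazon reported no such figures): the share of buyers who are truly romantic (prevalence), the share of romantics who buy the CD (sensitivity), and the share of non-romantics who don't (specificity).

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that someone flagged by the proxy truly has the trait."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only to illustrate the calculation.
ppv = positive_predictive_value(prevalence=0.2, sensitivity=0.6, specificity=0.7)
print(f"Share of CD buyers who are actually romantics: {ppv:.2f}")
```

With these made-up inputs, only about a third of the purchases would reflect what the ranking claims to measure; the rest are people who just like the songs (or thought they were buying Josh Groban).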
So, in short, don’t move to Kingston for romance based on this study. And next time Amazon wants to conduct a study like this, they should really hire an epidemiologist to help them out. Jeff Bezos – give me a shout!
White E, Hunt JR, & Casso D (1998). Exposure measurement in cohort studies: the challenges of prospective data collection. Epidemiologic Reviews, 20(1), 43-56. PMID: 9762508
White E, Armstrong BK, & Saracci R (2008). Principles of Exposure Measurement in Epidemiology. Oxford Scholarship Online. DOI: 10.1093/acprof:oso/9780198509851.001.0001