How to make facts: a primer on research methods

To get knowledge, you need to use deduction in the form of clear models as well as induction in the form of both qualitative and quantitative data. If any piece of this machine is missing, you’re liable to be misled.


Let’s get a few definitional matters out of the way (for more on the themes of this section, see Popper's The Logic of Scientific Discovery and Conjectures and Refutations). First up, what is a fact? It is a hypothesis that has passed rigorous testing, in the sense that the tests did not refute it. On this basis, we can treat the hypothesis as a fact. Until it has passed the tests, it is mere conjecture. If it fails the tests, it is wrong. At no point does a fact become truth. Truth is not accessible because of the problem of induction. This is the idea that if you drop an egg on the floor a thousand times and it always breaks, it remains possible that on the 1001st drop the egg will float up to the ceiling. Maybe you just haven’t seen the entire probability distribution of outcomes in your sample. A similar problem confronts the turkey who goes to the trough for 1000 days and is always fed, but on the 1001st day has his neck broken and his body eaten. A more extreme theoretical conjecture is the colour grue. A grue object is green until some moment in the future, at which point it turns blue. You have no idea at present whether an object is green or grue, and you may never know, because you may never observe the moment when it turns blue. We can never confirm that a hypothesis is true by observation; we can only refute hypotheses by observation.

Tests concern empirical evidence (that is, observed evidence) for or against a hypothesis. The key function of quantitative evidence in this regard is to convince us that we have seen a large portion of the probability distribution of events. If the turkey goes to the trough twice and gets fed, that is less convincing than if he goes 1000 times. The main purpose of qualitative evidence is to convince us that we have understood the full complexity of the phenomenon we are investigating. If we observe that, on average, when 1000 turkeys go to the trough only 1 is shot, we might be inclined to think that going to the trough rarely ends in being shot. A qualitative analysis of that shot turkey, however, would reveal the complexity of the farmer’s decisions around when to shoot and when to feed. We would consequently be enlightened and directed to a richer hypothesis and more complicated tests.

Note that I said ‘convince’ in the paragraph above. I chose this word deliberately. You can never be certain in empirical inquiry. The best you can do is present convincing evidence that holds until some equally convincing evidence emerges in the opposite direction which refutes your hypothesis. This is the basic idea in Bayesian as opposed to frequentist statistics, but let’s leave that aside.
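As a purely illustrative sketch of this convince-but-never-prove dynamic, here is a toy Bayesian update for the turkey’s hypothesis that going to the trough ends in being fed. It uses a standard Beta-Bernoulli model; the function name, prior and numbers are all mine, invented for illustration:

```python
# Toy Beta-Bernoulli update of the turkey's belief that a trough
# visit ends in being fed. Illustrative only -- not a real analysis.

def posterior_mean(feedings, visits, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(fed) under a Beta(prior_a, prior_b) prior."""
    a = prior_a + feedings                # successes observed
    b = prior_b + (visits - feedings)     # failures observed
    return a / (a + b)

small = posterior_mean(2, 2)        # fed on both of two visits -> 0.75
large = posterior_mean(1000, 1000)  # fed on all 1000 visits -> ~0.999

print(small, large)
# The larger sample is far more convincing, but the posterior never
# reaches 1: certainty is off the table, exactly as the problem of
# induction says.
print(large < 1.0)
```

The point of the sketch is only that evidence shifts degrees of belief; no number of fed turkeys gets you to probability 1.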


Both quantitative and qualitative evidence are required to arrive at good quality facts. Here’s a thought experiment to demonstrate what happens when you’re missing one or the other, inspired by this quite good discussion of how ethnographies are frequently misused in sociology and anthropology.

Imagine you want to investigate youth crime in inner-city slums in America. Imagine if the distribution of crime in such places looks a bit like this (please note, this graph is absolutely not to scale; embarrassingly so):

If you just run a regression on this macro-level data, you’ll find that the average number of crimes committed by youths is 2. That might worry you or it might not; shoplifting, graffiti and underage drinking among teenagers are pretty common. We’re not talking convictions or arrests here, just delinquent behaviour.

Importantly, this number 2 doesn’t tell you a whole lot. We’re talking about millions of people in this sample. If a sizeable number of them fall into a part of the distribution where the number of crimes does become subjectively worrying (say 100 000 of them commit 20 crimes by age 30) then you should become interested from a policy point of view. It’s important to take in the whole distribution, not just the average.
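A quick sketch with made-up numbers (nothing here is real crime data) shows how the same average of 2 can mask very different distributions:

```python
# Two hypothetical populations with identical means but very different
# policy implications. All figures invented for illustration.
from statistics import mean

# Population A: everyone commits exactly 2 minor offences.
pop_a = [2] * 1_000_000

# Population B: most commit none, but a tail of 100,000 commit 20 each.
pop_b = [0] * 900_000 + [20] * 100_000

print(mean(pop_a), mean(pop_b))  # both averages come out as 2

# The regression-on-the-mean view cannot distinguish them; counting
# the tail can.
tail_b = sum(1 for x in pop_b if x >= 20)
print(tail_b)  # 100,000 people in the worrying part of the distribution
```

Same mean, radically different distributions: the average alone would lead you to the same conclusion about both populations.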


Now imagine, and this is where things become really interesting and complicated, that you want to understand what causes youths to commit crimes. Can you do this using the quantitative method? Maybe. If you have rich data you can run some regressions with a range of covariates, like absentee father, school socioeconomic status, IQ, mental illness, parent’s education, parent’s occupation etc. You might turn up some very interesting results.
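To make that concrete, here is a hedged sketch of such a regression on synthetic data. The covariates, effect sizes and sample are all invented for illustration; a real study would use observed data and probably a proper count model rather than plain least squares:

```python
# Hypothetical regression of youth crime counts on a few covariates.
# Everything below is synthetic -- invented purely to show the mechanics.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

absentee_father = rng.integers(0, 2, n)   # binary covariate (0/1)
school_ses = rng.normal(0, 1, n)          # standardised SES score
peer_pressure = rng.normal(0, 1, n)       # standardised score

# Assumed "true" effects, chosen by me for the demonstration.
crimes = 1.0 + 1.5 * absentee_father - 0.8 * school_ses + rng.normal(0, 1, n)

# Ordinary least squares via the normal equations.
X = np.column_stack([np.ones(n), absentee_father, school_ses, peer_pressure])
coef, *_ = np.linalg.lstsq(X, crimes, rcond=None)
print(coef)  # estimates should land near [1.0, 1.5, -0.8, 0.0]
```

With rich enough data the estimated coefficients recover the assumed effects, which is exactly the "very interesting results" such a regression might turn up; what it cannot tell you is the mechanism behind them.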

If you want to go deeper, you’re going to need to get qualitative. One way of doing that is to do an ethnography. Plant yourself in amongst the phenomenon you are trying to study and observe things close-up and first-hand. Get an appreciation for the complexity of the situation. Perhaps after spending a year with a young gang you’ll discover that the primary causes of crime as far as you can tell are machismo, a lack of other ways to make a living wage and peer pressure.

The problem comes when you try to generalise these observations to everyone else who is committing crimes. Perhaps your gang is a very unusual gang. Perhaps a gang is the wrong place to study people who commit around 20 crimes in their youth—maybe campus drug dealers would be a more appropriate ethnographic sample. Now you need to go back and get quantitative again, doing multiple ethnographies to confirm your suspicions.


Alternatively, you could take the relatively cheaper and quicker option of just collecting information on the variables that you think are important, namely machismo, job opportunities and peer pressure. At this point, you’re going to need a clear theory, which is where deduction comes in.

Before moving to that though, I want to quickly illustrate what goes wrong in qualitative and quantitative study. First, qualitative. What Airaksinen is highlighting in this piece is the selection of extremely marginal cases for ethnographic study and then the generalisation of findings from such studies to the mean. In my little graph above, it means studying someone in the tail and then arguing that the experiences of that person are similar to someone way back up around the average of the distribution. This is wrong and/or duplicitous. There are times when this sort of thing is very valuable, but you need to be clear about what you are studying. You will only be able to do that if you understand quantitative methods as well as qualitative.

Next, quantitative. I’ll tell a story here that actually comes off making the quant people seem like the smart ones (I’m in a quant discipline) but it illustrates the point quite well. A friend of mine did an excellent paper during his PhD on the average impact of palm oil development on poverty in Indonesia. Any time you hear the word average you know you’re dealing with a quantitative study. The effect was quite large. Of the roughly 10 million people lifted from poverty in Indonesia in the last decade (up to 2013), around 1.3 million did so with the help of income from palm oil. Importantly, the distribution of effects was quite wide—there were places where the effect was large and places where it was small. At the end of the presentation, an anthropologist flew into a rage in the Q&A (having been in a rage throughout it) arguing that in her extensive experience tramping various palm oil plantations the outcome was definitely shit for a range of reasons that you can only observe if you’re doing a qualitative study. Unfortunately for her, my mate was on top of all that and said that the explicit purpose of his study was to look at the average effect because at present there were only ethnographic studies and they were either ‘everything is great’ or ‘everything is shit’. Someone needed to look at the whole picture.

What comes next is some understanding of what makes palm oil work for poverty in the successful cases and what makes it fail to alleviate poverty in the others. For this you need qualitative and quantitative methods again, but you also need theory. You need a model. You need deduction.

There is an unfortunate tendency in a lot of scientific work to do very little theory and proceed at a tediously incremental pace. My discipline, well-being, suffers acutely under this burden. When you abstain from theorising you inevitably end up looking at the world through a keyhole. The picture you get through that keyhole is very clear, but it is only a small part of the bigger picture. If you open the door and get that whole view, you might discover that the keyhole image is misleading you. An illustration comes from my mother’s work on music perception. My mum is an empirical musicologist. She is interested in, among other things, what it is about certain items of music that makes people feel a certain way. She regularly has run-ins with physicists who consider any theory pertaining to emotional cognition inadmissible because they can’t measure it on their instruments for acoustics. They are looking at music perception through a keyhole. A very clear keyhole, but a keyhole nonetheless. There is obviously an element of emotional cognition in music. Just because you can’t measure it doesn’t mean it isn’t there. Humans have measurement instruments that physics does not. Not being able to measure it effectively simply means that when you open the door you’ll get a blurry picture. But at least you’ve got the whole picture, and from there you can try to make things clearer gradually.


A model helps you do a few things:
1. It helps you make predictions. The best models allow you to make causal predictions rather than mere probabilistic predictions. This is the difference between macroeconomics and financial engineering.
2. It helps you integrate all the information you have obtained from quantitative and qualitative investigation, and it allows you to track and update your understanding of the phenomenon you are investigating as more quantitative and qualitative evidence emerges.
3. It gives you space to employ deduction and weak forms of evidence, like introspection, to develop clear, rich, complete hypotheses.

Models, preferably mechanistic models, are absolutely vital for developing a deep, causal understanding of a phenomenon. I will illustrate this with reference to the relationship between trade and development, because this example is fresh in my mind.

Conventional trade theory suggests that relatively free trade (let’s not fuss too much about the specifics, but if you’re interested, the papers by Athukorala and Lin & Treichel below are good starting points) is good for economic growth and poverty alleviation. Countries operating at their comparative advantage will be globally competitive. This will lead to profits that can be invested in health and education, and will relax their balance of payments constraint by encouraging foreign investment and driving export performance, which will in turn provide foreign exchange for importing the capital goods necessary for industrial upgrading. Over time, countries move up the ladder of comparative advantage until they are advanced. Observe Taiwan for a case study.
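The comparative-advantage logic above can be sketched with textbook-style arithmetic. The countries, goods and labour costs below are the standard stylised Ricardian example, not real data:

```python
# Stylised Ricardian example: labour hours needed to produce one unit
# of each good. Numbers are the classic textbook toy case.
hours_per_unit = {
    "Home":   {"cloth": 1, "wine": 2},
    "Abroad": {"cloth": 6, "wine": 3},
}

def opportunity_cost(country, good, other):
    """Units of `other` forgone to produce one unit of `good`."""
    h = hours_per_unit[country]
    return h[good] / h[other]

# Home gives up 0.5 wine per cloth; Abroad gives up 2 wine per cloth.
print(opportunity_cost("Home", "cloth", "wine"))    # 0.5
print(opportunity_cost("Abroad", "cloth", "wine"))  # 2.0

# Home has the lower opportunity cost in cloth, Abroad in wine, so both
# gain from specialising and trading -- even though Home is absolutely
# more productive in both goods.
```

This is the abstract, ceteris-paribus core of the theory; the rest of the essay is about what happens when reality refuses to hold everything else equal.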

This thesis is controversial. Ha-Joon Chang is a well-known revisionist with a different view. He employs an historical analysis (the book is Kicking away the ladder, summary here) and finds that almost all of today’s advanced economies grew to that stage while employing protectionist policies. Unfortunately, he isn’t doing a causal analysis. Indeed, he’s not even doing a correlational analysis. He is mistaking coincidence for causation. As literally hundreds of expert commentators have pointed out, the mere fact that a country grows while it has protectionism in place doesn’t mean that it is growing because of that protectionism. It might be growing in spite of the protectionism. Chang also assumes that the global trading regime facing America when it developed is the same as that facing, say, ASEAN nations today. That is a heroic assumption given the emergence of the WTO and global value chains (GVCs). There is almost a consensus among Asian trade economists that GVCs make infant industry protection distinctly harmful to your growth. I explain this here.

Another revisionist account is given by Thirlwall and Pacheco-Lopez (the book is Trade Liberalisation and the Poverty of Nations, summary here). They employ a regression approach and find that the average effect of free trade on poverty and growth is negative. This is an incomplete methodology. For starters, it is correlational, not causal. The effect may well be driven by countries, like those in Sub-Saharan Africa, whose development is impeded by a range of factors independent of the effect of trade liberalisation. The authors also make little effort to understand what it is about the countries in the positive part of the distribution that made trade successful there.

What both studies have in common is a failure to engage usefully with the models (though Thirlwall and Pacheco-Lopez are much better on this count). Trade theories pertaining to comparative advantage, the productivity effects of competition and the role of FDI and technology transfer in spurring development are fundamentally sound. But of course, these models are derived in an abstract environment characterised by ceteris paribus—all other things equal, or ‘assuming everything else away’. The reality of trade is much more complex, especially in developing countries. The purpose of evidence needs to be to improve these models, not to prove them decisively despite their simplicity or throw them out completely. This kind of categorical thinking is, I think, the most pernicious factor operating in global politics and policy today. Things are usually too complex for categories.

When you qualitatively and quantitatively investigate the cases where trade has worked for development, with an eye to improving the complexity and accuracy of your underlying theory, you get an understanding of the whole picture of trade for development. You open the door rather than just looking at it through one particular keyhole. You get pieces like this one from the ANU’s Prema-Chandra Athukorala explaining why Thirlwall and Pacheco-Lopez are oversimplifying the situation and ultimately wrong, and this one by Lin and Treichel on what aspects of industrial policy are actually helpful. I note that the Lin and Treichel paper emphasises the need for countries to move in line with their comparative advantage, the need for trade openness, the need for competition, the need for governments to support infant industries without picking winners and without protecting them from competition (if they need protection then they are not at the level of that country’s comparative advantage!), and the need for industrial policy to basically massage the market rather than working against it.

There is an irony that emerges out of the two papers I mention in the paragraph above. Lin is considered by economists to be largely antithetical to the Chicago school that is often associated with American (Republican) economics. If the Chicago school is neo-liberal, then Lin is the opposite as far as economics is concerned (he's not Marxist though, that's a whole other paddock). And yet he was director of the World Bank. Treichel is currently head of the South Asia branch there. Stiglitz, famed left-wing economist, was also a director of the World Bank. And yet the World Bank has a reputation for being market fundamentalist. It is anything but. It is not, however, pro-protection or explicitly anti-Chicago. Such monikers are the flags of a political debate that has no place in attempts to uncover the factual nature of things. This brings me to the final section of this essay, which concerns habits of mind that will impede your ability to effectively do research and enlighten yourself.

The first bad habit is moralising. Righteousness demands that you be right and the enemy be wrong. It brooks no grey space and is thus antithetical to appreciating complexity. This is manifest in the trade debate in many places. On the right, a good example is the Republicans agitating to remove Stiglitz from his directorship of the World Bank on the grounds that he is a leftist. Never mind that he has a Nobel Prize in very much mainstream economics. However, Stiglitz dug his own grave by himself utilising moral language to promote his popular books. From his rhetoric, one would think Stiglitz is some sort of radical firebrand, but actually all he advocates for in terms of trade is a greater appreciation of market failures (especially pertaining to information) and of the political economy of developing countries when doing trade policy. He actually doesn’t propose anything but a Washington consensus mk2 in his work. On the left, you have people arguing for protectionism out of a misplaced view that developing countries can only advance if they are inoculated against the predations of advanced economies. This reads evil into the profit motive. What ends up happening is that the wings address each other while ignoring the centre. This is most clearly on display in Salon articles, which always take an at least 800-word detour to fire a broadside at the Chicago school as though it still exists—mainstream economics has moved on! I imagine when libertarians get together they hurl barbs at Marx as though the mainstream of scholarly inquiry still gives a shit.

The second bad habit is mistaking criticism for analysis. On the one hand criticism is good—knowledge proceeds by refutation. On the other hand, because just about everything is very complex, it is always a relatively straightforward matter to show where a theory is deficient. What is much more difficult but also much more valuable is to show an alternative that does work, or to extend the present theory to surmount the criticism you offer. The left and the right are both guilty of this tendency. The right criticises the inefficiency of traditional left-wing policies while the left criticises the inequity of traditional right-wing policies. Meanwhile, modern policy is usually a sophisticated combination of market and government tools to address equity and efficiency simultaneously. But politics doesn’t care. Clinton, for example, needs to promise free education to get a rise out of her base—income contingent loans would not be enough even though they are a fundamentally superior approach.

The third and final bad habit is an excessive focus on either the political realities or the abstract theories of a phenomenon. If you focus on a phenomenon in a vacuum, as many American trade theorists do, you miss the complexity of what happens on the ground. You consequently advocate for oversimplified positions that are easily refuted by the facts and which often end up harming people. By contrast, if you over-emphasise politics in your analysis of everything you become blind to the elements of an idea that are actually working. This is clear in attacks on micro-finance. There are certainly a lot of earnest idiots applying microfinance the wrong way, often producing no effect or, worse, pushing people into bankruptcy. There are also a lot of bankers making small profits off microcredit (if there weren't, we wouldn't have any microcredit). If you look at the average effect of microfinance across all such schemes globally, the effect is small if not negative. But if you look at the success stories, boy are they amazing (and common!). It is no surprise that Muhammad Yunus won a Nobel Prize, is highly regarded by some of the smartest people on the planet, and built a bank so successful in an area where banking was previously impossible that the Bangladesh government tried to nationalise it. The basic ideas in microfinance are sound. They need to be rescued from people who would misapply them and thereby ruin their brilliance. Treating everything as a political matter encourages you to throw the baby out with the bathwater in your hunt for justice.

The most important element of a good research methodology is an attitude that takes complexity as given. Parsimony requires that theory be as simple as possible. But that does not mean we need to be categorical. Nothing hurts inquiry quite so much as an inability to differentiate shades of grey.  
