In the interest of full disclosure, this post is not about algebra or calculus, nor is it about financial instruments. It’s about various kinds of business research ‘raw materials’ and how to discern their quality if you are a producer or user of such research.
Whether you are navigating the waters off Tuscany on a cruise ship or analyzing a new business opportunity, the quality of the data you use is integral to the outcome you will deliver. While the availability and use of high-quality data do not guarantee a desired outcome, their absence renders the desired outcome more a result of chance than of strategic intention.
When I originally developed the KVC model, it was partly a test to see how well best practices in manufacturing could be applied to organizational intelligence. In so doing I borrowed heavily from earlier work I had done in TQM (Total Quality Management). One of TQM’s most basic principles is that in any manufacturing process, it is no longer acceptable to just get to the end of the process — the finished goods — and weed out the ones that do not meet quality standards. Instead, it is much more efficient to build quality checks into each stage of the process, reducing the number of adverse quality-related surprises at the end of the process.
So it is with intelligence — the ‘manufacture’ of a knowledge-based product. One of the implications is that each step up the value chain tends to replicate (and in my experience, amplify) any quality shortfalls embedded in earlier stages. ‘Garbage in, garbage out’, as the expression goes — though we prefer its converse, ‘Quality in, quality out.’
Another implication is that, wherever possible, you should develop your source data at the lowest level of the chain at which it is available. If information and intelligence are the branches and leaves of the ‘knowledge tree’, then data points are the roots. You want to get to the ‘root-most’ level in order to start your analysis.
To the extent you accept at face value someone else’s processing or analysis of the data, you may save yourself some analytic time and effort — but you also leave yourself open to inheriting and building on their errors.
One of the ways this happens is when people rely on second-order ‘derivative’ work at the start of their process. This is work that ‘derives’ from another source, or (in Knowledge Value Chain terms) enters at a higher level in the chain. For example, an analyst may cite a newspaper article about a new study that has been released, instead of citing the study itself. This is usually because he hasn’t taken the time to find the root source, read and analyze it, and draw his own conclusions.
The result is a data quality vulnerability — one that’s typically easy to overcome. When you see a citation to a newspaper article that merely reports on a newly released study, it is essential (where possible) to obtain the original study, read it critically, and do your own analysis and interpretation of the results.
I’ll give you a recent example from my casebook. We have been studying various issues related to health care for more than a year now. Most recently, we’ve done work around the aging of the population and the market opportunities and challenges it presents. One of the unusual things about this assignment is the sheer volume of high-quality material readily available. The US government publishes lots of data — the Census Bureau, the Centers for Disease Control and Prevention, and the Centers for Medicare and Medicaid Services were especially useful to us. In addition, many universities have centers that study health and/or aging, and there are not-for-profit groups (like the Urban Institute) that do so as well.
In short, instead of a dearth of data, there were so many sources that we were still discovering new ones more than ten weeks into our study. And of course, none of them totally reconciled in terms of their target population, what they measured, how they measured it, and when they measured it — we were dealing with apples, oranges, and pineapples in terms of the comparability of our data.
At the beginning of our aging study I found one source that was especially useful, a monograph published by a leading not-for-profit research organization. Its source list was a rich trove of ‘root’ sources that we culled to build a ‘knowledge base’ for the market opportunity we were evaluating.
One of the key facts we were looking for was the breakdown among funding sources for elderly long-term care — Medicare, Medicaid, private insurance, self-pay, and so on. The monograph contained an analysis and informative pie chart, but — as I’m recommending here — we had also obtained the source material cited (in this case, National Health Expenditures data produced by the US government). We do this as a matter of course, not so much for checking on our sources, but more as a way of digging more deeply into the source material, and to develop still more sources from their citations.
In this case I could get the total figures to agree with this authoritative derivative source, but not the allocations among the funding sources. After spending an hour or so trying to reconcile the difference, I went back to the monograph and read the part of the text that explained the pie chart. There, to my amazement, I found a breakdown very similar to the one I had developed independently. As far as I can tell, an error had been made by the person (quite possibly not the author) who created the accompanying pie chart.
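For readers who want to see the mechanics, here is a minimal sketch in Python of the kind of reconciliation check described above. The figures are invented placeholders, not the actual National Health Expenditures or monograph numbers; the point is simply that totals can agree while the allocations do not, so both need to be compared against the root source.

```python
# A minimal sketch of the reconciliation check described above.
# All figures are invented placeholders, NOT the actual National Health
# Expenditures data or the monograph's numbers.

def shares(breakdown):
    """Convert dollar amounts into percentage shares of the total."""
    total = sum(breakdown.values())
    return {k: 100.0 * v / total for k, v in breakdown.items()}

# Funding-source allocation as reported by the derivative source
# (the monograph's pie chart), in $ billions.
derivative = {"Medicare": 60.0, "Medicaid": 95.0,
              "Private insurance": 20.0, "Self-pay": 45.0}

# The same breakdown rebuilt from the 'root' source data.
root = {"Medicare": 75.0, "Medicaid": 85.0,
        "Private insurance": 20.0, "Self-pay": 40.0}

TOLERANCE = 1.0  # acceptable drift between sources, in percentage points

# Totals can agree even when the allocation does not, so check both.
print(f"Totals: derivative={sum(derivative.values()):.1f}, "
      f"root={sum(root.values()):.1f}")

deriv_shares, root_shares = shares(derivative), shares(root)
for source in root_shares:
    gap = abs(deriv_shares[source] - root_shares[source])
    flag = "OK" if gap <= TOLERANCE else "CHECK AGAINST ROOT"
    print(f"{source:18s} derivative {deriv_shares[source]:5.1f}%  "
          f"root {root_shares[source]:5.1f}%  {flag}")
```

In our case it was the second comparison, the allocation by funding source, that surfaced the discrepancy even though the totals matched.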
If I had used only the ‘derivative’ source, I would have given an incorrect result to our client — complete with a citation to a ‘reliable’ source. Only by finding a ‘root’ source and checking against that was I able to deliver a result that met our standards of data quality.
It’s not totally clear that an incorrect answer would have significantly affected our client’s decision to pursue an entry into this market. But in this case, as is often the case, the cost of higher-quality data was not much greater than that of the incorrect data. So getting the best data you can becomes a good work habit to develop.
Imagine for a second that you’ve won the lottery and are now driving a new $75,000 BMW M3. Would you put low-octane gasoline and bargain motor oil into it? Unlikely, since doing so would significantly degrade the value and usefulness of what you have.
Yet companies do this all the time — put low-quality data into a highly-tuned decision-making machine — often without knowing it. The quality of raw unprocessed data is not always immediately obvious to its end user, since by that time it has been transformed by the ‘intelligence manufacturing’ process. (It’s usually clear to me where the vulnerabilities are, but that comes with having done business research for a long time. I’ve seen most of the mistakes you can make, and made a fair number of them myself.)
I’ll give you some of the signposts we use for data quality in a future post. But one of the things that immediately starts me checking is the use of derivative sources, where the roots are available.
Tim,
“Go to the source” is good advice for anyone researching anything, whether for corporate use or personal interest. I remember the first time I heard it, when I was taking an elective in comparative theology at college (a nice contrast for an engineering major) and quoted the King James version of the Bible as an authoritative source… My professor, who was (in his own words) an ex-Jesuit, gave me back a Latin source for the King James, a Greek source for the Latin, and an Aramaic source for the Greek, with his own contemporary translations of each, and demonstrated how far from the ‘root’ the King James had strayed. It was – and still is – a very valuable lesson!
Mark.
Thanks Mark,
That’s a great example. I too always found it amazing that God spoke like a 17th-century Englishman!
But seriously, the lack of curiosity I’ve described can have profound and disastrous consequences. For example, during the US mortgage bubble that led up to the meltdown of 2008, the raters and buyers of derivative financial instruments (e.g., mortgage-backed securities) apparently didn’t understand the value of the ‘root’ instruments (the sub-prime mortgages from which they were derived, whose quality did not measure up). The rest, as they say, is history.
Tim
Tim,
Interesting work on credit analysis is being done by Ann Rutledge and Sylvain Raynes, whom I met at NYU 10 years ago back when I ran a noncredit finance program there. (Disclosure: they have since become friends and clients.) They’ve published a monograph and a textbook with Oxford Univ. Press, and are applying to get official SEC recognition for their firm R&R Consulting as a credit ratings agency.
Their method involves the use of granular data about the underlying obligations (e.g. mortgages) to do heavy-duty cash flow analysis (1) before the deal is closed and (2) periodically afterwards over the life of the pool.
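To make ‘granular data about the underlying obligations’ a little more concrete, here is a minimal, illustrative sketch: it is not R&R’s actual method, just a projection of scheduled cash flows for a toy mortgage pool, built loan by loan rather than from pool-level summary statistics. The Mortgage fields and the three example loans are invented for illustration.

```python
# A minimal, illustrative sketch of loan-level cash flow projection for a
# mortgage pool. This is NOT R&R Consulting's method; it only illustrates
# the general idea of working from granular data about each obligation.

from dataclasses import dataclass

@dataclass
class Mortgage:
    balance: float       # current outstanding principal
    annual_rate: float   # note rate, e.g. 0.065 for 6.5%
    months_left: int     # remaining term in months

def monthly_payment(loan: Mortgage) -> float:
    """Standard level-payment amortization formula."""
    r = loan.annual_rate / 12.0
    n = loan.months_left
    return loan.balance * r / (1.0 - (1.0 + r) ** -n)

def pool_cash_flows(loans: list[Mortgage], horizon_months: int) -> list[float]:
    """Project aggregate scheduled cash flow for each month in the horizon."""
    balances = [loan.balance for loan in loans]
    payments = [monthly_payment(loan) for loan in loans]
    flows = []
    for month in range(horizon_months):
        total = 0.0
        for i, loan in enumerate(loans):
            if balances[i] <= 0 or month >= loan.months_left:
                continue
            interest = balances[i] * loan.annual_rate / 12.0
            principal = payments[i] - interest
            balances[i] -= principal
            total += payments[i]
        flows.append(total)
    return flows

# Example: a toy pool of three loans, projected over the first year.
pool = [
    Mortgage(balance=200_000, annual_rate=0.065, months_left=360),
    Mortgage(balance=150_000, annual_rate=0.072, months_left=300),
    Mortgage(balance=300_000, annual_rate=0.058, months_left=360),
]
print([round(cf, 2) for cf in pool_cash_flows(pool, 12)])
```

The same loan-level detail could then be stressed with assumptions about defaults and prepayments, which is where the real analytical work begins.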
As your pie chart illustration shows, it’s taking advantage of information that’s there for the taking – but you have to exercise the initiative to look for it.
Katherine
Thanks Katherine,
R&R’s work (love that name!) looks interesting, thanks for bringing it to my attention.
There are currently nine ‘nationally recognized’ credit ratings agencies, but there is always room for more if they do good work. Unfortunately, the ‘issuer pays’ compensation model, which many of them use, creates inherent conflicts of interest and incentives to over-rate issues.
The SEC ramped up oversight of the CRAs only in 2007 — by which time a lot of the damage had already been done. Better late than never, I suppose.