Finally, we have enough data to run a basic empirical analysis. While the data set is very small (47 observations), we can nonetheless run some back-of-the-envelope regressions.
I built a data set using the rankings chart, mapping the A-F scale to a 12-point scale (e.g. A -> 11.0-11.9). Within each coarse grade bin, I assigned each observation a grade that accounts for the fact that some whiskies rank higher within the bin than others (that is, not all As are exactly equal).
I then generated a set of whisky characteristics whose explanatory power (with regard to marks) I was interested in. These were my selections:
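The within-bin spreading described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: only the "A -> 11.x" bin comes from the post, while the rest of the ladder, the linear spacing within a bin, and the names `GRADE_FLOOR` and `numeric_marks` are assumptions.

```python
# Assumed 12-point ladder: only "A" -> 11.x is stated in the post;
# every other floor is the conventional +/- layout and is a guess.
GRADE_FLOOR = {
    "A+": 12, "A": 11, "A-": 10,
    "B+": 9, "B": 8, "B-": 7,
    "C+": 6, "C": 5, "C-": 4,
    "D+": 3, "D": 2, "D-": 1,
    "F": 0,
}


def numeric_marks(ranked_bin, letter):
    """Spread the whiskies of one letter bin across [floor, floor + 0.9].

    ranked_bin lists the whiskies in that bin from best to worst.
    The linear spacing is an assumption made for illustration.
    """
    floor = GRADE_FLOOR[letter]
    n = len(ranked_bin)
    marks = {}
    for position, name in enumerate(ranked_bin):
        # Best whisky gets floor + 0.9, worst gets floor + 0.0;
        # a lone whisky sits in the middle of the bin.
        share = 0.5 if n == 1 else (n - 1 - position) / (n - 1)
        marks[name] = round(floor + 0.9 * share, 2)
    return marks
```

Calling `numeric_marks(["whisky_1", "whisky_2", "whisky_3"], "A")` with placeholder names would place the three whiskies at the top, middle and bottom of the 11.0-11.9 range.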
1. Age (my interest is to test the Jim Murray hypothesis, “Age does not a good whisky make”. Set of dummy variables, control group is ‘NAS’)
2. ABV (Do I inherently prefer cask-strength whiskies? Set of dummy variables, control group is ‘ABV<43%’)
3. Research (Does my research of the online consensus help?)
I also generated some control variables:
1. Region (to control for any possible region/single malt bias. control group is ‘blend’)
2. Peat (to control for what I think is a personal favouritism for peated whisky)
3. Minsample (This indicator marks whether the whisky was tried fewer times than my minimum sample threshold [a 20cl bottle]).
4. lnpriceml (controls for the percentage effect of price per ml — a proxy for raw quality, but admittedly, also for marketing gimmicks)
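To make the variable construction concrete, here is a rough sketch of how one whisky might be turned into a row of regressors. The bin labels follow the groups mentioned in the results (under 10, 10-16, 17+; 43-46%, non-chillfiltered 46%, cask strength), but the exact cut-offs, the function names, and the dictionary keys are assumptions, and the region dummies are left out for brevity.

```python
import math


def age_dummies(age_years):
    """Age bins; None means no age statement (NAS, the control group)."""
    has_age = age_years is not None
    return {
        "age_under10": int(has_age and age_years < 10),
        "age_10to16": int(has_age and 10 <= age_years < 17),
        "age_17plus": int(has_age and age_years >= 17),
    }


def abv_dummies(abv_percent, cask_strength=False, non_chill_filtered=False):
    """ABV bins; all zeros means below 43% (the control group).

    The boundaries here are assumed, not taken from the post.
    """
    is_cs = bool(cask_strength)
    is_46_ncf = (not is_cs) and non_chill_filtered and abv_percent >= 46
    is_43_to_46 = (not is_cs) and (not is_46_ncf) and abv_percent >= 43
    return {
        "abv_43to46": int(is_43_to_46),
        "abv_46ncf": int(is_46_ncf),
        "abv_cs": int(is_cs),
    }


def design_row(age_years, abv_percent, price, volume_ml,
               researched, peated, below_min_sample,
               cask_strength=False, non_chill_filtered=False):
    """Build one row of regressors (region dummies omitted)."""
    row = {}
    row.update(age_dummies(age_years))
    row.update(abv_dummies(abv_percent, cask_strength, non_chill_filtered))
    row["research"] = int(researched)
    row["peat"] = int(peated)
    row["minsample"] = int(below_min_sample)
    # Log of price per ml, so its coefficient reads as a percentage effect.
    row["lnpriceml"] = math.log(price / volume_ml)
    return row
```

Leaving out one dummy per group (NAS for age, sub-43% for ABV) is what makes every reported coefficient a difference relative to that control group.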
Running a kitchen-sink regression, I find the following: minsample and peat are each insignificant at the 10% level. The region dummies are also jointly insignificant at 10%, and if we re-cast the region variable as a single malt variable (vs. blends) we get the same result. Adding controls individually produces the same results. The price control is significant at the 1% level.
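For reference, joint significance of a group of dummies (such as region) is usually checked with an F-test that compares the regression with and without the group. The post does not say which statistic was used, so the textbook form below is an assumption. Here SSR_u and SSR_r are the sums of squared residuals with and without the region dummies, q is the number of dummies dropped, n = 47 is the sample size, and k is the number of regressors in the unrestricted model.

```latex
F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k - 1)} \sim F_{q,\; n-k-1} \quad \text{under } H_0
```

With only 47 observations and a long list of dummies, the denominator degrees of freedom are small, so this test has limited power to detect a modest region effect.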
So, after all is said and done, what do I find?
1. Jim Murray’s hypothesis is rejected using my sample (0.006<p-value<0.05). Why is there an interval for the p-value? Well, the problem lies in the price control. The price control is necessary as a proxy for quality, but unfortunately, my regressors for age and ABV are positively correlated with price. Because price is such a good proxy for unmeasurable quality (and I have no others), I would commit the sin of omitted variable bias if I failed to include price as a control, but if I include it, the coefficient estimates for age and ABV end up underestimated (though their standard errors do not meaningfully change). Thus, I provide an interval for the p-value here and say that this test is significant at a level somewhere between 0.6% and 5%.
I find that whiskies younger than 10 years score roughly 2 grade levels lower (on average) than no-age-statement whiskies, and that 10-16yo whiskies tend to get about a 0.75 to 1 grade bump. 17yo+ whiskies get 2 levels. This is as I suspected. Out of curiosity, I tested my suspicion that Jim Murray’s hypothesis may hold up unconditionally, and sure enough it does (p-value >0.26), but when you condition on ABV and price, age does matter. One can see this by comparing my marks for the Auchentoshan Valinch and the Lagavulin 16. The reason they are so close, then, is not because age doesn’t matter, but rather that the Valinch is cask-strength and the Lagavulin is not (many will argue it’s because there is something wrong with me, and that might be true). Bottle the Valinch at 43% (something like the Auchentoshan Classic) and you’d probably see that Auchentoshan around a C+, according to these results (hey, that sounds eerily close to what a standard entry-level OB would get…).
2. ABV matters (H0: ABV doesn’t matter | p-value<0.001), with cask-strength whiskies earning a 4 grade level bump over their sub-43% brethren. Whiskies of the non-chillfiltered 46% variety earn ~2 grade levels, and those around 43-46% earn approximately one grade level. Even after controlling for the ‘single-malt’ bonus (blends are usually low ABV), it seems like my palate prefers CS whiskies. Doesn’t surprise me, as these are often smaller-batch and better crafted. There is also something to be said for the ability to add your own water, to bring the whisky down to the level where you enjoy it best.
3. Research helps. I reject the hypothesis that research is useless to my whisky journey, with researched whiskies getting (on average) a ~1.5 grade level bump over whiskies I try in a bar, or am given as gifts. This is probably not a surprise to anyone.
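The price-control dilemma in point 1 can be written out with the standard omitted-variable-bias algebra. This is a simplified sketch that treats age as a single continuous variable rather than the set of dummies actually used, and the symbols are introduced here for illustration only.

```latex
\begin{aligned}
\text{long regression:}\quad & \text{mark}_i = \beta_0 + \beta_a\,\text{age}_i + \beta_p\,\text{lnpriceml}_i + u_i \\
\text{auxiliary regression:}\quad & \text{lnpriceml}_i = \delta_0 + \delta_a\,\text{age}_i + v_i \\
\text{short regression (price omitted):}\quad & \operatorname{plim}\,\tilde{\beta}_a = \beta_a + \beta_p\,\delta_a
\end{aligned}
```

If price raises marks (β_p > 0) and older whiskies cost more (δ_a > 0), the regression without price credits part of the price effect to age. One reading of the p-value interval in point 1 is that it brackets the specifications with and without the price control.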
Some further points of note: I don’t seem to be biased toward peated whiskies (whether they be Islay peat or otherwise). Also, Islay whiskies, and single malts in general, are not being favoured (all other things considered).
Note: The Breusch-Pagan / Cook-Weisberg test for heteroskedasticity cannot reject the hypothesis of a constant variance (Prob > chi2 = 0.525), so we run our regressions using standard OLS with non-robust standard errors.
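For anyone curious about what such a test does under the hood, here is a bare-bones sketch. It is not the statistic reported above: it uses Koenker's n·R² variant and a single regressor so that it fits in plain Python, whereas the figure quoted comes from a full regression in a statistics package. All function names are made up for this example.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope, residuals) for y = a + b*x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def r_squared(x, y):
    """R^2 of the simple regression of y on x (with intercept)."""
    _, _, resid = ols_simple(x, y)
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return 1.0 - ssr / sst


def chi2_1df_upper_tail(stat):
    """P(X > stat) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def breusch_pagan_lm(x, y):
    """Koenker's n*R^2 heteroskedasticity statistic with one regressor."""
    _, _, resid = ols_simple(x, y)
    squared = [e ** 2 for e in resid]
    # Regress squared residuals on x; a high R^2 means the error
    # variance moves with x, i.e. heteroskedasticity.
    lm = len(x) * r_squared(x, squared)
    return lm, chi2_1df_upper_tail(lm)
```

A large p-value, as in the note above, means the squared residuals show no detectable pattern, which is the usual justification for keeping non-robust standard errors.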
Some Caveats: There are (quite a few) caveats worth noting about this “quick and dirty” study that anyone reading this should know before they pass it on to others.
1. Omitted Variable Bias. This is perhaps the biggest one. What could I be forgetting?
2. Small Sample Bias. Goes without saying that 47 observations is very small. It will be interesting to see how these numbers change (or if they change!) as the data set grows.
3. Multicollinearity. This affects the magnitude of the age and ABV regressors. Because price is positively correlated with both of these sets of variables, it’s likely that their coefficients are underestimated.
4. Personal Preference. As a final caveat, note that this study measures how certain whisky characteristics affect my personal grades. There will always be some sort of unmeasurable personal component to these results (I am not a robot, after all). That means that the old saying “Your Mileage May Vary” is worth keeping in the back of your mind as you read this. If I had a panel data set that included marks from All Things Whisky, Ralfy, WhiskyBitch and LAWS, (with an appropriate way to convert all the different marking schemes) I might be able to do some analysis that removes the individual fixed effect.
5. Analyzing a Time-Series, Cross-sectionally. It’s clear that all these whiskies were not tasted on the same day. This leaves the study open to a possible bias due to the omitted time trend. Controlling for the time trend may account for my getting used to the power of cask-strength whiskies (and thus enjoying them more as time went on). It might also capture the effect that broadening my whisky experience has on marks: the more I find whiskies that push the upper (and lower) limits of my marks, the more my older marks become biased if I don’t go back and re-taste them (Laphroaig QC has continued to hold up, though).
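The fixed-effect idea from caveat 4 can be sketched in a few lines. This is a hypothetical illustration of the usual "within" transformation, assuming marks from several reviewers have already been converted to a common scale. The function name and the tuple layout are invented for this example.

```python
from collections import defaultdict


def demean_by_reviewer(marks):
    """Subtract each reviewer's average mark from that reviewer's marks.

    marks: list of (reviewer, whisky, mark) tuples on a common scale.
    Removing the reviewer mean strips out any constant reviewer-specific
    generosity or harshness (the individual fixed effect).
    """
    by_reviewer = defaultdict(list)
    for reviewer, _, mark in marks:
        by_reviewer[reviewer].append(mark)
    means = {r: sum(v) / len(v) for r, v in by_reviewer.items()}
    return [(r, w, m - means[r]) for r, w, m in marks]
```

Regressing the demeaned marks on (similarly demeaned) whisky characteristics would then compare whiskies within each reviewer rather than across reviewers.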
Lastly, I’ll say that I’m not a professional empiricist. In fact, quite the opposite, I am chiefly a theorist. But it’s healthy to be able to do both, and so I am using this project to rehabilitate my once-adequate empirical muscle. These “results” are a first-blush attempt to analyse this data set, which will no doubt improve over time as more data, and better variables and techniques present themselves. If you have any comments as to how I might improve this analysis (techniques, variables, or otherwise) do let me know. Also, if anyone has a suggestion for a proxy for quality that may not be correlated with age or ABV, have at it.