yellow brick road to stats heaven

yellowbrickstats.com

~ a loose collection of statistical and quantitative research material for fun and enrichment ~

by roland b. stark

what a Batman Graph can show about wealth and poverty in the US


For some of you, this will furnish Example # 1,043 of why you need to graph the relationships between the variables you’re analyzing.

'Batman' Graph of US Wealth and Poverty


The US Census provides a variety of indicators of socioeconomic status. These can be assembled into a Wealth index and a Poverty index to describe each of 33,000 ZIP codes – in this case using principal components analysis. This creates scales where the US average is 0 and the standard deviation is 1.
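For readers who want to try something similar, here is a minimal sketch of turning several indicators into a single index scaled to mean 0 and standard deviation 1. It is not the procedure used for the graphs in this post: the three indicator columns are made-up random data, the function name `first_component_index` is hypothetical, and taking only the first principal component is an assumption.

```python
import numpy as np

def first_component_index(indicators):
    """Return first-principal-component scores rescaled to mean 0, SD 1."""
    X = np.asarray(indicators, dtype=float)
    # Standardize each indicator so no single unit of measure dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix; rows of Vt are the component directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[0]
    # Rescale so the index has mean 0 and standard deviation 1.
    return (scores - scores.mean()) / scores.std()

# Made-up data: 1,000 hypothetical ZIP codes, 3 hypothetical indicators
# that all share one underlying factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
indicators = np.column_stack([
    latent + rng.normal(scale=0.5, size=1000),
    latent + rng.normal(scale=0.8, size=1000),
    latent + rng.normal(scale=1.0, size=1000),
])
wealth_index = first_component_index(indicators)
print(wealth_index.mean(), wealth_index.std())
```

Note that the sign of a principal component is arbitrary, so in practice you would check that higher scores really do mean more Wealth and flip the sign if they do not.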

Now, simple intuition might tell you that there’s a negative linear relationship between Wealth and Poverty. Correlation would support this to some degree (R² = .061). But even the quickest glance at the scatterplot above should change your thinking. Nor, for that matter, is this essentially a quadratic relationship, where, with higher and higher Wealth, Poverty would decrease, only more and more slowly (R² = .076; not much of an increase). Or a cubic one (R² = .077; almost no increase at all).
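The comparison of linear, quadratic, and cubic fits can be sketched roughly as follows. The data here are invented purely for illustration (they are not the Census ZIP code data, and the printed values will not match the figures quoted above), and `poly_r2` is a hypothetical helper name.

```python
import numpy as np

def poly_r2(x, y, degree):
    """R-squared of a least-squares polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    ss_res = np.sum((y - fitted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
    return 1.0 - ss_res / ss_tot

# Invented data: a weak negative trend buried in a lot of scatter.
rng = np.random.default_rng(1)
wealth = rng.normal(size=2000)
poverty = -0.25 * wealth + rng.normal(size=2000)

r2 = {degree: poly_r2(wealth, poverty, degree) for degree in (1, 2, 3)}
for degree, value in r2.items():
    print(f"degree {degree}: R^2 = {value:.3f}")
```

Because each higher-degree polynomial contains the lower-degree one as a special case, R² can never go down as terms are added. The question is whether it goes up enough to matter.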

As average Wealth increases, what happens to the average level of Poverty? If you compare the zone where Wealth is below -1.0 to the zone between -1.0 and 0, mean Poverty increases sharply. The top of the shape – Batman’s right arm in flight – leans to our right. Then, from Wealth of 0 to Wealth of 2.0, mean Poverty decreases slightly. Finally, further towards our right (the “left arm”), Poverty increases slightly once more.

Variability is important to this relationship too. For the small percentage of points that lie in the region to the left of -1.5 or to the right of +1.5, the mean for Poverty does a good job of describing the data, but everywhere in between there is so much variation that the mean hardly captures the story.
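One rough way to check a claim like this is to summarize Poverty separately within zones of Wealth. The sketch below uses the -1.5 and +1.5 cut points mentioned above, but the data are invented (built so that the scatter is widest in the middle) and `binned_summary` is a hypothetical helper, so only the method carries over.

```python
import numpy as np

def binned_summary(x, y, edges):
    """Return a list of (count, mean, sd) of y for each bin of x."""
    # Bin 0 holds x below the first edge; the last bin holds x at or above
    # the last edge.
    idx = np.digitize(x, edges)
    rows = []
    for b in range(len(edges) + 1):
        vals = y[idx == b]
        if vals.size == 0:
            rows.append((0, np.nan, np.nan))
        else:
            rows.append((vals.size, vals.mean(), vals.std()))
    return rows

# Invented data: noise that shrinks rapidly as Wealth moves away from 0.
rng = np.random.default_rng(2)
wealth = rng.normal(size=5000)
poverty = rng.normal(size=5000) * np.exp(-wealth ** 2)

rows = binned_summary(wealth, poverty, [-1.5, 1.5])
for label, (n, mean, sd) in zip(["below -1.5", "-1.5 to 1.5", "1.5 and up"], rows):
    print(f"{label:>12}: n={n:5d}  mean={mean:+.3f}  sd={sd:.3f}")
```

Where the standard deviation within a zone is small, the mean summarizes that zone well. Where it is large, the mean hides most of what is going on.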

The next graph zooms in a bit, displays the ZIPs in a more granular way, and fits linear and lowess lines to the data, where “lowess” stands for “locally weighted scatterplot smoother.” It’s an exploratory, opportunistic alternative to fit lines that are directly determined by linear, quadratic, or cubic equations.
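To give a feel for what a lowess line does, here is a simplified sketch: at each point it fits a straight line to the nearest neighbors only, weighting closer points more heavily. This is a stripped-down version (tricube weights, no robustness iterations), not the routine that drew the graph below. The name `simple_lowess`, the `frac` default, and the data are all assumptions for illustration. Statistical packages such as statsmodels provide fuller implementations.

```python
import numpy as np

def simple_lowess(x, y, frac=0.3):
    """Locally weighted linear fit evaluated at each x (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))  # points in each local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Window half-width: distance to the k-th nearest point.
        h = max(np.partition(dist, k - 1)[k - 1], np.finfo(float).eps)
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3                 # tricube weights, 0 outside window
        # Weighted least squares for a local line centered at x[i].
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = beta[0]                     # intercept = fitted value at x[i]
    return fitted

# Invented data: a bump in the middle that a single straight line would miss.
rng = np.random.default_rng(3)
wealth = np.sort(rng.normal(size=300))
poverty = np.exp(-wealth ** 2) + rng.normal(scale=0.3, size=300)

smooth = simple_lowess(wealth, poverty, frac=0.3)
print(smooth[:5])
```

A smaller `frac` lets the line follow local wiggles more closely, while a larger one makes it behave more like an ordinary straight-line fit.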

'Batman' Graph with Linear and Lowess Fit Lines


With this visualization we again see that the linear fit line does a very poor job of describing the pattern. The lowess fit is better, though again it follows the mean rather than accounting for the variability. No amount of such modeling can account for the central puzzle of these charts: why ZIP code Poverty takes on such a wide range of values only when Wealth falls in a narrow range just below the US average.


I welcome any comments you might like to add, publicly or privately.


copyright 2008 - 2019 by roland b. stark.
