As GIS manager for my county government, one of my tasks was to prepare a County Atlas based on the most recent Census numbers. One of the fundamental areal properties mapped was population. We had raw population figures in a database based on a variety of partitions of the terrain: zip codes, Census districts, voting precincts, fire and police districts, zoning, traffic analysis zones (TAZ), etc. The idea was to produce a choropleth thematic map that gave as accurate an impression as possible of the population distribution of the county. TAZ were chosen because they were the smallest, most granular partition available. The TAZ were relatively small polygons, roughly uniform in shape and size, and there were a lot of them.
Each TAZ was colored by its population concentration, and we chose 5 categories, from lowest to highest, because that seemed to give a detailed yet relatively uncluttered map that communicated the most information at a glance. The population of the TAZ varied from 0 (industrial areas and parkland) to 26,625 (high-density public housing and seaside condominiums).
The question then arises: how do you select the breakpoints that determine which population category (and color) a TAZ polygon will be assigned? Our GIS software provided us with several different statistical options, all of them perfectly legitimate and totally objective, yet each producing a very distinct map.
Equal Interval – Divide the entire data range into 5 equal intervals.
Equal Area – Divide the data range so the total spatial area assigned to each category is roughly the same.
Quantile – Divide the ranges so roughly the same number of TAZ are assigned to each range.
Jenks' Method – An algorithm that assigns “natural” breakpoints based on formal statistical criteria.
Standard Deviation – The algorithm assigns categories (colors) based on the number of standard deviations (SD) from the mean.
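To make the difference concrete, here is a rough Python sketch of three of these options. It is not what our GIS package actually ran: the population list is invented for illustration (only the 0 and 26,625 extremes come from the real data), the function names are my own, and the SD breakpoints at ±0.5 and ±1.5 standard deviations are just one common convention that I am assuming here. Equal Area and Jenks are left out because they need polygon areas and an optimization step, respectively.

```python
import bisect
import statistics

# Hypothetical TAZ populations, invented for illustration.
# Only the minimum (0) and maximum (26,625) match the real data set.
populations = [0, 120, 450, 800, 1500, 2300, 3100, 4800, 9000, 26625]


def equal_interval_breaks(values, k=5):
    """Split the data range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k=5):
    """Put roughly the same number of polygons in each category."""
    return statistics.quantiles(values, n=k, method="inclusive")


def std_dev_breaks(values):
    """Breaks at mean -1.5, -0.5, +0.5, +1.5 SD (an assumed convention)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + m * sd for m in (-1.5, -0.5, 0.5, 1.5)]


def count_per_category(values, breaks):
    """Count how many values fall into each category (0 = lowest)."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect.bisect_right(breaks, v)] += 1
    return counts


equal_counts = count_per_category(populations, equal_interval_breaks(populations))
quantile_counts = count_per_category(populations, quantile_breaks(populations))
sd_counts = count_per_category(populations, std_dev_breaks(populations))

print("Equal interval:", equal_counts)   # [8, 1, 0, 0, 1]
print("Quantile:      ", quantile_counts)  # [2, 2, 2, 2, 2]
print("Std deviation: ", sd_counts)
```

With a skewed distribution like this one, Equal Interval dumps nearly every polygon into the lowest color, while Quantile spreads them evenly across all five. Same numbers, two very different maps.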
In addition, the software gave the user the option of defining his own custom breakpoints, essentially giving the unscrupulous cartographer the capability of using the data to “prove” just about any damned thing he wants.
For comparison purposes, we provided population distribution maps using the first four methods, as well as a population density map (TAZ population / TAZ area) using the fifth (SD) method. We also provided a density dot map (1 dot = 200 people, dots distributed randomly within each TAZ), so that the Atlas users could see how their conclusions about the data might be influenced by the way it was displayed.
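For the curious, the density and dot-map calculations can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the GIS software's implementation: the triangular polygon and its population are made up, the helper names are hypothetical, leftover people below a full 200 are simply dropped, and the random placement uses plain rejection sampling inside the polygon's bounding box.

```python
import random


def polygon_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings to the right of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, population, people_per_dot=200, rng=None):
    """Scatter one dot per `people_per_dot` people at random inside the polygon.

    Assumes the polygon has non-zero area; otherwise the loop never ends.
    """
    rng = rng or random.Random()
    n_dots = population // people_per_dot  # remainder is dropped (assumption)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# Hypothetical TAZ: a triangle in arbitrary map units with 1,250 residents.
taz_polygon = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
taz_population = 1250

area = polygon_area(taz_polygon)
density = taz_population / area
dots = random_dots(taz_polygon, taz_population, rng=random.Random(42))

print("Area:", area)
print("Density (people per unit area):", density)
print("Number of dots:", len(dots))
```

Note that the dot positions carry no information beyond which TAZ they fall in, which is one more way a map can suggest precision the data does not have.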
Many more Census parameters were mapped in the Atlas, and for each we selected the method we judged, as professional geographers, to be most illustrative and informative of that parameter, the one that best reflected ground truth. But we also printed the statistical method we used for each map so the users would have some feeling for how their objective assessment of the data might simply be an artifact of the subjective prejudices of the cartographer.
You know, this illustrates a theme I am constantly bringing up here on this board. Statistics is an invaluable tool for analyzing the world, but it is just a tool. It can be misused in a variety of ways, either through a lack of sophistication in its use or through deliberate and malicious prevarication. If the cartographer has an agenda, it will affect the map and the information it communicates. And this bias is not necessarily deliberate or malicious. Very often, it is subconscious and unrecognized by the geographer (or analyst, or statistician). You have been warned.
-
Why didn't they just go with a straightforward gradation of color based solely on the number?
-
They did exactly the same thing I did; they just used more categories.
- That is an option, and software does exist to allow you to do that.