Algorithmic Districting

@Brian_Olson and others who are interested,

Do you still think “lowest average distance from the center of each census block to the center of the district” is the best metric for evaluating a map?

Have you tried the square of the distance? Do you have maps or examples showing why that would be better or worse?

How are you defining the center of the block and the district? Some geographic center? Or a population weighted center?

A friend of mine wants to run an initiative that re-districts our city based on a good algorithm.

Best,

~ Andy Jennings

Basically yes, with a few tweaks.
Originally I measured the land-area centers of districts; now I think the population-center of the district is better.
I would guess that square vs linear distance doesn’t make a big difference in the result. In both cases there is pressure against sprawling spaghettified districts. I don’t think squaring it to “press harder” will change much.
The Census data provide “an interior point” for each block, so I just went with that. It’s not specifically any kind of center. Blocks don’t tend to be large so I don’t think it’ll make much difference to have a very precise or specific kind of interior or center point; they might move by a few meters. If I had to calculate something I would probably go with a very simple and fast to calculate “center of bounding box”.

Thanks @Brian_Olson,

It seems like a pretty cool fact that for a set of points, the sum of the squares of the distances from each point to the center is equivalent to the sum of the squares of the distances between every pair of points (up to a scaling factor). https://rangevoting.org/TheorDistrict.html#QualityDefns

As WDS says, this may make the optimization problem easier.

But more importantly, as I was talking with a friend (who actually wants to try and do this as an initiative), I was wondering if this makes writing the law easier, because you don’t have to define “center”.

Of course, the census blocks aren’t point masses, but you could say “minimize the average of the squares of the distances between the CLOSEST POINTS of every pair of census blocks in a district”, and it would be a good, simple rule, wouldn’t it?

Having a block simplified to a single point is really valuable, computationally. The “closest points between two blocks” would be a comparison of 3-50 points on each side of every pair of blocks, which could explode the processing time. Actually calculating an N-squared comparison of every block to every other block is not feasible. I’ve also simplified “distance” in a cheat by treating longitude and latitude as cartesian X,Y. I scale based on bounding box midpoint latitude (cosine thereof) to correct for distortion away from the equator to keep things square-ish. I’ve considered a couple better workarounds to make things more geometrically correct but still fast to calculate. One could use a map projection like an “equal area” polar projection centered over the center of the state to keep relative positions and distances pretty proportional. Or, one could map the lat-lon polar coordinates to X,Y,Z on the standard spheroid and sqrt(dx*dx + dy*dy + dz*dz) isn’t much worse than doing just X,Y. But when doing as many millions of these kinds of operations per second it’s important to set things up to be fast.