Filed under “blind squirrels,” there is a remarkably thoughtful post at the Puddle by David Colarusso. I know, right, but just because the Lawyerist has been moribund for years doesn’t mean it can’t find a nut. And so it has.
The National Institute of Justice has announced the winners of $1.2 million in prizes for its Crime Forecasting Challenge. The challenge asked data scientists to develop algorithms to better predict the occurrence of crimes based on location.
Thankfully, team Anderton (named for the precrime division head in Philip K. Dick’s Minority Report) is collecting none of that prize money. I say “thankfully” because I am Team Anderton.
This was a “legal hacker” challenge of the sort that tech futurists adore, especially since there was some serious loot at stake. Colarusso could have tried to glom his piece by crafting a serious string of code. Instead, he went the other direction, testing the efficacy of the game by submitting a hack that was intentionally awful.
Team Anderton was a canary in the data mine. Dashed off over two weekends, its forecasts embodied my greatest fears for predictive policing. If it fared well it meant trouble. Luckily, it did not, but it still has something to teach us. What follows is the story of Team Anderton, a companion to the classic How to Lie with Statistics updated for the digital age. Fed by inexpensive computing, statistics has a new name: data science. And though it can be used for good, there is a dark side. As attorneys, we must be mindful that such misuse threatens the rights we have sworn to protect.
I had just picked up Cathy O’Neil’s Weapons of Math Destruction, which devoted much of a chapter to the dangers of predictive policing, and though I really wanted to write something explaining all of the dangers such a well-intentioned competition was likely to face, I wasn’t sure what more I could add to the growing chorus of concern surrounding algorithmic bias. Then I hit on the canary idea. It’s one thing to talk about how algorithms go bad, but to show the banal birth of such an algorithm seemed like a story worth telling.
Colarusso’s post is long, very long, and occasionally dense. Very dense. He goes flying off on tangents to explain things, which is understandable given that much of the issue isn’t yet on people’s radar, and lawyers aren’t particularly good at grasping the intricacies of “data science.” He was left with the choice of being thorough and precise or fast-paced and interesting. He chose the former. Frankly, I don’t blame him. Plus, it has the added virtue of being one of the few posts of any depth ever to appear at the Puddle. Any more posts like this and I may have to change my nickname, Sam.
The crux of Colarusso’s canary test is that algorithms can be right for the wrong reason. This particular hack was based on five years of data supplied by the Portland police, and as such, began from a GIGO foundation. Pick any street corner in a bad neighborhood, predict there will be crime there, and boom, you nailed it. It’s not that drugs aren’t being ingested in white neighborhoods, but that there is no data to prove it and predict it. Isn’t data cool?
Consequently, the forecasts do not offer some insight into previously unseen criminal activity. They answer the question, “Where will we find crime?” with “Where we always have.” Perhaps you could account for potential sources of bias, like over-policing, by controlling for things like the frequency of police visits and neighborhood demographics, but if such a genuinely thoughtful model resulted in a map that differed from the traditional distribution of “crimes” it wouldn’t score very well. This is because we aren’t judging models based on some omniscient understanding of where crimes actually occur. We have to judge them based on available data. For challenge participants constructing and fine tuning their models, this meant historic call data: Where will we find crime? Where we always have.
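For the code-inclined, the circularity can be shown with a toy sketch. Everything in it is made up for illustration: two hypothetical neighborhoods, invented crime counts and recording rates, and a simple overlap score that is not the challenge's actual scoring method. It assumes both neighborhoods have the same amount of real crime, but one is policed far more heavily, so more of its crime ends up in the data.

```python
# Toy illustration only: hypothetical numbers, not Portland data,
# and a made-up score, not the NIJ challenge's scoring method.

# Actual offenses per year (unknowable in practice; assumed equal here).
true_crimes = {"heavily_policed": 100, "lightly_policed": 100}

# Assumed fraction of offenses that end up in police records.
recording_rate = {"heavily_policed": 0.8, "lightly_policed": 0.2}


def recorded(crimes, rates):
    """Crimes that make it into the data: true crimes times recording rate."""
    return {area: crimes[area] * rates[area] for area in crimes}


def shares(counts):
    """Convert counts to each area's share of the total."""
    total = sum(counts.values())
    return {area: count / total for area, count in counts.items()}


def score(forecast, observed):
    """Overlap between forecast and observed shares (1.0 means identical)."""
    return 1 - 0.5 * sum(abs(forecast[a] - observed[a]) for a in observed)


history = recorded(true_crimes, recording_rate)    # what the model trains on
next_year = recorded(true_crimes, recording_rate)  # what the model is judged on

# "Where will we find crime? Where we always have."
naive_forecast = shares(history)
# A forecast that somehow knew the true, even distribution of crime.
corrected_forecast = shares(true_crimes)

naive_score = score(naive_forecast, shares(next_year))
corrected_score = score(corrected_forecast, shares(next_year))

print(f"naive forecast score:     {naive_score:.2f}")
print(f"corrected forecast score: {corrected_score:.2f}")
```

Under these assumptions, the forecast that merely echoes the historical records scores higher than the one that matches where crime actually occurs, because both are graded against the same skewed records.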
Put aside some time, if you think predictive algorithms have a potential to be the future of criminal law, and muddle through Colarusso’s post. What matters isn’t whether you agree with his issues, but that you appreciate that there are issues, really tough issues, that make you realize that whatever simplistic idea you have about how tech will save us is a big, steaming pile of crap.