Ben Wellington has been blowing the minds of New Yorkers for quite a while.
The affable Pratt Institute visiting professor behind I Quant NY is an unabashed advocate for open government data and has attracted a great deal of media coverage finding creative uses for the reams of data spit out by the New York City government every day.
He’s used parking ticket data to identify the most frequently blocked driveways in NYC, taught us all how to not get screwed by the MTA, and revealed the horrifying truth about our collective proximity to Starbucks outlets. Now writing at the New Yorker, Wellington is fast becoming New York’s king of awesome data nerdery.
Many of Wellington’s numbers come through a site called NYC Open Data, a Bloomberg-era initiative designed to give New Yorkers ready access to the huge quantities of information routinely collected by city government. The idea is simple: Take as much as possible of what agencies already record, upload it to one site in easy-to-use spreadsheet formats, and see what people can do with it. The Voice wrote about one such project, a personal homemade transit timer, last year.
NYC Open Data was hailed as a huge step forward, and for good reason. Not many cities have anything that comes close. A few clicks will produce spreadsheets full of everything from restaurant inspection scores to school attendance rates and rodent infestations.
A few days ago, Wellington released a video of a TEDx New York talk in which he discusses some of the challenges he’s faced in his work. Titled “Why Open Data Is Still Too Closed,” the talk drew attention to some interesting problems. As the title of his talk suggests, Wellington’s not hammering the city’s efforts, not by a long shot; he is nothing if not enthusiastic, and he’s something of a New York City booster. His every nudge is buffered with declarations about how “we, as a city, can do better than this.” But he does have some stories. In one case, an acquaintance had to file a Freedom of Information Law request for taxi location data, something that could easily have been posted on the city’s website. It was data that was public in theory but not in practice. The researcher ultimately had to walk into a city office with a hard drive under one arm.
“[The taxi data] was ‘public,’ ” Wellington says, making scare quotes. “But it wasn’t public. And we can do better than that as a city. We don’t need our citizens walking around with hard drives.” Other objections are about the seemingly minor but actually very significant issue of format. Car accident data, for example, is released as a series of documents in odious PDF form — relatively easy for humans to read, but far more difficult for a computer to analyze in bulk.
The problems Wellington points out exist elsewhere. Stop-and-frisk data, for example, posted regularly by the NYPD as the result of a lawsuit several years back, is released in a proprietary format that makes it extremely difficult — and potentially expensive — for independent researchers to work with. Precinct crime data, like those accident reports, is posted in PDF, making it difficult for the casual user to compile.
So Wellington raises some excellent points, exactly the kinds of things reporters routinely whine and shout about. And he’s just so damn nice as he does it that maybe the city will listen.