Last month, I worked on a project for the 2013 Urban Data Challenge, a competition to visualize a week’s worth of transportation data for three cities: San Francisco, Zurich, and Geneva. (My team won second place in the challenge, so I’ll hopefully be visiting Switzerland sometime soon!)
At the outset of the competition, I was fortunate to meet with some talented people with experience in civil engineering, data science, architecture, programming, and visual design. We considered a number of different ideas, but finally decided upon a way to quantify, rank, and then visualize the frustration of transit users in each city.
Our final application visualizes frustration on a Monday in each city. At different times of day, you can view frustration in terms of speed, the crowdedness or capacity of vehicles, and delay. If you zoom in on a single transit stop, the application calculates an overall frustration grade for that stop. There are a variety of other factors that I would have liked to consider, which are summarized along with our project methodology.
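To give a rough sense of how three factors like these could roll up into a single grade for a stop, here is a minimal sketch in Python. It is not our actual methodology: the `StopMetrics` fields, the equal default weights, and the letter cutoffs are all assumptions made up for illustration, and each score is assumed to be pre-normalized so that 0.0 means no frustration and 1.0 means the worst observed.

```python
from dataclasses import dataclass


@dataclass
class StopMetrics:
    """Hypothetical per-stop scores, each normalized to the range 0.0..1.0."""
    speed: float     # 0.0 = vehicles moving freely, 1.0 = slowest observed
    crowding: float  # 0.0 = empty vehicles, 1.0 = at or over capacity
    delay: float     # 0.0 = on schedule, 1.0 = worst observed delay


def frustration_index(metrics, weights=(1.0, 1.0, 1.0)):
    """Combine the three scores into one weighted average between 0 and 1.

    Equal weights are an assumption, not a finding from the data.
    """
    scores = (metrics.speed, metrics.crowding, metrics.delay)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be normalized to 0.0..1.0")
    total_weight = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


def letter_grade(index):
    """Map an index to a letter using made-up cutoffs (A = least frustrating)."""
    for cutoff, letter in ((0.2, "A"), (0.4, "B"), (0.6, "C"), (0.8, "D")):
        if index < cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    stop = StopMetrics(speed=0.5, crowding=0.5, delay=0.5)
    print(letter_grade(frustration_index(stop)))
```

Passing different `weights` would let one factor, such as delay, count for more than the others.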
The application is here: http://frustration-index.herokuapp.com/
Incidentally, frustration also sums up my experience wrangling the raw transit data for these cities. Formatting, normalizing, and analyzing the data, and then formulating the output, required much more effort than I had anticipated. The dataset for San Francisco was particularly challenging. I spent a number of restless nights trying to correlate real data from October 2012 with the Google Transit Feed for the same time period. With the exception of a few hundred outlying buses, I was finally able to sync the Google schedules with the raw dataset. This was tremendously rewarding, but there were still countless other issues that our team had to figure out.
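For readers curious what syncing observations with a schedule can look like, here is a simplified sketch in Python. It is not the code I used. It assumes the schedule has been reduced to (stop ID, arrival time) pairs in the feed's HH:MM:SS format, where hours can run past 24 for trips that continue after midnight, and that each observation is a stop ID plus seconds since the start of the service day. The function names and the 15-minute tolerance are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def gtfs_time_to_seconds(value):
    """Convert 'HH:MM:SS' to seconds; hours may exceed 23 in transit feeds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_schedule_index(stop_times):
    """Group scheduled arrivals by stop ID and sort them for binary search."""
    index = defaultdict(list)
    for stop_id, arrival in stop_times:
        index[stop_id].append(gtfs_time_to_seconds(arrival))
    for times in index.values():
        times.sort()
    return index


def match_observation(index, stop_id, observed_seconds, tolerance=900):
    """Return (scheduled_seconds, delay_seconds) for the nearest scheduled
    arrival at this stop, or None if nothing falls within the tolerance.

    The 900-second tolerance is an arbitrary assumption; unmatched
    observations would be treated as outliers.
    """
    times = index.get(stop_id, [])
    if not times:
        return None
    pos = bisect_left(times, observed_seconds)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = times[max(pos - 1, 0):pos + 1]
    best = min(candidates, key=lambda t: abs(t - observed_seconds))
    delay = observed_seconds - best
    return (best, delay) if abs(delay) <= tolerance else None


if __name__ == "__main__":
    schedule = [("S1", "08:00:00"), ("S1", "08:10:00"), ("S1", "25:05:00")]
    index = build_schedule_index(schedule)
    print(match_observation(index, "S1", 8 * 3600 + 120))
```

A positive delay means the vehicle arrived after its scheduled time. Matching on stop and time alone is the crudest possible approach; a real pipeline would likely also need route and trip identifiers to avoid pairing a bus with the wrong run.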
I’ll be writing a separate post about findings and trends in the data, as well as remaining questions, issues, and ideas that I would like to return to. For instance, I would love to apply theories from computer networking to this data: vehicle queuing and routing, as well as channel flow, capacity, congestion, and reliability.
But enough banter. Here’s a video.