Big Data = Big Value?

Michael Schade, data analyst at Mobility Lab in Greater Washington, USA, explains in an interview with PTV how we can use and benefit from big data in traffic and transportation.

Michael Schade, data analyst at Mobility Lab in Greater Washington, USA. His credo: “Anything that democratizes data is a good thing.”

Traffic-relevant data volumes continue to grow at a steady rate. Real-time data complements historical data, offering completely new evaluation and analysis options.

What are you doing with (big) data?
I work for Mobility Lab, a research group that tries to help people find more choices for getting around. For instance, one of my favorite data sets comes from Capital Bikeshare (CaBi), which releases a new data set every three months with all the trip histories from that period. By now, millions and millions of trips have been taken, and each one of them is available for people to study.

I use data from transportation systems to show people in the region how their transportation networks affect their cities. We are creating tools that show this impact, as well as tools that let people explore their own neighborhoods and see what their choices are.

How is the data gathered, and how can people play with it?
The CaBi operator collects the data and adds it to their online library every three months, so you don’t even need to sign an agreement to use it. You can just download it from their website and easily play with it. In fact, I run a group called Transportation Techies, where we regularly hold nights to which we invite people from the community to come and show us what they have done with the data. We call it “CaBi Hack Night”. It is a lot of fun. You get a wide range of people: professional programmers, people who work for transportation or planning departments, people who work for bicycle advocacy groups, and people who are just curious and want to experiment and try something new.

What kind of data about the bicycles is stored?
It is very simple: origin-destination information. Each station has a computer that logs activity. In our system the bikes themselves do not have any computers on board. So the only information we have is when someone takes a bicycle out of a dock and when they dock it again.
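A minimal sketch of what such a dock-to-dock record could look like. The field names and the sample trip are invented for illustration and are not the operator's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    """One bike-share trip as seen by the docks (hypothetical field names)."""
    start_station: str    # dock the bike was taken out of
    end_station: str      # dock the bike was returned to
    start_time: datetime  # logged by the origin station
    end_time: datetime    # logged by the destination station

    @property
    def duration(self) -> timedelta:
        # Only the two dock events are known, so the duration is their difference.
        return self.end_time - self.start_time


# Invented sample trip, for illustration only.
trip = Trip(
    start_station="Dupont Circle",
    end_station="Union Station",
    start_time=datetime(2014, 6, 1, 8, 15),
    end_time=datetime(2014, 6, 1, 8, 40),
)
print(trip.duration)
```

Because the bikes carry no computer, nothing about the route between the two docks appears in a record like this.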

How are you using this data?
Most importantly, we see where the busy parts of the network are. Everyone has a hunch about which neighborhoods use it most, and it is nice to see your hunch validated or refuted. Last year saw some interesting jostling for bragging rights as the busiest station, with a tight three-way race between Dupont Circle, the Lincoln Memorial, and Union Station. Usually, changes in traffic levels reflect changes in the neighborhood itself. As a neighborhood grows, its bicycle usage grows as well. It is interesting to see bicycle usage as a kind of marker for other data.
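Ranking stations by activity can be sketched in a few lines. This assumes each trip has been reduced to a (start station, end station) pair; the sample trips are invented, and counting both departures and arrivals is just one possible definition of "busiest".

```python
from collections import Counter

# Invented sample of (start_station, end_station) pairs, for illustration only.
trips = [
    ("Dupont Circle", "Union Station"),
    ("Lincoln Memorial", "Dupont Circle"),
    ("Union Station", "Dupont Circle"),
    ("Dupont Circle", "Lincoln Memorial"),
]


def station_activity(trips):
    """Count how often each station appears as a trip start or a trip end."""
    activity = Counter()
    for start, end in trips:
        activity[start] += 1
        activity[end] += 1
    return activity


activity = station_activity(trips)
busiest = activity.most_common(1)[0]  # (station name, number of dock events)
print(busiest)
```

Running the same count on each quarterly release would show how the ranking shifts over time.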

It is also interesting to see how tourists in Washington use the bicycle system. For one period last year, the Lincoln Memorial became the most popular spot for cyclists. We can look at the data to see that the majority of CaBi trips are from “casual” users, those who buy one- or three-day passes. We assume most casual users are tourists, as opposed to residents, who would be more inclined to get monthly or annual passes. In Washington it was actually a battle to see whether the Park Service would allow us to have a station anywhere near the national memorials and monuments to begin with. Seeing that the most popular station is one we fought to have placed near one of the more important tourist destinations is very validating: it was worth the fight, and people do use the system.
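The casual-versus-member comparison could be computed per station along these lines. This is a sketch under assumptions: the labels "casual" and "member" stand in for however the pass type is encoded in the real files, and the records are invented.

```python
from collections import defaultdict

# Invented records of (start_station, rider_type), for illustration only.
trips = [
    ("Lincoln Memorial", "casual"),
    ("Lincoln Memorial", "casual"),
    ("Lincoln Memorial", "member"),
    ("Dupont Circle", "member"),
    ("Dupont Circle", "member"),
    ("Dupont Circle", "casual"),
    ("Dupont Circle", "member"),
]


def casual_share(trips):
    """Return, per start station, the fraction of trips made by casual users."""
    totals = defaultdict(int)
    casual = defaultdict(int)
    for station, rider_type in trips:
        totals[station] += 1
        if rider_type == "casual":
            casual[station] += 1
    return {station: casual[station] / totals[station] for station in totals}


shares = casual_share(trips)
for station, share in shares.items():
    print(f"{station}: {share:.0%} casual")
```

A high casual share at a station is only indirect evidence of tourist use, since the data records the pass type and not who the rider is.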

Figure: Metro traffic, hourly entries and exits

Everybody can use it, share it and connect it with other information?
It is truly open data. There is no restriction on it at all.

Which other organizations have interest in this kind of data?
The transportation departments themselves have a very strong interest in this data, because it is their job to decide where to put the next station. They also have to plan how to manage their network, so they need to know these numbers. The nice thing is that when you release the data, all of a sudden you have crowdsourced data analysis. The government planners no longer have to work alone. If you open up the data to the public, the public will tell you how to interpret it.

Do you also use public transport information?
Yes, we also use data from the Metro system. The data the Metro operator offers is a little bit different. They have an API that lets us access their schedule information as well as real-time information. We also receive historical information sporadically as separate statistics. In February, we had a so-called “Metro Hack Night”, hacking in the sense that we were encouraging people to do small creative projects with the Metro data. People really appreciated having access to the data, and they created interesting projects.

Can you name examples?
We had someone who just created graphs, basically activity charts for every single station. This became a very large project, with over 90 subway stations. Then comes the interpretation: whenever you find a change that is somehow unusual, you dive right in. Why does this station show activity at 5 in the morning? Then you realize it is because there was a big race that day and everyone came to this one station to participate. So it ties together our life as a community with the transportation networks that support it.
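Spotting an unusual hour like that can be roughed out as follows. This is a sketch, not the method used at the hack night: it assumes hourly entry counts per station are available as dictionaries, and the threshold values and all numbers are invented.

```python
from statistics import median


def unusual_hours(history, day, factor=3.0, min_count=50):
    """Return the hours of `day` whose entries exceed `factor` times the
    median for the same hour across `history` (a list of earlier days).

    Each day is a dict mapping an hour (0-23) to a number of station entries.
    """
    flagged = []
    for hour, count in sorted(day.items()):
        baseline = median(d.get(hour, 0) for d in history)
        # min_count keeps tiny absolute numbers from being flagged.
        if count >= min_count and count > factor * baseline:
            flagged.append(hour)
    return flagged


# Invented counts for one station, for illustration only.
history = [
    {5: 20, 8: 900, 17: 850},
    {5: 25, 8: 950, 17: 800},
    {5: 15, 8: 920, 17: 870},
]
race_day = {5: 400, 8: 300, 17: 350}

flagged = unusual_hours(history, race_day)
print(flagged)
```

The code only points at the hour worth a closer look. Explaining it, for example with a race that morning, still takes local knowledge.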

Is there a link between bike and Metro data?
We definitely want to see that. The agencies wouldn’t do that on their own, so through our group we encourage people to try. We definitely talk about the linkage between bike share and Metro. We try to compare two bike share stations, one near the Metro and one that is not. Then we look at the patterns to see whether people are combining both systems on their commute. We call it the last-mile problem: how do people get from their doorstep to the Metro system? If we look at the patterns, we can conclude that people use multiple systems to do that.
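One simple way to compare two such stations is the share of trips that start during commute hours. The sketch below assumes the trip start hours for each station have already been extracted; the peak windows and the sample hours are invented for illustration.

```python
def peak_share(start_hours, peaks=((7, 10), (16, 19))):
    """Fraction of trips that start inside the given [start, end) hour windows."""
    if not start_hours:
        return 0.0
    in_peak = sum(
        1 for hour in start_hours if any(lo <= hour < hi for lo, hi in peaks)
    )
    return in_peak / len(start_hours)


# Invented trip start hours for two bike share stations.
near_metro = [7, 8, 8, 9, 12, 17, 17, 18]
away_from_metro = [10, 11, 13, 14, 15, 17, 20, 21]

near_share = peak_share(near_metro)
away_share = peak_share(away_from_metro)
print(near_share, away_share)
```

A higher commute-hour share at the station near the Metro would be consistent with people combining both systems, but it does not prove it, because the two data sets do not identify individual travelers.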

Do you expect that the data will be shared with other transport modes?
We invite anyone to look at the data they find of interest. We also have bike hack nights where people collect and visualize their own GPS data. We talk about how you can view it on a map, and how you can view group rides where people are riding or competing together. But not many people have experimented with visualizations that combine different modes.

Is that a topic for the future?
I would love to see people combining different modes and to continue to track how they interact.

Do you share your experience with other regions internationally?
In our group we try to reach the largest audience possible. We are still building our membership here in the DC region. A few months ago we had our first non-American attendee, a gentleman from England. That was exciting.

What is the biggest challenge for the future regarding big data in general?
The big challenge is still getting transit agencies to participate and share their data, to get them on board with truly open data.

More information

“Capital Bikeshare” is a unique bicycle sharing system that serves Washington, D.C. and neighboring jurisdictions. Since 2008, it has been continuously investing in the expansion of stations and services, now offering its users over 2,500 bicycles and 300 stations across four regions (Washington, Arlington, Alexandria, Montgomery County). Washington’s Capital Bikeshare regained its crown as the largest overall network in the USA in 2014. The previous champ, New York’s Citibike, now ranks second.

About Michael Schade
Michael Schade has a degree in computer science with a major in transportation systems. He has worked as a consultant for Mobility Lab and as a programmer at a start-up company called Transitscreen. His credo is: “Anything that democratizes data is a good thing, anything that makes it easier for people to access and analyze on their own. The data should not belong just to the planning departments or just to the transportation agencies, it should belong to everyone. It is our data.”