GSA hosts nationwide hackathon on open data from its URL shortener

The federal government is hosting a hackathon focused on unlocking the value of the newly opened click data from its URL shortener. Organizers hope the developer community can create apps that turn the online audience’s activity into meaningful information. Later this month, the General Services Administration (GSA) will host a nationwide hack day, inviting software developers, entrepreneurs, and citizens to engage with the data produced by 1.USA.gov, its URL shortener.

UPDATE: Check out the projects created at the Hack Day.

The hackathon fits into a larger open government zeitgeist. Simply put, if you enjoy building applications that improve the lives of others, there may never have been a better time to be alive. Whether it’s rethinking transportation or convening for a datacamp, every month there are new hackathons, challenges, apps contests and code-a-thons to participate in, contributing time and effort to the benefit of others. This July is no exception. Last Saturday, Google Chicago hosted a hackathon to encourage people to work on Apps for Metro Chicago. On the Saturday after OSCON, an API Hackday in Portland, Oregon will gather developers for “an all-day coding fest focused on building apps and mashups.” If you’re free and interested in participating in a new kind of public service, on July 29th, hack days will be hosted by USA.gov in Washington, D.C., Measured Voice in San Diego, bitly* in New York City, and SimpleGeo in San Francisco. And if New Yorkers still have some fire in their bellies to collaborate with their local government, the city of New York is hosting its first-ever hackathon to re-imagine NYC.gov on July 30-31.

How URL shorteners and 1.USA.gov work

To understand why this particular set of open data from 1.USA.gov is interesting, however, you have to know a bit more about URL shorteners and how social media has changed information sharing online. A URL is a Web address that a citizen types into a Web browser to go to a site. Many URLs are long, which makes sharing them on Twitter or other mobile platforms awkward. As a result, many people share shortened versions. (O’Reilly Media links are shortened to oreil.ly, for instance.) One of the challenges shortened links pose to users is that, unless a citizen uses one of several tools to see where a short link actually leads, he or she might be led astray or exposed to malicious code at the link’s destination. In other words, this is about being able to trust a link.
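The mechanics behind a shortener are simple: store the long URL, hand back a compact code that redirects to it. Here is a toy in-memory sketch, purely illustrative and not how bit.ly or 1.USA.gov is actually implemented, with a hypothetical `short.example` domain, using base-62 codes:

```python
import string

# 0-9, a-z, A-Z: 62 characters, so each extra character
# multiplies the number of available short codes by 62.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

class Shortener:
    """Toy in-memory URL shortener: long URL -> short code -> long URL."""

    def __init__(self, domain="short.example"):
        self.domain = domain
        self.urls = []  # the code is just a base-62 encoding of the list index

    def shorten(self, long_url):
        self.urls.append(long_url)
        n = len(self.urls) - 1
        code = ""
        while True:  # base-62 encode the index
            code = ALPHABET[n % 62] + code
            n //= 62
            if n == 0:
                break
        return f"http://{self.domain}/{code}"

    def expand(self, short_url):
        code = short_url.rsplit("/", 1)[1]
        n = 0
        for ch in code:  # base-62 decode back to the index
            n = n * 62 + ALPHABET.index(ch)
        return self.urls[n]
```

The trust problem described above falls out of this design: the short code reveals nothing about its destination, so a reader must either trust the shortener’s domain or call something like `expand()` before clicking.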

Last year, the United States General Services Administration (GSA) launched the Go.USA.gov URL shortener at the Gov 2.0 Expo in Washington, D.C. Whenever a government employee used the service (or any tool that used it to shorten URLs, like TweetDeck) to shorten a .gov or .mil URL, the link would be converted to a short Go.USA.gov URL. That meant that whenever a citizen saw such a short URL on a social network, she knew the content came from an official government source.

For more on how these URLs work, watch Michele Chronister’s presentation from last year’s Gov 2.0 Expo. Chronister is a presidential management fellow and Web content manager for USA.gov in the Office of Citizen Services and Innovative Technologies at the GSA.

This March, the GSA added 1.USA.gov, a URL shortener for civilian use. “The whole idea is to improve people’s experience when dealing with government information online,” explained Jed Sundwall, a contractor for the GSA, via email. “We keep usa.gov in the domain for usability reasons. It’s crystal clear, worldwide, that 1.usa.gov URLs point to trustworthy government information.”

According to Sundwall, ABC senior White House correspondent Jake Tapper was the first to use it when he tweeted out a link to a PDF containing new unemployment information at the Bureau of Labor Statistics: “For those asking follow-ups on unemployment, here’s the BLS link.”

Months later, Tapper has been followed by thousands of other people who have used the URL shortener simply by using the tools they already knew. “The beauty is that Jake used it without knowing he was using it,” writes Sundwall. “We’re trying to make it easy for anyone to identify .gov information as it’s being shared online.”

That easy identification is quite helpful given the increasing pace of news and information sharing on the Web. “Trust is a valuable thing online, and being able to know that the information you’re receiving is reliable and accurate is difficult yet essential — especially so for government websites, where people go for critical information, like health services and public safety,” wrote Abhi Nemani, director of strategy and communications for Code for America.

Code for America is “excited to be partnering with them to help bring together passionate developers, designers, and really anyone interested to see what we can hack together with the data,” wrote Nemani. The hackathon will tap into “a huge and growing resource for new and really interesting apps,” he wrote at the Code for America blog. “See, this data gives a lens into how people are interacting with government, online; an increasingly important lens as citizen/government interaction moves from the front desk or the phone line to the web browser.”

To learn a bit(ly) more about the hackathon and its goals, I conducted an email interview with Michele Chronister and Sundwall.

What does the GSA hope to achieve with this hackathon? How can open data help the agency achieve the missions taxpayers expect their dollars to be applied towards?

Chronister: We hope to encourage software developers, entrepreneurs, and curious citizens to engage with the data produced by 1.USA.gov. The data provides real-time insights into the government content people are sharing online, and we know hack day participants will surprise us with creative new uses for the data. We anticipate that what’s produced will benefit the government and the public. Making this data public expands GSA’s commitment to open, participatory and transparent government.

What hacks can come of this that aren’t simply visualizing the most popular content being shared through the shortener?

Sundwall: First of all, the issue of popular content is an important one. Before this data set, no one had such a broad view of how government information is being viewed online. Getting a view of what’s popular across government in real time is a big deal, but a big list of popular URLs isn’t killer per se.

The data from 1.USA.gov includes a lot of data beyond just clicks, including clickers’ browser version (Firefox vs. IE, mobile vs. desktop, etc.) and IP-derived geo data. It’s also real time. This allows people to look at the data across a number of different dimensions to get actionable meaning out of it. A few ideas:

1. Geo data. The geo data included in the feed is derived from IP addresses, which makes it intentionally imprecise for privacy reasons (we don’t show the IP address of each click), but precise enough to spot location-based trends.

One of the reasons we brought SimpleGeo on as a collaborator for the hack day is because they’re really good at making location data easy to work with. Their Context product makes it easy to filter clicks through a number of geographic boundaries including legislative districts. They also make it easy to mash the data up with Census demographic data.

We want to let journalists, analysts, campaign strategists, and other researchers know that this data is a powerful tool to spot trends in the areas where they work. I gave a demo of 1.USA.gov to Richard Boly at the State Department soon after we launched, and he thought it could be a tool for country desk officers to spot trends in their countries. Hint: if you’re coming to the hack day, think about building something like this.

We hacked together a quick video showing click data mapped out across the US for most of June: red dots are non-mobile clicks and green are mobile. It’s a blunt visualization, but it’s fascinating to watch the clicks pulse across the country, from the east to west in the morning, and then from red to green when people leave their desks and get on their phones.

We could enhance visualizations like this to see if there are trends in how particular kinds of information are shared throughout cities and across the country. I wouldn’t be surprised if clicks on certain links from certain agencies turn out to be leading indicators; perhaps municipal leaders should pay attention to spikes in clicks on such links.

2. Browser data. We log, on average, about 56,000 clicks on 1.USA.gov links per day. It’s not a ton of data compared to Google, but the dataset provides a really nice sample of user behavior, particularly among social media users, because the short URLs are most frequently shared and clicked via Twitter and Facebook.

I’m hoping this data can be useful to people tracking trends in browser adoption and trends in mobile usage. The data science team at bitly is already doing this kind of analysis with their much larger set of click data, but we’re really excited to give a slice of that data out to researchers for free.

3. Contextual data. Each 1.USA.gov link points to a file that is likely to include some amount of machine-readable content, such as an HTML page title, meta description, body content, etc. Many links, if not most, are shared via Twitter. Both the content of the link’s file and the content of the tweet that included the link when it was shared provide insight into not just what links people are sharing, but what topics people are talking about.
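The geo and browser dimensions above can be sketched against a raw click feed. The sample below is hypothetical: the field names (`a` for user agent, `c` for two-letter country code, `u` for the long URL) follow common click-stream conventions but are assumptions here, not the documented feed schema.

```python
import json
from collections import Counter

# Hypothetical sample of click events, one JSON object per click.
# Field names assumed: 'a' = user agent, 'c' = country code, 'u' = long URL.
SAMPLE = """
{"a": "Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X)", "c": "US", "u": "http://www.bls.gov/news.release/empsit.nr0.htm"}
{"a": "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0", "c": "US", "u": "http://www.nasa.gov/"}
{"a": "Mozilla/5.0 (Linux; U; Android 2.3)", "c": "GB", "u": "http://www.nasa.gov/"}
"""

# Crude user-agent sniffing: substrings that suggest a mobile device.
MOBILE_HINTS = ("iPhone", "iPad", "Android", "Mobile")

def summarize(lines):
    """Tally clicks by country and by mobile vs. desktop user agent."""
    countries, platforms = Counter(), Counter()
    for line in lines.strip().splitlines():
        click = json.loads(line)
        countries[click.get("c", "unknown")] += 1
        agent = click.get("a", "")
        mobile = any(hint in agent for hint in MOBILE_HINTS)
        platforms["mobile" if mobile else "desktop"] += 1
    return countries, platforms

countries, platforms = summarize(SAMPLE)
```

The same loop could just as easily bucket clicks by agency domain or by hour of day, which is roughly what the east-to-west, desktop-to-mobile animation described above does visually.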

What are some of the early successes — and failures — that inform how the GSA is approaching its open data initiatives? And how will it all relate to citizen engagement?

Chronister: USA.gov has successfully built a community of people interested in government data, and we hope to expand on that by making 1.USA.gov’s data more available. One part of this is releasing the click data to the public. We also provide XML for all of our frequently asked questions on USA.gov and a product recall API. These resources can be found on USA.gov.

We know that raw government data is not interesting or useful to everyone, which is why we are trying to engage specific communities with the hack day. Hopefully any tools created at the hack day will help engage a larger audience and show what’s possible when government opens its data and makes it available.

What are some useful examples of “infohacks” where someone can easily find useful information already?

Sundwall: We actually used a method of finding useful government information by instructing people to search for “1.usa.gov + tsunami” on Twitter after the Japan earthquake in early March; this was the best way for people to find the best government information about the tsunami at the time. It allowed us to crowdsource the best government resources about the tsunami by relying on what everyone on Twitter was already finding and sharing. You won’t see this now, but at the time, the search results featured a few “top tweets” pointing to useful government information. The 1.usa.gov domain let us know it was authoritative even though it was being shared from non-government Twitter accounts like @BreakingNews.

This Twitter search trick is one of my favorite hacks. I subscribe to RSS feeds of “1.usa.gov + awesome” and “1.usa.gov + cool” searches and find great crowdsourced government information every day. Just last week, one such tweet inspired a blog post, which ended up being the most popular post on our blog ever.

How else could this data be made more useful to citizens – or government?

Sundwall: Researchers could use this Twitter search method to be notified of new information by subscribing to searches like “1.usa.gov + cancer,” “1.usa.gov + human rights,” “1.usa.gov + Afghanistan,” etc. I sometimes get a kick out of searching “1.usa.gov + wtf.” I’m a nerd.

What’s the incentive for developers to donate their time and skill to hacking on this data?

Sundwall: This is the best question. I hope some of the ideas I’ve presented above give an idea of how powerful this dataset is. This is the kind of information that organizations usually regard as proprietary because it gives them intelligence that they don’t want their competitors to have. I’m really proud to work with the folks at the GSA, because opening up this dataset reveals a deep understanding of how open data can work. USA.gov wants to help people by helping them find the government information they need. This data will allow other people to join them in this endeavor. As Tim O’Reilly says, “Create more value than you capture.” I hope that people will recognize the value in this data and create tools, apps, more efficient research methods, and perhaps even businesses based on it. I’m certain this data will prove to be valuable to many people who will discover applications of it that we haven’t imagined yet.

*Editor’s Note: bitly is funded by O’Reilly AlphaTech Ventures.
