Farewell, Thomas.gov. Hello, Congress.gov.

On November 19th, Thomas.gov, the venerable website of the United States Congress, will begin to redirect visitors to Congress.gov. The new site, which launched in beta in September 2012, will become the primary governmental resource for the text of legislation, past, present and future, along with reports from committees, speeches from the floor of Congress and cost estimates from the Congressional Budget Office.

While the Library of Congress, Thomas.gov’s custodian, made the official announcement today, prompting headlines about Congress trading in THOMAS for the new Congress.gov and a note in Roll Call, the transition from THOMAS.gov to Congress.gov has been under way all fall. That work has included updates to the new site and the launch of the Constitution Annotated and its associated app.

THOMAS is centuries old, at least as measured in terms of Internet time. Launched in January of 1995, Thomas.gov was one of the first 23,000 websites to go online. When it went live, the Internet had a worldwide user base of fewer than 40 million people. Most of them surfed the young World Wide Web using Mosaic and Netscape, checked their email on Eudora and dialed in on America Online. Watch the video below to get a sense of what life was like online nearly two decades ago.

Today, Thomas.gov receives, on average, 10 million visits every year, although I suspect many of those visits come from wonky repeat customers in or around the District of Columbia. I have no server logs to prove that one way or another, but THOMAS has long been alternately beloved and bemoaned by Congressional staffers and correspondents. All of them have had to rely on its increasingly creaky infrastructure for nearly two decades as the national repository of legislation and reports. So, too, have millions of Americans around the rest of the country who want to read proposed bills.

Incremental improvements to search and sharing in recent years have made the site better. Even so, for a decade people interested in tracking Congress have increasingly turned to sites like GovTrack or the New York Times for data created by scraping THOMAS. What does that mean, in practice? Congress.gov will be the official source of information, but that pattern is likely to continue until its operators move to act as a platform for legislative data instead of a portal for legislative information. Open government advocates have been calling for the release of bulk legislative data for many years. Their frustration peaked this September, when a Library of Congress cost estimate acknowledged that Congress.gov “was not designed specifically to facilitate the extraction of the data as XML documents for bulk download.”

Putting the issue of bulk data aside, the new Congress.gov is an immense improvement on THOMAS in every way, as I reported last year:

Tapping into a growing trend in government new media, the new Congress.gov features responsive design that adapts to desktop, tablet or smartphone screens. It’s also search-centric. The site supports Boolean search, and it acknowledges that most of its visitors show up looking for information by putting a search field front and center in the interface. The site includes member profiles for U.S. Senators and Representatives, with their associated legislative work. In a nod to a mainstay of social media and media websites, the new Congress.gov also has a “most viewed bills” list that lets visitors see at a glance which laws or proposals are gathering interest online.

Since Congress.gov launched in beta in September 2012, digital staff at the Law Library of Congress have been adding new features and context at a steady pace. These include the Congressional Record, committee reports, standing committee pages, and the ability to “Search within results.”

On November 19th, when THOMAS is retired, the social media outposts of the site will also transition. @THOMASDotGov will transition its more than 15,500 followers to a new identity.

In a press release, the Library of Congress indicated that the old site will remain accessible from the Congress.gov homepage through late 2014. After that, historians may have to hope that the National Archives takes whatever code or data retains historical interest onto its own servers, lest it moulder and succumb to bitrot. Unfortunately, the configuration of the robots.txt file for Thomas.gov appears to have prevented the Internet Archive from preserving its iterations over the years.
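
How can a single text file keep an archive away? A well-behaved crawler reads a site’s robots.txt before fetching pages and skips any path that a rule disallows for its user-agent. The sketch below uses Python’s standard `urllib.robotparser` to show that check. The robots.txt contents are hypothetical and are not the actual Thomas.gov file, which isn’t reproduced here. The user-agent `ia_archiver` is assumed as the name the Wayback Machine historically honored, and the function name `crawler_allowed` is illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the real Thomas.gov file: it blocks the
# Internet Archive's historical crawler name and leaves other bots alone.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from a string instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(crawler_allowed(ROBOTS_TXT, "ia_archiver", "http://thomas.gov/"))   # False
    print(crawler_allowed(ROBOTS_TXT, "SomeOtherBot", "http://thomas.gov/"))  # True
```

Because the archive’s crawler honors the disallow rule, nothing under `/` gets captured. That leaves no snapshots to browse later, even though the site itself was publicly reachable the whole time.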

If you’re interested in learning how to use the new Congress.gov, you can register at beta.congress.gov/help for training sessions scheduled for November 14, January 16, March 11 and March 16.

Cato study rating Congress on open legislative data gives low grades for transparency

The school year may have just begun, but Congress has already received an early report card on the transparency of its legislative data. The verdict? A 2.47 GPA, on average, if you don’t include the 4 Incompletes. For those who’ve long since forgotten how grade point averages are computed, that’s a bit better than a C+. It also means that while Congress “passed this term,” any teacher’s note would likely include a stern warning: when it comes to legislative transparency, the student needs to show improvement before graduation.
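
For anyone who wants the arithmetic spelled out, here is a minimal sketch of how a GPA that excludes Incompletes maps back to a letter grade. It assumes the common U.S. 4.0 scale with plus/minus steps. That scale is an assumption, and the report card’s own point values may differ. The grades in the example are hypothetical and are not Cato’s actual ratings.

```python
# Assumed 4.0 scale with plus/minus steps; the report card's own point
# values may differ. Listed from highest to lowest on purpose.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}


def gpa(grades: list[str]) -> float:
    """Average grade points, skipping Incompletes ("I")."""
    graded = [GRADE_POINTS[g] for g in grades if g != "I"]
    if not graded:
        raise ValueError("no graded entries to average")
    return sum(graded) / len(graded)


def letter_floor(points: float) -> str:
    """Highest letter grade whose point value does not exceed `points`."""
    # Dicts keep insertion order, so this scans from A down to F.
    for letter, value in GRADE_POINTS.items():
        if points >= value:
            return letter
    return "F"


if __name__ == "__main__":
    print(letter_floor(2.47))  # C+
    # Hypothetical grades, not the report card's: the "I" is left out of the average.
    print(letter_floor(gpa(["B", "C+", "I", "C"])))  # C+
```

On this scale, 2.47 sits between a C+ (2.3) and a B- (2.7), which is why it reads as “a bit better than a C+.”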

Rate Congress Transparency Report Card

Jim Harper, director of information policy studies at the Cato Institute, analyzed how well Congress follows the “Publication Practices for Transparent Government” and found its performance a bit, well, wanting.

If you’re interested in opening up the United States federal legislative system, you can tune into a livestream of a special forum in DC this morning, where Harper and other open government stakeholders will “rate Congress.” Brandon Arnold, director of government affairs at the Cato Institute, will moderate a discussion between Harper, Rep. Darrell Issa, chairman of the House Committee on Oversight and Government Reform, and John Wonderlich, policy director at the Sunlight Foundation.

A better data model

The Cato paper analyzes Congressional achievement through the lens of four basic concepts in data publication: authoritative sourcing, availability, machine-discoverability, and machine-readability. “Together, these practices will allow computers to automatically generate the myriad stories that the data Congress produces has to tell,” writes Harper in a blog post today. “Following these practices will allow many different users to put the data to hundreds of new uses in government oversight.”

The data model used to produce this analysis should interest the broader open government data community as a good matrix for rating any given legislature. “Data modeling is pretty arcane stuff, but in this model we reduced everything to ‘entities,’ each having various ‘properties,’” explained Harper. “The entities and their properties describe the logical relationships of things in the real world, like members of Congress, votes, bills, and so on. We also loosely defined several ‘markup types’ guiding how documents that come out of the legislative process should be structured and published. Then we compared the publication practices in the briefing paper to the ‘entities’ in the model.”
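
The approach Harper describes can be pictured as a grid with entities on one axis and publication practices on the other. The sketch below is a hypothetical illustration of that idea and is not Cato’s actual model. The four practices come from the paper as described above. The class names, the `coverage` score, and the example entities, properties and ratings are assumptions chosen only to show the structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Practice(Enum):
    """The four publication practices the Cato paper rates against."""
    AUTHORITATIVE_SOURCING = "authoritative sourcing"
    AVAILABILITY = "availability"
    MACHINE_DISCOVERABILITY = "machine-discoverability"
    MACHINE_READABILITY = "machine-readability"


@dataclass
class Entity:
    """A real-world thing in the legislative process, described by its properties."""
    name: str
    properties: list[str]
    # Practices met when this entity is published (hypothetical input;
    # the paper assigns its own grades).
    practices_met: set[Practice] = field(default_factory=set)

    def coverage(self) -> float:
        """Fraction of the four practices met for this entity."""
        return len(self.practices_met) / len(Practice)


def report(entities: list[Entity]) -> dict[str, float]:
    """Map each entity name to its practice coverage."""
    return {e.name: e.coverage() for e in entities}


if __name__ == "__main__":
    # Hypothetical entities and ratings, for illustration only.
    bill = Entity(
        "bill",
        ["number", "title", "sponsor", "status"],
        {Practice.AUTHORITATIVE_SOURCING, Practice.AVAILABILITY},
    )
    vote = Entity("vote", ["roll_call_number", "question", "member_positions"])
    print(report([bill, vote]))  # {'bill': 0.5, 'vote': 0.0}
```

Keeping the practices as a fixed enum and the entities as open-ended records means a new entity can be added and scored without changing the rating logic. That is roughly what makes such a model reusable for rating other legislatures.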

While the obvious takeaway is that Congress could do better, Harper gives the Senate and House due credit and time to improve. “This stuff is tough sledding,” he allowed. “The data model isn’t the last word, and there are things happening in varied places on and around Capitol Hill to improve things. Several pieces of the legislative process nobody has ever talked about publishing as data before, so we forgive the fact that this isn’t already being done. If things haven’t improved in another year, then you might start to see a little more piquant commentary.”