In 2014, there are now decades of digital government history, an unknown amount of which has already been lost to bit rot and backup failures. Today, the United States National Archives has issued new guidance for federal agencies for transferring permanent electronic records.
In NARA Bulletin 2014-04, the Archives identifies the “preferred” and “acceptable” file formats for the U.S. government to use in transferring data and information into the nation’s digital memory.
The guidance on structured data formats should be a useful reference for all levels of government in the United States, as they prepare to contribute their digital archives to posterity. NARA states a preference for Comma Separated Value (CSV), OpenDocument Format Spreadsheet (ODS), ASCII Text, JavaScript Object Notation (JSON) and Extensible Markup Language (XML) files over proprietary formats.
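To make the format preference concrete, here is a minimal sketch of exporting the same tabular records to two of the open formats named above, CSV and JSON, using only Python's standard library. The field names and the sample record are hypothetical placeholders, not anything specified in the NARA bulletin.

```python
import csv
import io
import json

# Hypothetical sample record; the field names are illustrative only.
records = [
    {"agency": "Example Agency", "record_id": "0001", "title": "Sample record"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the same rows to human-readable JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_csv(records))
    print(to_json(records))
```

Both outputs are plain text that can be read without the software that produced them, which is the practical appeal of open formats for long-term preservation.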
Yesterday, I participated in a short teleconference with Canada’s open government advisory panel considering the next version of the country’s open government “action plan.” As readers may know, I accepted an invitation in 2012 from Canadian Member of Parliament Tony Clement, the president of Canada’s Treasury Board, to be a member of Canada’s advisory panel on open government, joining others from Canada’s tech industry, the academy and civil society. (I shared several recommendations for open government in the first meeting, held on February 28th, 2012, and in another in 2013.)
In preparation for yesterday’s discussion, I downloaded the report on Canada from the Open Government Partnership’s Independent Reporting Mechanism (IRM), which highlights progress in meeting the country’s (largely self-defined) goals for open government, particularly with respect to open data, and identifies significant weaknesses in the public consultation undertaken to date.
The consultative process during the development of the action plan was weak. The consultation, which was only done online, including a Twitter chat session with the TBS President, took place during a public holiday and no draft plan was circulated in advance for discussion. There was minimal awareness raising around the consultation process, which resulted in low participation.
The IRM researcher found minimal evidence of attempts to engage civil society during implementation of the action plan with the exception of the consultation on open data and the Open Government Licence. Consultation on commitments in these areas was seen as significantly stronger and more productive than the consultations for development of the action plan and the year one government self-assessment.
Consultation of the self assessment report was carried out online and was not widely publicized, resulting in a limited level of participation.
Based upon this report and my own observations, I made three suggestions on yesterday’s call:
1) Adoption of an open source e-petition platform from the United Kingdom. While many people remain dubious about online petitions, the tool could be seeded with proposed open government reforms and solicit new ones.
2) Acknowledgement of ongoing debates about electronic surveillance. The Harper administration should launch a more proactive public discussion of what the Canadian people have a right to know about how their electronic communications are being collected, stored and used. Any broad consultation around open government in Canada will include this issue.
3) More civic engagement with the media. If improving public consultation is a priority, government officials must go onto television and radio broadcasts, along with sitting down for print interviews. Public engagement through social media and government websites is simply not enough.
The Canadian government should also engage journalists who are making information requests, specifically data journalists, as they are key players in the ecosystem around confirming data releases and quality. If the government faces significant doubts, it will have to turn to more trusted third parties to validate its programs and their efficacy.
After months of discussion regarding how the government can avoid another healthcare.gov debacle, legislative proposals are starting to emerge in Washington. Last year, FITARA gathered steam before running into a legislative impasse. Today, a new draft bill introduced for discussion in the United States House of Representatives proposes specific reforms that substantially parallel those made by the United Kingdom after a similar technology debacle in its National Health Service.
The subtext for the ‘Reforming Federal Procurement of Information Technology Act’ (RFP-IT) is the newfound awareness in Congress and the nation at large, driven by the issues with Healthcare.gov, that something is profoundly amiss in the way that the federal government buys, builds and maintains technology.
“Studies show that 94 percent of major government IT projects between 2003 and 2012 came in over budget, behind schedule, or failed completely,” said Representative Anna G. Eshoo (D-CA), ranking member of the House Communications and Technology Subcommittee, and co-sponsor of RFP-IT, in a statement. “In an $80 billion sector of our federal government’s budget, this is an absolutely unacceptable waste of taxpayer dollars. Furthermore, thousands of pages of procurement regulations discourage small innovative businesses from even attempting to navigate the rules. Our draft bill puts proven best practices to work by instituting a White House office of IT procurement and gives all American innovators a fair shake at competing for valuable federal IT contracts by lowering the burden of entry.”
Create a U.S. Digital Government Office (DGO) that would not only govern the country’s mammoth federal information technology project portfolio more effectively but actively build and maintain aspects of it
Increase the size of a contract for IT services allowable under the Small Business Act from $100,000 to $500,000
Create a U.S. DGO fund supported by 5% of the fees collected by executive agencies for various types of contracts
“In the 21st century, effective governance is inextricably linked with how well government leverages technology to serve its citizens,” said Representative Gerry Connolly (D-VA), ranking member of the House Oversight and Government Reform Subcommittee, and co-sponsor of RFP-IT, in a statement. “Despite incremental improvements in federal IT management over the years, the bottom line is that large-scale federal IT program failures continue to waste taxpayers’ dollars, while jeopardizing our Nation’s ability to carry out fundamental constitutional responsibilities, from conducting a census to securing our borders. Our RFP-IT discussion draft recognizes that transforming how the federal government procures critical IT assets will likely require bolstering ongoing efforts to comprehensively strengthen general federal IT management practices with targeted enhancements that promote innovative and bold procurement strategies from the White House on down.”
“This, I think, really works well alongside FITARA, which calls for increased agency CIO authority,” wrote Johnson. “What will hopefully end up happening if both bills pass, is that good talent can get inside of government, and agencies that perform well can operate independently, and agencies that don’t can be pulled back in and reformed, while still having operational continuity (meaning: while that reform is happening, IT projects can still be done well, and run by the DGO).”
In 2014, digital government supports open government. What’s unclear is whether this proposal from two Democratic lawmakers can gain a Republican co-sponsor in the GOP-controlled legislative body or if a federal IT reform-minded Senator like Mr. Carper or Mr. Booker will take it up in the Senate.
This singular bill isn’t a panacea, however, Johnson emphasized, pointing to the need to fix SAM.gov, the error-prone website for contractors to register with the federal government, and reforms to registration for “set-aside” business.
“We’re not sure how Congress writes a ‘stop throwing errors when a user clicks submit on sam.gov’ law,” wrote Johnson. “That’s going to take hearings, and most likely, a digital government office to fix. And we think this is a bill that complements Issa’s FITARA. Since this bill is at the discussion draft stage, perhaps soon we’ll see some Republicans jump on board.”
UPDATE: On July 30, RFP-IT was officially introduced. (Full text of the bill, via Rep. Eshoo’s office): “The Reforming Federal Procurement of Information Technology (RFP-IT) Act, introduced by Rep. Anna G. Eshoo (D-Calif.), Ranking Member of the Communications and Technology Subcommittee, Rep. Gerry Connolly (D-Va.), Ranking Member of the Oversight and Government Reform Subcommittee, Rep. Richard Hanna (R-N.Y.), Chairman of the Small Business Subcommittee on Contracting and Workforce, and Rep. Eric Swalwell (D-Calif.), Ranking Member of the Committee on Science, Space and Technology’s Energy Subcommittee, and Rep. Suzan DelBene (D-Wash.)”
1) It would officially establish a Digital Government Office within the White House Office of Management and Budget (OMB), with the U.S. CIO at its head as a Senate-confirmed presidential appointee, reporting to the head of the OMB, shifting from “electronic government” to “digital government.”
2) It would codify the Presidential Innovation Fellows program.
3) It would expand competition for federal IT contracting under a simplified process that would ease the regulatory and compliance burden upon smaller companies bidding, bumping the threshold for information tech projects up to $500,000.
4) It would establish a digital service pilot program.
5) It would direct the General Services Administrator to conduct an in-depth analysis of IT Schedule 70.
6) It would direct the Comptroller General of the United States to produce three reports to Congress within 2 years of the law passing, on (a) the effectiveness of the 18F program of the General Services Administration, (b) IT Schedule 70, and (c) “challenges and barriers to entry for small business technology firms.”
The Google home page currently has a link to ask President Obama a question in a Google+ Hangout. That’s some mighty popular online real estate devoted to citizen engagement.
As ever, laws and institutions lag the rapid pace of technological change. In 2014, for instance, mandating that the person designated to publish federal information must be a practical printer “versed in the art of bookbinding” is a statutory remnant of a bygone age.
Last week, Senator Amy Klobuchar [D-MN] introduced the Government Publishing Office Act of 2014, S.1947, which would rename the United States Government Printing Office the Government Publishing Office. (It would also strike the bookbinding requirement.)
The current Public Printer of the United States supported the proposal. “Publishing defines a broad range of services that includes print, digital, and future technological advancements,” said Public Printer Davita Vance-Cooks, in a statement. “The name Government Publishing Office better reflects the services that GPO currently provides and will provide in the future. I appreciate the efforts of Senators Klobuchar and Chambliss for introducing and supporting this bill. GPO will continue to meet the information needs of Congress, Federal agencies, and the public and carry out our mission of Keeping America Informed.”
“The idea of renaming GPO was discussed in a December Committee on House Administration hearing entitled “Mission of the Government Printing Office in a post-print world”, which I wrote about here,” said Daniel Schuman, policy director at Citizens for Responsibility and Ethics in Washington (CREW), in a blog post on the GPO bill.
One of the most important open government data efforts in United States history came into being in 1993, when citizen archivist Carl Malamud used a small planning grant from the National Science Foundation to license data from the Securities and Exchange Commission, published the SEC data on the Internet and then operated it for two years. At the end of the grant, the SEC decided to make the EDGAR data available itself — albeit not without some significant prodding — and has continued to do so ever since. You can read the history behind putting periodic reports of public corporations online at Malamud’s website, public.resource.org.
Two decades later, Malamud is working to make the law public, reform copyright, and free up government data again, buying, processing and publishing millions of public tax filings from nonprofits to the Internal Revenue Service. He has made the bulk data from these efforts available to the public and anyone else who wants to use it.
“This is exactly analogous to the SEC and the EDGAR database,” Malamud told me, in a phone interview last year. The trouble is that data has been deliberately dumbed down, he said. “If you make the data available, you will get innovation.”
November Form 990s now ready. http://t.co/HDoMzPjpY0 We have 7,335,804 Form 990s available. *STILL* no word from the IRS.
Making millions of Form 990 returns free online is not a minor public service. Although many nonprofits file their Form 990s electronically, the IRS does not publish the data. Rather, the government agency releases images of millions of returns formatted as .TIFF files onto multiple DVDs to people and companies willing and able to pay thousands of dollars for them. Services like Guidestar, for instance, acquire the data, convert it to PDFs and use it to provide information about nonprofits. (Registered users view the returns on their website.)
As Sam Roudman reported at TechPresident, Luke Rosiak, a senior watchdog reporter for the Washington Examiner, took the files Malamud published and made them more useful. Specifically, he used credits for processing that Amazon donated to participants in the 2013 National Day of Civic Hacking to make the .TIFF files text-searchable. Rosiak then set up CitizenAudit.org, a new website that makes nonprofit transparency easy.
“This is useful information to track lobbying,” Malamud told me. “A state attorney general could just search for all nonprofits that received funds from a donor.”
Malamud estimates nearly 9% of jobs in the U.S. are in this sector. “This is an issue of capital allocation and market efficiency,” he said. “Who are the most efficient players? This is more than a CEO making too much money — it’s about ensuring that investments in nonprofits get a return.”
“I think inertia is behind the delay,” he told me, in our interview. “These are not the expense accounts of government employees. This is something much more fundamental about a $1.6 trillion dollar marketplace. It’s not about who gave money to a politician.”
If I order these IRS DVDs, my cost is $2910. Media and gov get them free, but none of them lifting a finger to help. http://t.co/B6m5VECV1O
When asked for comment, a spokesperson for the White House Office of Management and Budget said that the IRS “has been engaging on this topic with interested stakeholders” and that “the Administration’s Fiscal Year 2014 revenue proposals would let the IRS receive all Form 990 information electronically, allowing us to make all such data available in machine readable format.”
Today, Malamud sent a letter of complaint to Howard Shelanski, administrator of the Office of Information and Regulatory Affairs in the White House Office of Management and Budget, asking for a review of the pricing policies of the IRS after a significant increase year-over-year. Specifically, Malamud wrote that the IRS is violating the requirements of President Obama’s executive order on open data:
The current method of distribution is a clear violation of the President’s instructions to move towards more open data formats, including the requirements of the May 9, 2013 Executive Order making “open and machine readable the new default for government information.”
I believe the current pricing policies do not make any sense for a government information dissemination service in this century, hence my request for your review.
There are also significant additional issues that the IRS refuses to address, including substantial privacy problems with their database and a flat-out refusal to even consider release of the Form 990 E-File data, a format that would greatly increase the transparency and effectiveness of our non-profit marketplace and is required by law.
It’s not clear at all whether the continued pressure from Malamud, the obvious utility of CitizenAudit.org or the bipartisan budget deal that President Obama signed in December will push the IRS to freely release open government data about the nonprofit sector.
The furor last summer over the IRS investigating the status of conservative groups that claimed tax-exempt status, however, could carry over into political pressure to reform. If political groups were tax-exempt and nonprofit e-file data were published about them, it would be possible for auditors, journalists and Congressional investigators to detect patterns. The IRS would need to be careful about scrubbing the data of personal information: last year, the IRS mistakenly exposed thousands of Social Security numbers when it posted 527 forms online, an issue that Malamud, as it turns out, discovered in an audit.
“This data is up there with EDGAR, in terms of its potential,” said Malamud. “There are lots of databases. Few are as vital to government at large. This is not just about jobs. It’s like not releasing patent data.”
If the IRS were to modernize its audit system, inspectors general could use automated predictive data analysis to find aberrations to flag for a human to examine, enabling government watchdogs and investigative journalists to potentially detect similar issues much earlier.
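As a rough illustration of what "flagging aberrations for a human to examine" could mean in practice, here is a minimal sketch of a simple outlier screen. It is not predictive modeling and not anything the IRS is known to use. It applies a modified z-score based on the median absolute deviation, and the input numbers (a hypothetical share of expenses spent on executive pay at seven nonprofits) are invented for the example.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), which are less distorted by extreme values than
    the mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values are (nearly) identical; nothing stands out.
        return []
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical data: share of expenses spent on executive pay.
    pay_share = [0.05, 0.06, 0.04, 0.05, 0.07, 0.06, 0.45]
    print(flag_outliers(pay_share))
```

A screen like this only surfaces candidates. Deciding whether a flagged filing is actually a problem would still fall to a human reviewer.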
That level of data-driven transparency remains in the future. In the meantime, CitizenAudit.org is currently running on a server in Rosiak’s apartment.
Whether the IRS adopts it as the SEC did EDGAR remains to be seen.
Earlier today, at the White House Education Datapalooza, an official from the United States Department of the Treasury informed a packed theater and livestream that students, parents and citizens would finally be able to do something simple and profoundly useful over the Internet: download a transcript of their tax return from the Internal Revenue Service.
“I am very excited to announce that the IRS has just launched, this week, a transcript application which will give taxpayers the ability to view, print, and download tax transcripts,” said Katherine Sydor, a policy advisor in the Office of Consumer Policy of the Treasury, “making it easier for student borrowers to access tax records he or she might need to submit loan applications or grant applications.” [VIDEO]
Previously, filers could request a copy of the transcript (not the full return) but would have to wait 5-10 business days to receive it in the mail. For people who needed more rapid access for applications, the delay could be critical. A White House fact sheet subsequently confirmed the news, under the rubric of “streamlining application paperwork,” and a quick follow up with an official secured the correct URL for the new IRS Web application to get a tax transcript.
I created an account, which involved jumping through the hoops familiar from establishing online access to bank accounts (choosing a pass phrase, pass image and security questions), and then answered a number of questions that made it pretty clear that the IRS knew exactly who I was and where I had lived. (It’s not clear whether they hold this information or used a credit bureau, from the consumer-side.)
When I tried to actually download the transcript, though, I ran into some issues: first, a browser error in Chrome — “This XML file does not appear to have any style information associated with it. The document tree is shown below.” Using Firefox, however, I was able to at least get the page where I could choose from various years of transcripts.
Unfortunately, clicking any of the links delivered a file that my MacBook was unable to parse. I was, however, able to log into IRS.gov and easily download last year’s tax return with one click to my iPhone. Success!
While the technical problems I ran into suggest that Apple computer users might run into some issues, I have a funny feeling that people who are running Internet Explorer on a Windows machine (the vast majority) will fare better.
The fact that American citizens could not access their own tax returns online in 2014 might seem jarring but, until this week, that was the status quo. This advance represents the sort of somewhat mundane but important shift that the Obama administration’s approach to digital government has enabled over the past five years.
The troubles behind the botched launch of Healthcare.gov have shaken the confidence of many citizens in the capacity of this administration to deliver effective digital services, and months of headlines about digital surveillance by the National Security Agency have diminished trust in government overall. Even so, the ability of the “tech surge” to fix the site and the success of the technology team at the Consumer Financial Protection Bureau not only offer a guide for how to avoid similar issues but highlight a less salacious, more boring reality that will generate neither headlines nor heated rhetoric on cable news shows: most public officials and civil servants are quietly working to deliver better customer service for citizens.
Being able to download a tax transcript online is not, however, without risks. The Internal Revenue Service will need to continue to be vigilant about security. The new functionality will almost certainly inspire fraudsters to create mockups of the government website that look similar and then send phishing emails to consumers, urging them to “log in” to fake websites.
Perhaps most problematically, people will download tax transcripts to mobile devices and laptops and then not take steps to protect them with encryption. If you do download your transcripts or personal health information, make sure to also install full disk encryption on every machine you own. Leaving your files unprotected there is like leaving the door to your house unlocked with your tax returns and medical records on the kitchen table.
I have asked the IRS for comment on the new feature, browser and operating system support, and security guidance, and will update this post if and when I receive any.
Update: comment from the IRS follows.
How much time and technical resources did the IRS invest in deploying the feature? Has the IRS increased the capacity of the website for more demand?
From establishing the business case and receiving funding plus approval to start the work to implementation took approximately one year. Additional time was spent in ideation, innovation, and confirming requirements of the product prior to receiving approval.
I had trouble downloading my transcript on an Apple computer using Chrome and Firefox. (I was able to get it through my iPhone.) What browsers and operating systems does the new function officially support?
As a web application, Get Transcript is supported on most modern OS/browser combinations. While there may be intermittent issues due to certain end-user configurations, IRS has not implemented any restrictions against certain browsers or operating systems. We are continuing to work open issues as they are identified and validated.
(A side note: For the best user experience, taxpayers may want to try up-to-date versions of Internet Explorer and a supported version of Microsoft Windows; however, that is certainly not a requirement.)
Does the IRS have any guidance for ensuring that Americans connect securely to the website and then protect tax returns on their home computers once they have downloaded them?
The IRS has made good progress on oversight and enhanced security controls in the area of information technology. With state-of-the-art technology as the foundation for our portal (e.g. irs.gov), we continue to focus on protecting the PII of all taxpayers when communicating with the IRS.
However, security is a two-way street with both the IRS and users needing to take steps for a secure experience. On our end, our security is comparable to leaders in private industry.
Our IRS2GO app has successfully completed a security assessment and received approval to launch by our cybersecurity organization after being scanned for weaknesses and vulnerabilities.
Any personally identifiable information (PII) or sensitive information transmitted to the IRS through IRS2Go for refund status or tax record requests uses secure communication channels that meet or exceed federal requirements for encryption. No PII is passed back to the taxpayer through IRS2GO and no PII is stored on the smartphone by the application.
When using our popular Where’s My Refund? application, taxpayers may notice just a few of our security measures. The URL for Where’s My Refund? begins with https. Just like in private industry, the “s” is a key indicator that a web user should notice indicating you are in a “secure session.” Taxpayers may also notice our message that we recommend they close their browser when finished accessing your refund status.
As we become a more mobile society and able to link to the internet while we’re on the go, we remind taxpayers to take precautions to protect themselves from being victimized, including using secure networks, firewalls, virus protection and other safeguards.
We always recommend taxpayers check with the Federal Trade Commission for the latest on reporting incidents of identity theft. You can find more information on our website, including tips if you believe you have become the victim of identity theft.
Does the IRS have any plans to provide Americans with access or insight to estimated tax returns online in the future? Now that we have the ability to establish user accounts, would it ever be possible, for instance, for people with simple taxes (1040EZ, etc) to log in, review an estimated return, make any required edits, and then e-file it on IRS.gov?
IRS: The IRS is considering a number of new proposals that may become a part of the online services roadmap some time in the future. This may include a taxpayer account where up to date status could be securely reviewed by the account owner.
This September, I visited the United Kingdom’s Ministry of Justice and looked at the last section of the Magna Carta that remains in effect. I was not, however, in a climate-controlled reading room, looking at a parchment or sheepskin.
Rather, I was sitting in the Ministry’s sunny atrium, where John Sheridan was showing me the latest version of the seminal legal document, now living online, on his laptop screen. The remaining section that is in force is rather important to Western civilization and the rule of law as many citizens in democracies now experience it:
NO Freeman shall be taken or imprisoned, or be disseised of his Freehold, or Liberties, or free Customs, or be outlawed, or exiled, or any other wise destroyed; nor will We not pass upon him, nor [condemn him,] but by lawful judgment of his Peers, or by the Law of the Land. We will sell to no man, we will not deny or defer to any man either Justice or Right.
From due process to eminent domain to a right to a jury trial, many of the rights that American or British citizens take as a given today have their basis in the English common law that stems from this document.
Over a cup of tea, Sheridan caught me up on the progress that his team has made in digitizing documents and improving the laws of the land. There are now 2 million unique visitors to legislation.gov.uk every month, with 500+ million page views annually. People really are reading Parliament’s output, he observed, and increasingly doing so on tablets and mobile devices. The amount of content flowing into the site is considerable: according to Sheridan, the United Kingdom is passing laws at an estimated rate of 100,000 words every month, or twice as much as the complete works of Shakespeare.
Notable improvements over the years include the ability to compare the original text of legislation versus the latest version (as we did with the Magna Carta) and view a timeline of changes using a slider for navigation, exploring any given moment in time. Sheridan was particularly proud of the site’s rendering of legislation in HTML, including human-readable permanent uniform resource locators (URLs) and the capacity to produce on-demand PDFs of a given document. (This isn’t universally true: I found some orders appear still as PDFs.)
More specifically, Sheridan highlighted a “good law” project, wherein the Office of the Parliamentary Counsel (OPC) of Britain is working to help develop plain language laws that are “necessary, clear, coherent, effective and accessible.” A notable component of this good law project is an effort to apply a tool used in online publishing, software development and advertising — A/B testing — to testing different versions of legislation for usability.
The video of a TEDx talk embedded below by Richard Heaton, the permanent secretary of the United Kingdom’s Cabinet Office and first parliamentary counsel, explores the idea of “good law” at more length:
Sheridan went on to describe one of the more ambitious online collaborations between a government and its citizens I had heard of to date, a novel cross-Atlantic challenge co-sponsored by the UK and US governments, and a hairy legal technology challenge bearing down upon societies everywhere: what happens when software interprets the law?
For instance, he suggested, consider the increasing use of Oracle software around legislation. “As statutes are interpreted by software, what’s introduced by the code? What about quality testing?”
As this becomes a data problem, “you need information to contextualize it,” said Sheridan. “If you’re thinking about legislation as code, and as data, it raises huge questions for the rule of law.”
Sheridan has been one of the world’s foremost proponents of publishing legislative data through APIs, an approach that has come under criticism by open government data advocates after the government shutdown in the United States. (In 2014, forward-thinking governments publishing open data might consider providing basic visualization tools to site visitors, API access for third-party developers and internal users, and bulk data downloads.) One key difference between the approach of his team and other government entities might be that the National Archives are “dogfooding,” or consuming the same data through the same interface that they expect third parties to use, as Sheridan wrote last March:
“We developed the API and then built the legislation.gov.uk website on top of it. The API isn’t a bolt-on or additional feature, it is the beating heart of the service. Thanks to this approach it is very easy to access legislation data – just add /data.xml or /data.rdf to any web page containing legislation, or /data.feed, to any list or search results. One benefit of this approach is that the website, in a way, also documents the API for developers, helping them understand this complex data.”
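The URL convention Sheridan describes can be sketched in a few lines of Python. This is a minimal illustration, not official client code: the helper name `data_url` is hypothetical, the example path is only illustrative, and the handling of query strings on search pages is an assumption based on his description rather than on the API documentation.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffixes named in Sheridan's description of the API.
SUPPORTED_FORMATS = {"xml", "rdf", "feed"}


def data_url(page_url, fmt="xml"):
    """Build a data URL by appending /data.<fmt> to a page's path.

    Per the quote above, /data.xml and /data.rdf apply to pages
    containing legislation, and /data.feed to lists or search results.
    Keeping the query string after the new path is an assumption.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + f"/data.{fmt}"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Illustrative legislation page path.
    print(data_url("https://www.legislation.gov.uk/ukpga/1998/42"))
    print(data_url("https://www.legislation.gov.uk/ukpga/1998/42", "rdf"))
```

The function only constructs URLs. Fetching and parsing the response is left out so that the sketch stays independent of how the live service behaves.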
Perhaps because of that perspective, Sheridan was as supportive of APIs when we talked this September as he had been in 2012:
The legislation.gov.uk API has changed everything for us. It powers our website. It has enabled us to move to an open data business model, securing the editorial effort we need from the private sector for this important source of public data. It allows us to deliver information and services across channels and platforms through third party applications. We are developing other tools that use the API, using Linked Data – from recording the provenance of new legislation as it is converted from one format to another, to a suite of web based editorial tools for legislation, including a natural language processing capability that automatically identifies the legislative effects. Everything we do is underpinned by the API and Linked Data. With the foundations in place, the possibilities of what can be done with legislation data are now almost limitless.
Sheridan noted to me that the United Kingdom’s legislative open government data efforts are now acting as a platform for large commercial legal publishers and new entrants, like the mobile legislative app iLegal.
The iLegal app content is derived from the legislation.gov.uk API and offers handy features, like offline access to all items of legislation. iLegal currently costs £49.99/$74.99 annually or £149.99/$219.99 for a lifetime subscription, which might seem steep but is a fraction of the cost of Halsbury’s Statutes, currently listed at £9,360.00 from Lexis-Nexis.
This approach to publishing the laws of the land online, in structured form under an open license, is an instantiation of the vision for Law.gov that citizen archivist Carl Malamud has been advocating for in the United States. (2013 saw some progress in that vein when the U.S. House of Representatives published the U.S. Code as open government data.)
What’s notable about the United Kingdom’s example, however, is that less than a decade ago, none of this would have been possible. Why? As ScraperWiki founder Francis Irving explained, the UK’s database of laws was proprietary data until December 2006. Now, however, the law of the land is released back to the people as it is updated, a living code available in digital form to any member of the public who wishes to read or reuse it.
The United Kingdom, however, has moved beyond simply publishing legislation as open data: they’re actively soliciting civic participation in its maintenance and improvement. For the last year, the National Archives has been guiding the world’s leading commercial open data curation project.
“We are using open data as business model for fulfilling public services,” said Sheridan, in our interview. “We train people to do editorial work. They are paid to improve data. The outputs are public.”
In other words, the open government data always remains free to the people through legislation.gov.uk but any academic, nonprofit or commercial entity can act to add value to it and sell access to the resulting applications, analyses or interfaces.
Since the start of the UK project, they have doubled the number of people working on their open data, Sheridan told me. “The bottleneck is training,” he said. “We have almost unlimited editorial expertise available through our website. We define the process and rules, and then let anyone contribute. For example, we’re now working on revising legislation, identifying changes, researching it — when it comes in, what it affects — and then working with editor. Previous to this effort, government hasn’t been able to revise secondary legislation.”
Sheridan said that the next step is feedback for other editorial values.
“We’re looking for more experts,” he said. “They’re generally paid for by someone. It’s very close to open source software model. They must be able to demonstrate competence. There’s a 45-minute test, which we’ve now given to thousands of people.”
If this continues to work, distributed online collaboration is a “brilliant way to help improve the quality of law,” said Sheridan.
“It’s a way to get the work done — and the work is really hard. You have to invest time and energy, and you must protect the reputation of the Archive. This is somewhat radical for the nation’s statute book. We have redesigned the process so people can work with us. It’s not a wiki, but participation is open. It’s peer production.”
A trans-Atlantic challenge to map legislative data
The U.K. National Archives and U.S. Library of Congress have asked for help mapping elements from bills to the most recent Akoma Ntoso schema. (Akoma Ntoso is an emerging global standard for machine-readable data describing parliamentary, legislative and judiciary documents.) The best algorithm that maps U.S. bill XML or UK bill XML to Akoma Ntoso XML, including necessary data files and supporting documentation, will win $10,000.
If you have both skills and interest, get cracking: the challenge closes on December 31, 2013.
Today, Democrats on the House Energy and Commerce Committee released a memorandum regarding a security briefing on the Affordable Care Act (embedded below) that includes a summary of a classified briefing from Dr. Kevin Charest, the HHS Chief Information Security Officer, and Ned Holland, HHS Assistant Secretary for Administration. The memorandum states that “there have been no successful security attacks on Healthcare.gov.” In it, Dr. Charest is quoted as saying that “no person or group has hacked into Healthcare.gov, and no person or group has maliciously accessed any personally identifiable information from users.”
The authors of the memorandum, Representatives Henry A. Waxman and Diana DeGette, write that “the information provided in the briefing was reassuring,” given the assurances of the chief information security officer that “the security of Healthcare.gov has not been breached, and hackers have had no access to personally identifiable information.”
Despite this letter, it’s not clear whether the Healthcare.gov security concerns that TrustedSec has highlighted have been addressed. Given the continued focus of Congressional committees on the issue, expect more assessments and audits to emerge in the future.
As more and more governments release data around the world, the conditions under which it is published and may be used will become increasingly important. Just as open formats make data easier to put to work, open licenses make it possible for all members of the public to use it without fear.
Licensing is a wonky but important issue: governments that want to maximize the rewards of the work involved in cleaning and publishing open government data need to get the policy around its release right. Today, several open government advocates released an updated Best-Practices Language for Making Data “License-Free”, which can be found online at theunitedstates.io/licensing.
“In short what we say is ‘Use Creative Commons Zero (CC0),’ which is a public domain dedication,” said Josh Tauberer, the founder of Govtrack.us, via email. “We provide recommended language to put on government datasets and software to put the data and code into the world-wide public domain. In a way, it’s the opposite of a license.”
Tauberer, Eric Mill, developer at the Sunlight Foundation, and Jonathan Gray, director of policy and ideas at the Open Knowledge Foundation, who have been working on the guidance since May, all blogged about the new guidance:
“Back in May, the Administration’s Memorandum on Open Data created very confusing guidance for agencies about what constitutes open data by saying open data should be ‘openly licensed’,” explained Tauberer, via email. “In response to that, we began working on guidance for federal agencies for how to make sure their data is open under the definition in the 8 Principles of Open Government Data.”
The basic issue, he said, is that the memorandum directed agencies to make data open but, in the view of these advocates, told agencies the wrong thing about what open data actually means. “We’re correcting that with precise, actionable direction,” said Tauberer.
What would the consequences of United States government entities not adopting this guidance be?
“Because M-13-13 required open licensing as the new default, I worry about agencies taking the guidance too literally and applying licensing where they might not have before, even if the work is exempt from copyright,” said Tauberer. “Or they may now consider open licensing of works produced by a contractor to be the new norm, since it is permitted by M-13-13, but for certain core information produced by government this would be a major step backward.”
“Imagine if after FOIA’ing an agency’s deliberative documents, The New York Times was legally required to provide attribution to a contractor, or, worse, to the government itself,” said Tauberer. “The federal government is relying more and more on contractors and lawyers, so it’s important that we reinforce these norms now.”
While it remains to be seen if the White House Office of Management and Budget merges this best practice into its open data policy, the advocates have already had success getting it adopted.
“Since we first published the guidance in August, it’s led to three government projects using our advice,” said Tauberer. “Partly in response to our nudging, in October OSTP’s Project Open Data re-licensed its schema for federal data catalog inventory files. (It had been licensed under CC-BY because of non-governmental contributors to the schema, but now it uses CC0.) In September and October, the CFPB followed our guidance and applied CC0 to their ‘qu’ project and their eRegs platform.”