Opening IRS e-file data would add innovation and transparency to $1.6 trillion U.S. nonprofit sector

One of the most important open government data efforts in United States history came into being in 1993, when citizen archivist Carl Malamud used a small planning grant from the National Science Foundation to license data from the Securities and Exchange Commission, published the SEC data on the Internet and then operated it for two years. At the end of the grant, the SEC decided to make the EDGAR data available itself — albeit not without some significant prodding — and has continued to do so ever since. You can read the history behind putting periodic reports of public corporations online at Malamud’s website, public.resource.org.

[Image: Meals on Wheels reports]

Two decades later, Malamud is working to make the law public, reform copyright, and free up government data again, buying, processing and publishing millions of public tax filings that nonprofits submit to the Internal Revenue Service. He has made the bulk data from these efforts available to anyone who wants to use it.

“This is exactly analogous to the SEC and the EDGAR database,” Malamud told me, in a phone interview last year. The trouble is that the data has been deliberately dumbed down, he said. “If you make the data available, you will get innovation.”

Making millions of Form 990 returns free online is not a minor public service. Although many nonprofits file their Form 990s electronically, the IRS does not publish the data. Rather, the government agency releases images of millions of returns formatted as .TIFF files onto multiple DVDs to people and companies willing and able to pay thousands of dollars for them. Services like Guidestar, for instance, acquire the data, convert it to PDFs and use it to provide information about nonprofits. (Registered users view the returns on their website.)

As Sam Roudman reported at TechPresident, Luke Rosiak, a senior watchdog reporter for the Washington Examiner, took the files Malamud published and made them more useful. Specifically, he used credits for processing that Amazon donated to participants in the 2013 National Day of Civic Hacking to make the .TIFF files text-searchable. Rosiak then set up CitizenAudit.org, a new website that makes nonprofit transparency easy.

“This is useful information to track lobbying,” Malamud told me. “A state attorney general could just search for all nonprofits that received funds from a donor.”

Malamud estimates nearly 9% of jobs in the U.S. are in this sector. “This is an issue of capital allocation and market efficiency,” he said. “Who are the most efficient players? This is more than a CEO making too much money; it’s about ensuring that investments in nonprofits get a return.”

Malamud’s open data is acting as a platform for innovation, much as legislation.gov.uk is in the United Kingdom. The difference is that it’s the effort of a citizen that’s providing the open data, not the agency: Form 990 data is not on Data.gov.

Opening Form 990 data should be a no-brainer for an Obama administration that has taken historic steps to open government data. Liberating nonprofit sector data would provide useful transparency into a $1.6 trillion sector of the U.S. economy.

After many letters to the White House and discussions with the IRS, however, Malamud filed suit against the IRS to release Form 990 data online this summer.

“I think inertia is behind the delay,” he told me, in our interview. “These are not the expense accounts of government employees. This is something much more fundamental about a $1.6 trillion dollar marketplace. It’s not about who gave money to a politician.”

When asked for comment, a spokesperson for the White House Office of Management and Budget said that the IRS “has been engaging on this topic with interested stakeholders” and that “the Administration’s Fiscal Year 2014 revenue proposals would let the IRS receive all Form 990 information electronically, allowing us to make all such data available in machine readable format.”

Today, Malamud sent a letter of complaint to Howard Shelanski, administrator of the Office of Information and Regulatory Affairs in the White House Office of Management and Budget, asking for a review of the pricing policies of the IRS after a significant increase year-over-year. Specifically, Malamud wrote that the IRS is violating the requirements of President Obama’s executive order on open data:

The current method of distribution is a clear violation of the President’s instructions to
move towards more open data formats, including the requirements of the May 9, 2013
Executive Order making “open and machine readable the new default for government
information.”

I believe the current pricing policies do not make any sense for a government
information dissemination service in this century, hence my request for your review.
There are also significant additional issues that the IRS refuses to address, including
substantial privacy problems with their database and a flat-out refusal to even
consider release of the Form 990 E-File data, a format that would greatly increase the
transparency and effectiveness of our non-profit marketplace and is required by law.

It’s not clear at all whether the continued pressure from Malamud, the obvious utility of CitizenAudit.org or the bipartisan budget deal that President Obama signed in December will push the IRS to freely release open government data about the nonprofit sector.

The furor last summer over the IRS investigating the status of conservative groups that claimed tax-exempt status, however, could carry over into political pressure to reform. If political groups were tax-exempt and nonprofit e-file data were published about them, it would be possible for auditors, journalists and Congressional investigators to detect patterns. The IRS would need to be careful about scrubbing the data of personal information: last year, the IRS mistakenly exposed thousands of Social Security numbers when it posted 527 forms online, an issue that Malamud discovered in an audit, as it turns out.

“This data is up there with EDGAR, in terms of its potential,” said Malamud. “There are lots of databases. Few are as vital to government at large. This is not just about jobs. It’s like not releasing patent data.”

If the IRS were to modernize its audit system, inspectors general could use automated predictive data analysis to find aberrations and flag them for a human to examine, enabling government watchdogs and investigative journalists to potentially detect similar issues much earlier.
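To make that idea concrete, here is a minimal sketch of what automated flagging could look like once filings are machine-readable. It is not how the IRS or any inspector general works today, and it uses a simple statistical outlier rule (a modified z-score based on the median) rather than a true predictive model. The records, the `ein` values and the `fundraising_ratio` field are all hypothetical and do not come from real Form 990 data.

```python
from statistics import median


def flag_outliers(filings, field, threshold=3.5):
    """Return the filings whose `field` value is far from the median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. The 3.5 cutoff is a common rule of
    thumb, not an official standard.
    """
    values = [f[field] for f in filings]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread in the data, so nothing can be called an outlier.
        return []
    flagged = []
    for f in filings:
        score = 0.6745 * abs(f[field] - med) / mad
        if score > threshold:
            flagged.append(f)
    return flagged


# Hypothetical records: share of total expenses spent on fundraising.
filings = [
    {"ein": "00-0000001", "fundraising_ratio": 0.08},
    {"ein": "00-0000002", "fundraising_ratio": 0.11},
    {"ein": "00-0000003", "fundraising_ratio": 0.09},
    {"ein": "00-0000004", "fundraising_ratio": 0.10},
    {"ein": "00-0000005", "fundraising_ratio": 0.12},
    {"ein": "00-0000006", "fundraising_ratio": 0.85},
]

for filing in flag_outliers(filings, "fundraising_ratio"):
    print(filing["ein"], filing["fundraising_ratio"])
```

A flag like this is only a prompt for a human reviewer, not a finding: an unusual ratio can have a perfectly legitimate explanation.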

That level of data-driven transparency remains in the future. In the meantime, CitizenAudit.org is running on a server in Rosiak’s apartment.

Whether the IRS adopts it as the SEC did EDGAR remains to be seen.

[Image Credit: Meals on Wheels]

Intelligence community turns to Tumblr and Twitter to provide more transparency on NSA surveillance programs


Yesterday afternoon, the Office of the Director of National Intelligence began tumbling towards something resembling more transparency regarding the National Security Agency’s electronic surveillance programs.

The new tumblog, “Intelligence Community on the Record,” is a collection of statements, declassified documents, congressional testimony by officials, speeches & media interviews, fact sheets, details of oversight & legal compliance, and video. It’s a slick, slim new media vehicle, at least as compared to many government websites, although much of the content itself consists of redacted PDFs and images. (More on that later.) It’s unclear why ODNI chose Tumblr as its platform, though the lack of hosting costs, youthful user demographics and easy publishing have to have factored in.

In the context of the global furor over electronic surveillance that began this summer when the Washington Post and Guardian began publishing stories based upon the “NSA Files” leaked by former NSA contractor Edward Snowden, the new tumblr has been met with a rather …skeptical reception online.

Despite its reception, the new site does represent a follow-through on President Obama’s commitment to set up a website to share information with the American people about these programs. While some people in the federal technology sector are hopeful:

…the site won’t be enough, on its own. The considerable challenge that it and the intelligence community face is the global climate of anger, fear and distrust that has been engendered by a summer of fiery headlines. Despite falling trust in institutions, people still trust the media more than the intelligence community, particularly with respect to its role as a watchdog.

Some three hours after the site launched, a series of new documents was posted and tweeted out through the new Twitter account, @IConTheRecord:

The launch of the website came with notable context.

First, as the Associated Press reported, some of the documents released were made public after a lawsuit by the Electronic Frontier Foundation (EFF). In a significant court victory, the EFF succeeded in prompting the release of a 2011 secret court opinion finding NSA surveillance unconstitutional. It’s embedded below, along with a release on DNI.gov linked through the new tumblr.

The opinion showed that the NSA gathered thousands of Americans’ emails before the court struck down the program, causing the agency to recalibrate its practices.

Second, Jennifer Valentino-DeVries and Siobhan Gorman reported at The Wall Street Journal that the National Security Agency can reach 75% of Internet traffic in the United States. Using various programs, the NSA applies algorithms to filter and gather specific information from a dozen locations at major Internet junctions around North America. The NSA defended these programs as both legal and “respectful of Americans’ privacy,” according to Gorman and Valentino-DeVries. NSA spokeswoman Vanee Vines said that if American communications are “incidentally collected during NSA’s lawful signals intelligence activities,” the agency follows “minimization procedures that are approved by the U.S. attorney general and designed to protect the privacy of United States persons.”

The story, which added more reporting to confirm what has been published in the Guardian and Washington Post, included a handy FAQ with a welcome section detailing what was “new” in the Journal’s report. The FAQ also has clear, concise summaries of fun questions you might still have about these NSA programs after a summer of headlines, like “What privacy issues does this system raise?” or “Is this legal?”

The NSA subsequently released a statement disputing aspects of the Journal’s reporting, specifically “the impression” that NSA is sifting through 75% of U.S. Internet communications, which the agency stated is “just not true.” The WSJ has not run a correction, however, standing by its reporting that the NSA possesses the capability to access and filter a majority of communications flowing over the Internet backbone.

Reaction to the disclosures has fallen along pre-existing fault lines: critical lawmakers and privacy groups are rattled, while analysts point to a rate of legal compliance well above 99%, with now-public audits showing most violations of the rules and laws that govern the NSA coming when “roamers” from outside of the U.S.A. traveled to the country.

Thousands of violations a year, however, even if they’re out of more than 240,000,000 made, is still significant, and the extent of surveillance reported and acknowledged clearly has the potential to have a chilling effect on free speech and press freedom, from self-censorship to investigative national security journalism. The debates ahead of the country, now more informed by disclosures, leaks and reporting, will range from increased oversight of programs to legislative proposals to update laws for collection and analysis to calls to significantly curtail or outright dissolve these surveillance programs altogether.

Given reports of NSA analysts intentionally abusing their powers, some reforms to the laws that govern surveillance are in order, starting with making relevant jurisprudence public. Secret laws have no place in a democracy.

Setting all of that aside for a moment — it’s fair to say that this debate will continue playing out on national television, the front pages of major newspapers and online outlets and in the halls and boardrooms of power around the country — it’s worth taking a brief look at this new website that President Obama said will deliver more transparency into surveillance programs, along with the NSA’s broader approach to “transparency”. To be blunt, all too often it’s looked like this:

…so heavily redacted that media outlets can create mad libs based upon them.

That’s the sort of thing that leads people to suggest that the NSA has no idea what ‘transparency’ means. Whether that’s a fair criticism or not, the approach taken to disclosing documents as images and PDFs does suggest that the nation’s spy agency has not been following how other federal agencies are approaching releasing government information.

As Matt Stoller highlighted on Twitter, heavily redacted, unsearchable images make it extremely difficult to find or quote information.

Unfortunately, that failing highlights the disconnect between the laudable efforts the Obama administration has made to release open government data from federal agencies and regulators and the sprawling, largely unaccountable national security state aptly described as “Top Secret America.”

Along with leak investigations and prosecution of whistleblowers, drones and surveillance programs have been a glaring exception to federal open government efforts, giving ample ammunition to those who criticize or outright mock President Obama’s stated aspiration to be the “most transparent administration in history.” As ProPublica reported this spring, the administration’s open government record has been mixed. Genuine progress on opening up data for services, efforts to leverage new opportunities afforded by technology to enable citizen participation or collaboration, and other goals set out by civil society has been overshadowed by failures on other counts, from the creation of the Affordable Care Act to poor compliance with the Freedom of Information Act and obfuscation of the extent of domestic surveillance.

In that context, here are some polite suggestions to the folks behind the new ODNI tumblr regarding using the Web to communicate:

  • Post all documents as plaintext, not images and PDFs that defy easy digestion, reporting or replication. While the intelligence budget is classified, surely some of those untold billions could be allotted to persons taking time to release information in both human- and machine-readable formats.
  • Put up a series of Frequently Asked Questions, like the Wall Street Journal’s. Format them in HTML. Specifically address that reporting and provide evidence of what differs. Posting the joint statement on the WSJ stories as text is a start but doesn’t go far enough.
  • Post audio and plaintext transcripts of conference calls and all other press briefings with “senior officials.” Please stop making the latter “on background.” (The transcript of the briefing with NSA director of compliance John DeLong is a promising start, although getting it out of a PDF would be welcome.)
  • Take questions on Twitter and at questions@nsa.gov or something similar. If people ask about programs, point them to that FAQ or write a new answer. The intelligence community is starting behind here, in terms of trust, but being responsive to the public would be a step in the right direction.
  • Link out to media reports that verify statements. After DNI Clapper gave his “least untruthful answer” to Senator Ron Wyden in a Congressional hearing, these “on the record” statements are received with a great deal of skepticism by many Americans. Simply saying something is true or untrue is unlikely to be received as gospel by all.
  • Use animated GIFs to communicate with a younger demographic. Actually, scratch that idea.

Fung outlines principles for democratic transparency and open government

Archon Fung has published a new paper [PDF] on open government, information and democracy. The abstract includes a useful breakdown of the components of democratic transparency:

In Infotopia, citizens enjoy a wide range of information about the organizations
upon which they rely for the satisfaction of their vital interests. The provision of
that information is governed by principles of democratic transparency. Democratic
transparency both extends and critiques current enthusiasms about transparency. It
urges us to conceptualize information politically, as a resource to turn the behavior of
large organizations in socially beneficial ways. Transparency efforts have targets, and we
should think of those targets as large organizations: public and civic, but especially private
and corporate. Democratic transparency consists of four principles. First, information
about the operations and actions of large organizations that affect citizens’ interests
should be rich, deep, and readily available to the public. Second, the amount of available
information should be proportionate to the extent to which those organizations
jeopardize citizens’ interests. Third, information should be organized and provided in
ways that are accessible to individuals and groups that use that information. Finally, the
social, political, and economic structures of society should be organized in ways that
allow individuals and groups to take action based on Infotopia’s public disclosures.

Fung’s paper focuses upon “information about the activities of
large organizations—especially corporations and governments—rather than individuals” and “the important, defensive, face of the informational problem: information that people need to protect themselves against the actions of large organizations and to navigate the terrain created by such organizations,” as opposed to the myriad positive uses of open government data.

Will Maryland’s new open data initiative be a platform for a more open government?

Maryland joined 39 other states in the union when it officially launched its open data initiative on Wednesday.

Governor Martin O’Malley unveiled Data.Maryland.gov on Wednesday at a panel discussion in Annapolis hosted in conjunction with the Future of Information Alliance (FIA), an inter-disciplinary partnership between the University of Maryland, College Park and 10 founding partners.

“Big data is forever changing the way we manage, market, and move information, and in Maryland, it is also changing the way we govern with better choices and better results,” said Governor O’Malley. “Together, we set public goals, relentlessly measure government performance on a weekly basis, broadly share information, and put it on the internet for all to see. We publicly identify our problems and crowd source the solutions with open access to data. That’s why today we’re launching data.maryland.gov – a movement away from ideological, hierarchal, bureaucratic governing and toward information-age governing that is fundamentally entrepreneurial, collaborative, relentlessly interactive and performance driven.”

The path to standing up Maryland’s new open data platform extends back into the last decade when the O’Malley administration and the state’s legislature first started taking substantive steps towards putting more government data online.

These efforts were preceded by two important open government laws that laid a foundation for transparency in the 21st century:

1970: Maryland passes the Public Information Act, which established the public’s right to inspect public records, providing that “[a]ll persons are entitled to have access to information about the affairs of government and the official acts of public officials and employees.”

1977: Maryland passes an Open Meetings Act to “allow the general public to view the entire deliberative process.”

2008: Governor O’Malley launched StateStat, publishing performance and management statistics online. The governor subsequently touted the use of performance data a year later as a way to save taxpayer dollars. “RSS, XML, GIS, API: this is what smart, transparent governance will look like in the years ahead,” he said.

2010: Maryland webcasts more hearings and meetings online.

June 2011: Maryland General Assembly establishes a Joint Committee on Transparency and Open Government

April 2012: (Former) Maryland chief innovation officer Bryan Sivak hosts open data roundtable. [Baltimore Sun]

December 2012: Maryland Governor Martin O’Malley establishes an open data working group with an executive order. [Maryland.gov]

May 2013: Maryland launches data.maryland.gov using Socrata’s cloud-based open data platform.

Whither open government?

While the launch of an open data platform is an important digital milestone, it doesn’t in and of itself address substantive concerns about Maryland’s open government challenges. TechPresident asked whether Maryland was becoming the open government state in 2011, a question that came loaded with decades of context.

On the one hand, the new open data platform is a substantive step towards addressing the criticisms of open government advocates who noted that Maryland was lagging other states in the nation in its digital initiatives.

On the other, the 236 datasets on data.maryland.gov at launch do not include spending data. Many transparency advocates would like to see that change: Maryland received a low grade in PIRG’s annual report on government spending, which examines how well states deliver spending data online.

According to PIRG, “Maryland’s transparency website, which garnered a ‘C’ grade, provides checkbook-level information on contracts and other expenditures. However, it lacks detailed information on economic development tax credits and the projected and achieved benefits of economic development subsidies.”

The state government’s compliance with Maryland’s Freedom of Information Act (PDF) is also unclear. While journalists, researchers and other freedom of information requestors now have a new way to ask for data (a nominate button on the new open data website), if they don’t receive an immediate reply, they’ll be hard-pressed to know who to turn to in individual agencies. There is, as of yet, no comprehensive list of Maryland FOIA officers online, nor any independent institution, auditor or ombudsman with statutory authority to ensure that FOIA requests are complied with in a timely or effective manner.

It’s unclear whether any of this new open data will substantially mitigate Maryland’s record on transparency. According to a report card by State Integrity, Maryland ranks 40th in the nation when assessed on 14 different categories.

While access to electronic information may improve, Maryland’s story includes a political history rife with corruption in the latter part of the 20th century and a present marked by murky procurement policies, oft-ignored auditors’ reports, spotty access to information and limited executive and legislative branch accountability.

As Christian Borge detailed for Public Integrity in August of 2012, Maryland faces open government challenges around lobbying, contracting and political cronyism. Websites like StateStat, BayStat, and GreenPrint have featured data disclosures made at the discretion of the O’Malley administration, as is the case with this new open data platform. The state of play in Maryland is an excellent example of the ambiguity of open government and open data, where states release data that is relevant to services or performance, or that has economic value, but do not answer requests from the media for information related to the exercise (or abuse) of power, the existence of political corruption or potentially embarrassing errors.

This state of affairs is what led iSolon.org president Jim Snider to decry Maryland’s fake open government in 2010, much as open government advocates have criticized the Obama administration’s record on open data, open government and FOIA compliance. As Snider pointed out in March, Maryland’s Board of Elections also has serious open government issues.

Whether any of this figures into the 2014 election for governor remains to be seen. Maryland Attorney General Doug Gansler is a leading contender in the crowded field of the developing 2014 Maryland gubernatorial race. Whether the leading law enforcement official in Maryland chooses to make open data or open government an issue in his campaign is, like the political winds in Annapolis, not clear. To date, Gansler’s record on technology has primarily focused upon targeting sexual predators on social networking sites, not using digital technology to make Maryland government more open, transparent or accountable to its 5.8 million people.

None of this means that Maryland’s new open data initiative won’t matter for government transparency, improved civic services or economic activity in the private sector. This step forward does matter and adds what increasingly looks like a basic building block for governance to Maryland’s toolkit. It just means that the citizens of the Old Line State by the Bay need to keep asking for more than data from their elected officials.