Advocates Release Best Practices for Making Open Government Data “License-Free”

As more and more governments release data around the world, the conditions under which it is published and may be used will become increasingly important. Just as open formats make data easier to put to work, open licenses make it possible for all members of the public to use it without fear.

Wonky as that issue is, governments that want to maximize the rewards of the work involved in cleaning and publishing open government data need to get the policy around its release right. Today, several open government advocates have released an updated Best-Practices Language for Making Data “License-Free”, which can be found online at theunitedstates.io/licensing.

“In short what we say is ‘Use Creative Commons Zero (CC0),’ which is a public domain dedication,” said Josh Tauberer, the founder of Govtrack.us, via email. “We provide recommended language to put on government datasets and software to put the data and code into the world-wide public domain. In a way, it’s the opposite of a license.”

Tauberer, Eric Mill, developer at the Sunlight Foundation, and Jonathan Gray, director of policy and ideas at the Open Knowledge Foundation, who have been working on the guidance since May, all blogged about the new guidance.

“Back in May, the Administration’s Memorandum on Open Data created very confusing guidance for agencies about what constitutes open data by saying open data should be ‘openly licensed’,” explained Tauberer, via email. “In response to that, we began working on guidance for federal agencies for how to make sure their data is open under the definition in the 8 Principles of Open Government Data.”

The basic issue, he said, is that the memorandum directed agencies to make data open but, in the view of these advocates, told agencies the wrong thing about what open data actually means. “We’re correcting that with precise, actionable direction,” said Tauberer.

What would the consequences of United States government entities not adopting this guidance be?

“Because M-13-13 required open licensing as the new default, I worry about agencies taking the guidance too literally and applying licensing where they might not have before, even if the work is exempt from copyright,” said Tauberer. “Or they may now consider open licensing of works produced by a contractor to be the new norm, since it is permitted by M-13-13, but for certain core information produced by government this would be a major step backward.”

Getting ahead of these kinds of issues is not an abstract concern, much like the worries about language regarding the “mosaic effect” in the U.S. open data policy.

“Imagine if after FOIA’ing an agency’s deliberative documents, The New York Times was legally required to provide attribution to a contractor, or, worse, to the government itself,” said Tauberer. “The federal government is relying more and more on contractors and lawyers, so it’s important that we reinforce these norms now.”

The language has been endorsed by many of the prominent open government advocates in the world, including the Sunlight Foundation, the Open Knowledge Foundation, Public Knowledge, The Center for Democracy and Technology, The Electronic Frontier Foundation, The Free Law Project, the OpenGov Foundation, Carl Malamud at Public.Resource.Org, Jim Harper at WashingtonWatch.com, Citizens for Responsibility and Ethics in Washington, and MuckRock News.

While it remains to be seen if the White House Office of Management and Budget merges this best practice into its open data policy, the advocates have already had success getting it adopted.

“Since we first published the guidance in August, it’s led to three government projects using our advice,” said Tauberer. “Partly in response to our nudging, in October OSTP’s Project Open Data re-licensed its schema for federal data catalog inventory files. (It had been licensed under CC-BY because of non-governmental contributors to the schema, but now it uses CC0.) In September and October, The CFPB followed our guidance and applied CC0 to their “qu” project and their eRegs platform.”

11 thoughts on “Advocates Release Best Practices for Making Open Government Data “License-Free”

  1. Pingback: Updated Guidance for Federal Agencies’ Open Data Licensing Joshua Tauberer’s Blog

  2. I appreciate the appeal of a CC0 approach to advocates – who see this as a way of giving unrestricted reuse rights to users of government data.

    However there’s a converse risk around CC0 data in that it is used without any requirement for attributing the data creator or linking to the original data source.

    When it comes to using data for public purposes it is important and strategic to provide a route to validate that the data is accurate and from an authentic source. In a CC0 approach there is no requirement for a data reuser to link back to the source data, a link that would allow the end user to check that the reuser has interpreted the data within context and not changed the data in ways that could render it inaccurate.

    By default end users (consumers) of reused data are less likely to trust the data and more likely to form inaccurate perceptions of the original source (government agency) under a CC0 license as there is greater room for accidental or malicious inaccurate use of the data to present a distorted perspective. Even where these do not occur, end users are given greater trust in any data reuse when they can see where the data came from, a benefit for those individuals, organisations and companies that are mashing up government open data to create new insights and services.

    In my view CC BY (Creative Commons By Attribution) offers the best balance of reuse rights and end user validation as by default it allows end users to easily link back to the original data source and validate the data for themselves. Of course not all end users are capable of doing this, however enough of them are to mitigate the impact.

    Academically it is best practice – even required – to link back to the primary source to allow a consumer to validate any perspective or repeat any experiment. Even though it is not normally for an academic audience, government data should be treated the same way – using CC BY rather than CC0.

  3. [following represents my personal view, not the view of the CFPB]

    Craig brings up a fantastic point, one that I’ve struggled *mightily* with since entering government service. After bouncing back and forth between both perspectives, I’ve finally settled down on the less constrained side of the fence which CC0 represents.

    Here’s a non-exhaustive list why:

    – Limited legal resources. There will always be a finite # of lawyers and a finite # of hours in the day. The very last thing we have the capacity for is sending notices to people we find in violation of a CC-BY provision. There are an almost infinite # of things they could be doing that would be a better use of resources and have a greater positive impact for citizens. If we can’t universally track the usage of the data to determine whether or not the proper attribution is in place, and don’t have the legal resources to universally enforce it, there’s no reason to be heavy handed in our licensing text.

    – Licenses don’t fix (most) problems. If someone consumes govt data, and knowingly or by mistake alters the data, the presence of the CC-BY does absolutely nothing to fix that. It doesn’t give me the ability to demand that their application or service be taken down until the data is fixed or the context corrected. It only gives me the right to ask for a link back. I can’t even demand where said link is placed, or that users consult that link before consuming the redistribution of the data.

    – The sharpest of edge cases. While some increasingly fractional # of users (compared to the # of govt data consumers) are capable of doing accurate validations given two data sources (presuming they are aware and understand any mathematical or structural transformations the data has gone through via the 3rd party), what then if they find said errors? Neither they nor the govt has the ability/capacity to stop that 3rd party, and even if they stop it for one particular use case, as govt data circulates more and more, where does this chain of responsibility end? It just increasingly becomes a larger eater of resources without end.

    – Trust and merit are related. If someone builds an application or service that has fraudulent govt data in it, the web already has multiple and redundant mechanisms to communicate to users the error. Those are above and beyond the fact that if I look at my weather app and it says it’s sunny outside and I walk into a hurricane when I leave my house, that app will not last long in the marketplace.

    If in some rare cases a service becomes very popular by misusing govt data, that will quickly be brought to the attention of data owners (presuming said agency has a great data catalog, but that’s a separate issue. :] ). Govt already has incredible powers to get their message into the media, and it would take a simple press release to effectuate a marketplace change at that point.

    – We should instead spend time making it easy to get it right. Gov agencies themselves are usually remixing and taking on data from other sources, public or private. We make it as easy as possible for our end users to understand where our data comes from and how it should be contextualized. That allows 3rd parties to get it right the first time as much as possible.

    All that being said, nation-states should be empowered to find their own balance around these issues. CC-BY is obviously great, and a huge improvement vs other regimes. From the US perspective, we have the concept of ‘public domain’, but few legal instruments to make it operational and real. For the US, I believe CC0 is the current best vehicle for us to give the public domain substance and give developers, academics, journalists, and consumers the certainty they need that they can build value-added services on top of public data.

  4. I think there’s room for both a CC0 and a CC-BY (or even CC-BY-SA) in the context of open government data. Certainly those datasets which are in the public domain can and should be affirmatively marked CC0. While the preference is certainly for open government data to be cited, I tend to lean toward Noah’s rationale for why it’s just impractical to enforce such a requirement.

    However, the government also funds a wide variety of research activity and, as the memorandum on Increasing Access to the Results of Federally-Funded Scientific Research (http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf) advocates, the government also needs to “develop approaches for identifying and providing appropriate attribution to scientific datasets.” In this particular instance, CC0 will not accomplish that policy objective. That’s where CC-BY or CC-BY-SA make more sense.

  5. This is becoming a great discussion and I’ve highlighted it in my blog for more Australian and European perspectives.

    Let’s take some of the points raised by Noah and explore them, but first let me highlight that I look at this area from a global perspective, not from a US one.

    I appreciate Noah – and the other advocates – have taken a US-centric approach and recognise that their views may not apply across other jurisdictions.

    My initial standpoint is that open data must be considered a global movement and global resource. Individuals and organisations in any country must feel empowered to use open data from any jurisdiction in their activities. Certainly there may be more from a specific jurisdiction using that jurisdiction’s data, however I believe that we need to think global, not local, when setting the parameters that frame data.

    Secondly, the US might see itself as the leader in the open data movement – and it has notably led in a number of ways. However there needs to be care taken when US advocates move from leading to advocating others to mimic their approach. Taking a more conservative and sensitive approach to other nations and cultures tends to see greater gains and less pushback.

    Keep in mind that the US is one of the few jurisdictions with an existing tradition of rights-free government data release from your Federal Government, and this approach is alien and difficult to understand in most parts of the world.

    Also the CC0 license is not as widely accepted legally as the CC BY license and poses challenges in a number of jurisdictions. Even in many jurisdictions where CC is not yet legally accepted, the notion of free reuse with attribution is still manageable within their legal systems without extensive rework. The notion of CC0 is much harder for (conservative, risk-averse) public services and governments to adopt without considerable experience.

    None of this prevents the US from striking out with CC0 – but it could ‘spook the horses’ elsewhere and contribute to a slowdown, rather than acceleration, of open data acceptance. If the position is taken that without a CC0 position the data isn’t truly open, many jurisdictions would baulk at opening up their data in any way.

    When striding ahead with CC0, at least give other nations (their governments and public servants) a way to dip their toes in the water with ‘safer’ licenses also considered OK for open data release, allowing them to gradually get used to the temperature of the water and make their own decisions as to the best open license.

    – Limited legal resources.
    I appreciate that the US is the most litigious society in the world, thus from a US perspective it makes sense not to legislate unless a law is vigorously and consistently enforced. This isn’t the case elsewhere in the world where legal rights are a risk-mitigation mechanism and don’t need to be enforced to be valued. Many governments see inappropriate reuse of their data as a significant risk when becoming more open and providing a legally-enforceable approach they can choose to use if this is realised allows this risk to be managed, whether or not they choose in specific instances to enforce it.

    Notably laws do not always require expensive court cases to enforce. A gentle non-legal reminder that someone needs to link back to a data source is often sufficient in most parts of the world to encourage compliance.

    – Licenses don’t fix (most) problems.
    No-one has said they do. However the issue that is addressed by a CC BY license isn’t the inappropriate reuse of data (people can always do this regardless of the license) – what it does address is the fears and risk requirements of data providers by giving them the right to take a legally endorsed step if they choose.

    The provision of an attribution link is not onerous on data reusers and is a very low hurdle to meet – it isn’t a barrier to open data in any way I see, so there’s little cost in taking this approach for data reusers and much to gain in encouraging data providers to open their data by giving them a little legal security via an attribution license (whether they use it or not).

    – The sharpest of edge cases.
    See my point about litigious. Having a right doesn’t require that it be enforced.

    – Trust and merit are related.
    Definitely – but why not give a boost to that trust and provide a commercial benefit to the data reuser?

    If the data provider is trusted (at least to provide data), then it provides a trust benefit to the reuser to provide an attribution link… you can trust us because you can trust them. This can push use/sales of the data reuser’s services.

    Of course this doesn’t have to be forced, but there’s no cost to requiring it via a CC BY license. Again it’s not even a speed hump for the reuser to comply with CC BY.

    And should a particular data source not be trusted, requiring an attribution link is actually a useful caution or warning to end users, can encourage data reusers to seek out better quality datasets and encourage government agencies to invest in the quality of their own data – improving policy outcomes. That’s a good payout for a small attribution requirement.

    As to the ‘government can release a media release to correct inappropriate use’ point: having worked in government, I can say it’s not a consistent remedy to rumours and innuendos – which I think US celebrities as well as the US government have also discovered.

    Sure, governments have some ability to communicate their side, but not as much as people may think, particularly in jurisdictions with a high concentration of ownership of traditional media (such as in both Italy and Australia). Here we call it a Murdocracy – where traditional media can say whatever editors please and then downgrade any right of reply through placement, timing or ignoring alternative views (or facts). This situation exists in a number of liberal western democracies, so cannot be dismissed as only applicable to more dictatorial regimes.

    Essentially governments need as many levers as they can get to communicate their views and the facts. CC BY gives a lever which is extremely valuable.

    Other things….

    What’s the benefit of 0?
    One thing I’ve not seen discussed by any of the CC0 advocates is the benefit of the 0 over other CC licenses (such as CC BY).

    I talk to a lot of public servants about CC licenses – often have to explain them to people (even those in legal teams) who are unfamiliar with the concept even though it has been accepted and even mandated for some jurisdictions here in Australia.

    One thing I’ve never been able to communicate is the additional benefits offered by CC0 over CC BY.

    This may be a limitation in me, however I don’t see any advantage to data providers, data reusers or end users of removing that BY obligation.

    Can the advocates offer up some compelling reason why dropping the BY provides significant advantage to any or all of these groups and offsets the benefits I see for CC BY in reducing provider risk and increasing end user trust?

    Finally – CC0 is not as widely accepted as CC BY. It is not a legal license in Australia and doesn’t have the same international penetration as CC BY.

    This raises several issues. Firstly jurisdictions may not be able to enact a CC0 approach, even if they wanted to, without other legislation and legal changes. Secondly it presents an interesting legal position for reuse of data across jurisdictions.

    I’ve always treated US CC0 data as CC BY as this is legal in Australia, however this may not meet a legal test (if I am ever tested) and could pose challenges in many other jurisdictions as well.

    CC BY isn’t universally adopted, but it has broader reach thus, in my opinion, remains the sweet spot for data licensing around the world.

  6. Whoosh that’s a lot to reply to.

    When preparing the update to our guidance, Eric and I were aware of this issue of provenance or validation of the data. A similar issue had been raised by Bonnie Klein on James Jacob’s FreeGovInfo blog (http://wordpress.freegovinfo.info/node/3991), who said that private sector reusers of government data should be required to cite their source so that they can’t try to fool users into buying (from them) what they can get for free from the government.

    I don’t want to speak for Eric much but I think we both rejected the argument outright, and I certainly reject the argument that CC-BY is a better standard for everyone. For one, *our* government cannot under current law apply copyright licenses to federal government data, period. This is a long legal tradition rooted in 1st Amendment principles that our government does not control the flow of information in the public, and rooted in the purpose of copyright to promote the science and arts through an incentive to publish. Our guidance affirms that tradition and says use the same tradition for works produced by a contractor and works used in foreign jurisdictions. The idea of applying CC-BY to government data would require a significant change to our copyright law, and a larger change to our legal and ethical traditions.

    So I reject the notion that what works for the world works for the United States. We’re not going to stop advocating for CC0 in the United States just because other countries have a laxer stance on these things. That’s ridiculous. And as our guidance says, we’re addressing U.S. federal government agencies and are by no means trying to impose this standard on the rest of the world.

    Craig you wrote, “One thing I’ve not seen discussed by any of the CC0 advocates is the benefit of the 0 over other CC licenses (such as CC BY).” We don’t discuss it much because, well, U.S. federal government works can’t be licensed to begin with. (I don’t see how one could coherently believe that works should be public domain domestically but attribution-required in other jurisdictions, so of course we don’t suggest that.) Further, I did address attribution specifically in my blog post, so please read that before making claims about what no one is talking about:
    http://razor.occams.info/blog/2013/12/12/updated-guidance-for-federal-agencies-open-data-licensing/.

  7. Hi Joshua,

    That’s fine – as a US Federal Government only position, CC0 clearly makes sense.

    Provided it isn’t being advocated more broadly, or as a model to follow globally, no worries, advocate away.

    BTW I don’t see other jurisdictions as having a ‘laxer stance on these things’. That’s your opinion, and that’s fine, but it could help if you spent some time understanding the context in which copyright operates in other countries – particularly at a time when the US is attempting to push its own views on copyright on other nations through the TPP.

    You might find that other nations have put far more thought and consideration into weighing the rights of copyright owners and reuse, particularly in a social context, rather than simply a commercial context, than the US has.

    And again the US isn’t the world, nor does it stand apart from the world. Nations have more in common than in difference, and open data has far more strategic and commercial value to everyone if all nations are open, not simply the US Federal government – so balancing the different needs of nations is far more important than simply advocating a single position for a single situation in the US.

  8. Pingback: Resumen de las noticias sobre web pública y transparencia a 19/12

  9. Pingback: In a win for open government advocacy, DC removes flaws its municpal open data policy | E Pluribus Unum
