What cities can learn from Gainesville’s experiment with radical transparency

[Screenshot: City of Gainesville Commissioner Mail Archive]

There’s much to be learned from the experience of the city of Gainesville, Florida, where commissioners voted in 2014 to publish the public’s email correspondence with them and the mayor online.

More than five years on, the city government and its residents have been ground zero for a tumultuous experiment in hyper-transparent government in the 21st century, as Brad Harper reports for the Montgomery Advertiser.

It’s hard not to read this story and immediately see a core flaw in the design of this digital governance system: the city government is violating the public’s expectation of privacy by publishing residents’ email online.

“Smart cities” will look foolish if they adopt hyper-transparent government without first ensuring that the public they serve understands whether their interactions with city government will be recorded and published online.

Unexpected sunshine will also dissolve public trust if there’s a big gap between the public’s expectations of privacy and the radical transparency that comes from publishing online the emails residents send to agencies.

Residents should be offered multiple digital options for interacting with governments. In addition to letting residents exercise their rights to freedom of expression, assembly and petition on the phone, in written communications with a given government, or in person at hearings or town halls, city (and state) governments should route three broad categories of inquiries into different channels:

Emergency Requests: Emergencies should be routed to 911 from all other channels. Calls to 911 are recorded but private by default, and should not be disclosed online without human review.

Service Requests: Non-emergency requests should go to 311, through a city call center or a 311 system. Open data about 311 requests should be public by default and disclosed online in real time.

Information Requests: People looking for information should be able to find a city website through a Web search or social media. A city.gov site should have an /open page that includes open data, news, contact information for agencies and public information officers, and a virtual agent or “chat bot” to guide their search.

If proactive disclosures aren’t sufficient and the information people seek is not online, there should be a way to make Freedom of Information Act requests under the law. But public correspondence with agencies should be private by default.
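The defaults proposed above can be written down as a small routing table, which makes the privacy rule for each channel explicit and testable. This is a hypothetical sketch, not any city’s actual system: the `Category` names, the `ROUTES` table and the `may_publish_online` rule are assumptions that encode only what this post proposes (911 private pending human review, 311 public by default, correspondence private by default). Information requests are left out because they are served by proactive disclosure rather than by publishing residents’ records.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical categories of resident inquiries."""
    EMERGENCY = "emergency"
    SERVICE = "service"
    CORRESPONDENCE = "correspondence"


# Hypothetical routing table: category -> (channel, default disclosure rule).
ROUTES = {
    Category.EMERGENCY: ("911", "recorded; private unless a human reviews it"),
    Category.SERVICE: ("311", "public by default; published as open data in real time"),
    Category.CORRESPONDENCE: ("agency inbox", "private by default"),
}


def channel_for(category: Category) -> str:
    """Return the channel an inquiry of this category should be routed to."""
    return ROUTES[category][0]


def may_publish_online(category: Category, human_reviewed: bool = False) -> bool:
    """Decide whether a record may be proactively published online.

    Correspondence is never proactively published under this policy; it could
    still be released in response to a public records request, which this
    sketch does not model.
    """
    if category is Category.SERVICE:
        return True
    if category is Category.EMERGENCY:
        return human_reviewed
    return False
```

For example, `may_publish_online(Category.EMERGENCY)` stays False until `human_reviewed=True` is passed, while a 311 record is publishable immediately.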

 

US FOIA Advisory Committee considers recommendations to part the rising curtains of secrecy

Government secrecy, as measured by censorship or non-responsiveness under the Freedom of Information Act, is at an all-time high during the Trump administration, and the Freedom of Information Act is getting worse under Trump for a variety of reasons. Continued secrecy …

15 key insights from the Pew Internet and Life Project on the American public, open data and open government

Today, a new survey released by the Pew Research Internet and Life Project provided one of the most comprehensive snapshots into the attitudes of the American public towards open data and open government to date. In general, more people surveyed are guardedly optimistic about the outcomes and release of open data, although that belief does vary with their political views, trust in government, and specific areas.  (Full disclosure: I was consulted by Pew researchers regarding useful survey questions to pose.)

[Chart: Pew, mixed hopes that open data will improve government]

“Trust in government is the reference that people bring to their answers on open government and open data,” said John Horrigan, the principal researcher on the survey, in an interview. “That’s the frame of reference people bring. A lot of people still aren’t familiar with the notion, and because they don’t have a framework about open data, trust dominates, and you get the response that we got.”

[Chart: Pew, people who trust government more see more benefits from open data]

While majorities of the American public use applications and services that use government data, from GPS to weather to transit to health apps, relatively few are aware that data produced and released by government drives them.

“The challenge for activists or advocates in this space will be to try to make the link between government data and service delivery outcomes,” said Horrigan. “If the goals are to make government perform better and maybe reverse the historic tide of lowered trust, then the goal is to make improvements real in delivery. If this is framed just as argument over data quality, it would go into an irresolvable back and forth into the quality of government data collection. If you can cast it beyond whether unemployment statistics are correct or not but instead of how government services improve or saved money, you have a chance of speaking to whether government data makes things better.”

The public knowledge gap regarding this connection is one of the most important points for proponents, advocates, journalists and publishers to address if they wish to see funding for open data initiatives maintained or Freedom of Information Act reforms passed.

“I think a key implication of the findings is that – if advocates of government data initiatives hope that data will improve people’s views about government’s efficacy – efforts by intermediaries or governments to tie the open data/open government to the government’s collection of data may be worthwhile,” said Horrigan. “Such public awareness efforts might introduce a new “mental model” for the public about what these initiatives are all about. Right now, at least as the data for this report suggests, people do not have a clear sense of government data initiatives. And that means the context for how they think about them has a lot to do with their baseline level of trust in the government – particularly the federal government.”

Horrigan suggested thinking about this using a metaphor familiar to anyone who’s attended a middle school dance.

“Because people do engage with the government online, just through services, it’s like getting them on a big dance floor,” he suggested. “They’re on the floor, where you want them, but they’re on the other part of it. They don’t know that there’s another part of the dance that they’d like to see or be drawn to that they’d want to be in. There’s an opportunity to draw them. The good news is that they’re on the dance floor; the bad news is they don’t know about all of it. Someone might want to go over and talk to them and explain that if you go over here you might have a better experience.”

Following are 13 more key insights about the public’s views regarding the Internet, open data and government. For more, make sure to read the full report on open government data, which is full of useful discussion of its findings.

One additional point worth noting before you dive in: this survey is representative of American adults, not just the attitudes of people who are online. “The American Trends Panel was recruited to be nationally representative, and is weighted in such a way (as nearly all surveys are) to ensure responses reflect the general population,” said Horrigan. “The overall rate of internet use is a bit higher than we typically record, but within the margin of error. So we are comfortable that the sample is representative of the general population.”
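Horrigan’s point about weighting can be illustrated with its simplest form, post-stratification on a single characteristic: each group’s weight is its share of the population divided by its share of the sample. This is a minimal sketch with made-up numbers, not Pew’s procedure; real survey weighting typically adjusts for several characteristics at once, and the group names and shares below are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share.

    sample_counts: dict mapping group -> respondents in the sample
    population_shares: dict mapping group -> share of the population (sums to 1)
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_share(responses, weights):
    """Weighted share of respondents who answered yes.

    responses: list of (group, answered_yes) pairs
    """
    yes = sum(weights[group] for group, answered_yes in responses if answered_yes)
    everyone = sum(weights[group] for group, _ in responses)
    return yes / everyone


# Illustrative numbers only: college graduates are over-represented here.
counts = {"college": 60, "no_college": 40}
shares = {"college": 0.3, "no_college": 0.7}
weights = poststratification_weights(counts, shares)
responses = [("college", True)] * 60 + [("no_college", False)] * 40
print(round(weighted_share(responses, weights), 2))
```

Unweighted, 60% of this toy sample says yes; after weighting, the estimate is about 30%, because the over-represented group counts for less.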

A growing number of American adults are using the Internet to get information and data

While Pew cautions that the questions posed in this survey differ from those in a survey conducted in 2010, the trend is clear: the way citizens communicate with government now includes the Internet, and the way government communicates with citizens increasingly includes digital channels. That use now includes getting information or data about federal, state and local government.

[Chart: Pew, Internet use to get government information]

College-educated Americans and millennials are more hopeful about open data releases

[Chart: Pew, college graduates and millennials are more hopeful about open data]

Despite disparities in trust and belief in outcomes, there is no difference in online activities between members of political parties

[Chart: Pew, no difference in online activities between political parties]

Wealthier Americans are comfortable with open data about real estate transactions but not individual mortgages

[Chart: Pew, wealthier Americans comfortable with real estate transaction data, not mortgage data]

This attitude is generally true across all income levels.

[Chart: Pew, comfort with data sharing, but not mortgage data, across income levels]

College graduates, millennials and higher-income adults are more likely to use data to monitor government performance

About a third of college grads, young people and wealthy Americans have checked out performance data or government contracting data, about 50% more than other age groups, lower-income Americans or non-college grads.

[Chart: Pew, college grads and higher-income adults more likely to monitor government performance]

The ways American adults interact with government services and data digitally are expanding

[Chart: Pew, ways people interact with government information]

But very few American adults think government data sharing is currently very effective:

[Chart: Pew, few think government data sharing is effective]

Only a small minority of Americans, however, have a great deal of trust in the federal government:

[Chart: Pew, majorities have low trust in government]

In fact, increasing individual use of data isn’t necessarily correlated with belief in positive outcomes:

Pew grouped the 3,212 respondents into four quadrants, seen below, with a vertical axis ranging from optimism to skepticism and a horizontal axis that described use. Notably, more use of data doesn’t correlate to more belief in positive outcomes.
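A two-axis grouping like this can be sketched in a few lines. The sketch rests on assumptions not described in this post: each respondent is given a data-use score and an optimism score, both scaled 0 to 1, and both axes are split at a hypothetical cutoff of 0.5. The quadrant labels and the scores are placeholders, not Pew’s method or data.

```python
from collections import Counter


def quadrant(use_score: float, optimism_score: float, cutoff: float = 0.5) -> str:
    """Place one respondent on the two axes (labels and cutoff are hypothetical)."""
    use = "high use" if use_score >= cutoff else "low use"
    outlook = "optimist" if optimism_score >= cutoff else "skeptic"
    return f"{use} / {outlook}"


def quadrant_counts(respondents, cutoff: float = 0.5) -> Counter:
    """Count respondents per quadrant; respondents are (use, optimism) pairs."""
    return Counter(quadrant(use, optimism, cutoff) for use, optimism in respondents)


# Toy scores, not survey data.
sample = [(0.9, 0.2), (0.8, 0.7), (0.1, 0.9), (0.2, 0.1), (0.7, 0.3)]
counts = quadrant_counts(sample)
heavy_users = sum(n for label, n in counts.items() if label.startswith("high use"))
print(f"{counts['high use / skeptic']} of {heavy_users} heavy users are skeptics")
```

Cross-tabulating this way is what lets an analyst see that heavy use does not imply optimism: in the toy sample, two of the three heavy users land on the skeptic side.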

“In my mind, you have to get to the part of the story where you show government ran better as a result,” said Horrigan. “You have to get to a position where these stories are being told. Then, at least, while you’re opening up new possibilities for cynicism or skepticism, you’re at least focused on the data as opposed to trust in government.”

[Chart: Pew, quadrants of data use and belief in open data]

[Chart: Pew, technology profiles of the four quadrants]

Instead…

Belief in positive outcomes from the release of open data is correlated with a belief that your voice matters in this republic:

[Chart: Pew, those who believe their voice matters are more likely to believe open data improves government]

If you trust the federal government, you’re more likely to see the benefit in open data:

[Chart: Pew, those who trust the federal government are more likely to see benefits]

Belief in positive outcomes from the release of open data is also related to political party affiliation:

[Chart: Pew, Democrats more likely to believe open data will pay off]

Put simply, Democrats trust the federal government more, and that relates to how people feel about open data released by that government.

[Chart: Pew, Democrats trust the federal government more]

 

Political party has an impact upon the view of open data in the federal government

One challenge is that each time President Barack Obama says “open data,” he may further associate the release of government data with Democratic policies, despite bipartisan support for open government data in Congress. If a Republican is elected President in November 2016, however, this particular attitude may well shift.

“That’s definitely the historic pattern, tracked over time, dating to 1958,” said Horrigan, citing a Pew study. “If it holds and a Republican wins the White House, you’d expect it to flip. Let’s say that we get a Republican president and he continues some of these initiatives to make government perform better, which I expect to be the case. The Bush administration invested in e-government, and used the tools available to them at the time. The Obama administration picked it up, used the new tools available, and got better. President [X] could say this stuff works.”

[Chart: Pew, Democrats have a more upbeat view of open data]

The unresolved question that we won’t know the answer to until well into 2017, if then, is whether today’s era of hyper-partisanship will change this historic pattern.

There’s bipartisan agreement on the need to use government data better in government. Democrats want to improve efficiency and effectiveness; Republicans want to do the same, but often in the context of demonstrating that programs or policies are ineffective and thereby shrinking government. If the country can rise above partisan politics to innovate government, awareness of the utility of data releases will grow, along with support for open data.

“Many Americans are not much attuned to government data initiatives, which is why they think about them (in the attitudinal questions) through the lens of whether they trust government,” said Horrigan. “Even the positive part of the attitudinal questions (i.e., the data initiatives can improve accountability) has a dollop of concern, in that even the positive findings can be seen as people saying: ‘These government data initiatives might be good because they will shine more light on government – which really needs it because government doesn’t perform well enough.’ That is an opportunity of course – especially for intermediaries that might, through use of data, help the public understand how/whether government is being accountable to citizens.”

That opportunity is cause for hope.

“Whether it is ‘traditional’ online access for doing transactions/info searches with respect to government, or using mobile apps that rely on government data, people engage with government online,” said Horrigan. “That creates the opportunity for advocates of government data initiatives to draw citizens further down the path of understanding (and perhaps better appreciating) the possible impacts of such initiatives.”

17 million tax transcripts downloaded through IRS website, reducing offline requests by 40%

According to a post on the White House blog, 17 million tax transcripts have been downloaded over the Internet since the feature launched in January 2014. The interesting outcome is that, according to the post, offline requests are down by 40%.

The post didn’t report a clear return on investment, such as how much providing this online service saved taxpayers. But if we assume that sending transcripts through the mail involves processing costs and that, once online, the Internet service scales, that’s a good result, as is enabling instant electronic access to something that used to take 5-10 business days to arrive in print form.
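The back-of-the-envelope reasoning above can be made explicit. This minimal sketch assumes a baseline number of mailed requests, a cost per mailed request and a yearly cost for running the online service, none of which the White House post reported; only the 40% drop comes from the post, and the example inputs are purely illustrative.

```python
def estimated_mail_savings(baseline_offline_requests: int,
                           reduction: float,
                           cost_per_mailed_request: float,
                           online_service_cost: float = 0.0) -> float:
    """Net annual savings from fewer mailed transcript requests.

    baseline_offline_requests: mailed requests per year before online access (assumed)
    reduction: fractional drop in offline requests (0.40 for the reported 40%)
    cost_per_mailed_request: processing and postage per mailed request (assumed)
    online_service_cost: yearly cost of running the online service (assumed)
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    avoided_requests = baseline_offline_requests * reduction
    return avoided_requests * cost_per_mailed_request - online_service_cost


# Hypothetical inputs: every figure except the 40% drop is made up.
print(estimated_mail_savings(10_000_000, 0.40, 5.0, online_service_cost=4_000_000))
```

Subtracting the cost of the online service matters: the result is a net figure, and it only stays positive if the service really does scale cheaply.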

Of note: it looks like Americans can expect more online services from the IRS in the near future, according to the authors of the White House blog post, U.S. Deputy Chief Technology Officer Nick Sinai and Rajive Mathur, director of Online Services at the Internal Revenue Service:

“Building on the initial success of Get Transcript, there are more exciting improvements to IRS services in the pipeline. For instance, millions of taxpayers contact the IRS every year to ask about their tax status, whether their filing was received, if their refund was processed, or if their payment posted. In the future, taxpayers will be able to answer these types of questions independently by signing in to a mobile-friendly, personalized online account to conduct transactions and see all of their tax information in one place. Users will be able to view account history and balance, make payments or see payment status, or even authorize their tax preparer to view or make changes to their tax return. This will also include the ability to download personal tax information in an easy to use and machine-readable format so that taxpayers can share with trusted recipients if desired.”

Promising. I hope that the leadership of the IRS explores how the agency could act as a platform to enable more, much-needed innovation around personal data access and digital services in the years to come, enabling a modern ecosystem of tax software based on a standardized application programming interface.
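To make “machine-readable format” concrete, here is a minimal sketch of what a standardized transcript export could look like, with a JSON round trip so that tax software or a lender, once the taxpayer authorizes it, could parse the data without re-keying it. The `TaxTranscript` and `TranscriptLine` structures and their field names are hypothetical; they are not an IRS format or API, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TranscriptLine:
    """One line item; field names are hypothetical, not an IRS schema."""
    item: str
    description: str
    amount_cents: int  # integer cents avoid floating-point rounding


@dataclass
class TaxTranscript:
    tax_year: int
    form: str = "1040"
    lines: List[TranscriptLine] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON that software the taxpayer authorizes could parse."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TaxTranscript":
        """Rebuild a transcript from the JSON produced by to_json."""
        raw = json.loads(text)
        raw["lines"] = [TranscriptLine(**line) for line in raw["lines"]]
        return cls(**raw)


# Placeholder values for illustration only.
transcript = TaxTranscript(2013, lines=[TranscriptLine("wages", "Wages, salaries, tips", 5_000_000)])
print(transcript.to_json())
```

A standardized schema like this, published alongside an API, is what would let competing software read the same data, which is the ecosystem argument made above.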

Improving online self-service could have an enormous impact on every single American taxpayer, from saving tax dollars on the government side to saving time and gray hairs year round in offices and at kitchen tables. Per Sinai and Mathur, the IRS currently receives over 80 million phone calls per year, sends out almost 200 million paper notices every year and receives over 50 million unique visitors to its website each month during filing season.

More context and FAQ on how to download your tax transcript here.

Data journalism and the changing landscape for policy making in the age of networked transparency

This morning, I gave a short talk on data journalism and the changing landscape for policy making in the age of networked transparency at the Woodrow Wilson Center in DC, hosted by the Commons Lab.

Video from the event is online at the Wilson Center website. Unfortunately, I found that I didn’t edit my presentation down enough for my allotted time. I made it to slide 84 of 98 in 20 minutes and had to skip the 14 predictions and recommendations section. While many of the themes I describe in those 14 slides came out during the roundtable question and answer period, they’re worth resharing here, in the presentation I’ve embedded below:

[REPORT] On data journalism, democracy, open government and press freedom

On May 30, I gave a keynote talk on my research on the art and science of data journalism at the first Tow Center research conference at Columbia Journalism School in New York City. I’ve embedded the video below:

My presentation is embedded below, if you want to follow along or visit the sites and services I described.

Here’s an observation drawn from an extensive section on open government that should be of interest to readers of this blog:

“Proactive, selective open data initiatives by government focused on services that are not balanced by support for press freedoms and improved access can fairly be criticized as “openwashing” or “fauxpen government.”

Data journalists who are frequently faced with heavily redacted document releases or reams of blurry PDFs are particularly well placed to make those critiques.”

My contribution was only one part of the proceedings for “Quantifying Journalism: Metrics, Data and Computation,” which you can catch up on through the Tow Center’s live blog or TechPresident’s coverage of measuring the impact of journalism.

[FAQ] How do I download a tax transcript from IRS.gov?

UPDATE: This service was taken offline after IRS security was compromised.

In January 2014, the IRS quietly introduced a new feature at IRS.gov that enabled Americans to download their tax transcript over the Internet. Previously, filers could request a copy of the transcript (not the full return) but had to wait 5-10 business days to receive it in the mail. For people who needed more rapid access for applications, the delay could be critical.

What’s a tax transcript?

It’s a list of the line items that you entered onto your federal tax return (Form 1040), as it was originally filed with the IRS.

Wait, we couldn’t already download a transcript like this before 2014?

Nope. Previously, filers could request a copy of the transcript (not the full return) but they would have to wait 5-10 business days to receive it in the mail.

Why did this happen now?

The introduction of the IRS feature coincided with a major Department of Education event focused on opening up such data. A U.S. Treasury official said that the administration was doing that to make it “easier for student borrowers to access tax records he or she might need to submit loan applications or grant applications.”

Why would someone want their tax transcript?

As the IRS itself says, “IRS transcripts are often used to validate income and tax filing status for mortgage applications, student and small business loan applications, and during tax preparation.” It’s pretty useful.

OK, so what do I do to download my transcript?

Visit “get transcript” and register online. You’ll find that the process is very similar to setting up online access for a bank account. You’ll need to choose a pass phrase, pass image and security questions, and then answer a series of questions about your life, like where you’ve lived. If you write them down, store them somewhere safe and secure offline, perhaps with your birth certificate and other sensitive documents.

Wait, what? That sounds like a lot of private information.

True, but remember: the IRS already has a lot of private data about you. These questions are designed to prevent someone else from setting up a fake account in your name and using it to steal your information from the IRS. If you’re uncomfortable with answering these questions, you can request a print version of your transcript. To do so, you’ll need to enter your Social Security number, date of birth and street address online. If you’re still uncomfortable doing so, you can visit or contact the IRS in person.

So is this safe?

It’s probably about as safe as doing online banking. Virtually nothing you do online is without risk. Make sure you 1) go to the right website, 2) connect securely and 3) protect the transcript, just as you would paper tax records. Here’s what the IRS told me about their online security:

“The IRS has made good progress on oversight and enhanced security controls in the area of information technology. With state-of-the-art technology as the foundation for our portal (e.g. irs.gov), we continue to focus on protecting the PII of all taxpayers when communicating with the IRS.

However, security is a two-way street with both the IRS and users needing to take steps for a secure experience. On our end, our security is comparable to leaders in private industry.

Our IRS2GO app has successfully completed a security assessment and received approval to launch by our cybersecurity organization after being scanned for weaknesses and vulnerabilities.

Any personally identifiable information (PII) or sensitive information transmitted to the IRS through IRS2Go for refund status or tax record requests uses secure communication channels that meet or exceed federal requirements for encryption. No PII is passed back to the taxpayer through IRS2GO and no PII is stored on the smartphone by the application.

When using our popular “Where’s My Refund?” application, taxpayers may notice just a few of our security measures. The URL for Where’s My Refund? begins with https. Just like in private industry, the “s” is a key indicator that a web user should notice indicating you are in a “secure session.” Taxpayers may also notice our message that we recommend they close their browser when finished accessing your refund status.

As we become a more mobile society and able to link to the internet while we’re on the go, we remind taxpayers to take precautions to protect themselves from being victimized, including using secure networks, firewalls, virus protection and other safeguards.

We always recommend taxpayers check with the Federal Trade Commission for the latest on reporting incidents of identity theft. You can find more information on our website, including tips if you believe you have become the victim of identity theft.”

What do I do with the transcript?

If you download tax transcripts or personal health information to a mobile device, laptop, tablet or desktop, install passcodes and full disk encryption, where available, on every machine it’s on. Leaving your files unprotected on computers connected to the Internet is like leaving the door to your house unlocked with your tax returns and medical records on the kitchen table.

I got an email from the IRS that asks me to email them personal information to access my transcript. Is this OK?

Nope! Don’t do it: it’s not them. The new functionality will likely inspire criminals to create mockups of the government website that look similar and then send phishing emails to consumers, urging them to “log in” to fake websites. You should know that IRS “does not send out unsolicited e-mails asking for personal information.” If you receive such an email, consider reporting the phishing to the IRS. Start at www.irs.gov/Individuals/Get-Transcript every time.
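Part of the advice to start at the real site every time can be expressed as a check on a link before you follow it. A minimal sketch, assuming the legitimate host is irs.gov or one of its subdomains (an assumption for illustration; your browser’s address bar and certificate details remain the authority). It rejects plain http links, lookalike hosts and links that hide the real host behind an “@”.

```python
from urllib.parse import urlsplit


def looks_like_official_irs_url(url: str) -> bool:
    """True only for https URLs whose host is irs.gov or a subdomain of it.

    Catches common phishing tricks (plain http, lookalike hosts such as
    irs.gov.example.com, user info before an @ that hides the real host),
    but cannot rule out every attack, such as malware on your own machine.
    """
    parts = urlsplit(url.strip())
    if parts.scheme.lower() != "https":
        return False
    if parts.username is not None or parts.password is not None:
        return False
    host = (parts.hostname or "").rstrip(".")
    return host == "irs.gov" or host.endswith(".irs.gov")


for link in ("https://www.irs.gov/Individuals/Get-Transcript",
             "http://www.irs.gov/Individuals/Get-Transcript",
             "https://irs.gov.example.com/Get-Transcript",
             "https://www.irs.gov@example.com/Get-Transcript"):
    print(looks_like_official_irs_url(link), link)
```

Note that a bare “www.irs.gov/…” without the https:// prefix is also rejected here, which nudges you to type or bookmark the full secure address.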

I tried to download my transcript but it didn’t work. What the heck?

You’re not alone. I had trouble using an Apple computer. Others have had technical issues as well.

Here’s what the IRS told me: “As a web application Get Transcript is supported on most modern OS/browser combinations. While there may be intermittent issues due to certain end-user configurations, IRS has not implemented any restrictions against certain browsers or operating systems. We are continuing to work open issues as they are identified and validated.”

(A side note: For the best user experience, taxpayers may want to try up-to-date versions of Internet Explorer and a supported version of Microsoft Windows; however, that is certainly not a requirement.)”

What does that mean in practice? Not all modern OS/browser combinations are supported, potentially including OS X and Android. The IRS digital staff knows it and is working to improve, although it isn’t telling IRS.gov users which versions of IE, Windows or other browsers and operating systems are presently supported and which are not.

Unfortunately, ongoing security issues with Internet Explorer mean that in 2014, we have the uncomfortable situation where the Department of Homeland Security is recommending that people avoid using Internet Explorer while the IRS recommends that its customers choose it for the “best experience.”

Given the comments from frustrated users, the IRS could and should do better on all counts.

Will I be able to file my tax return directly to the government through IRS.gov now?

You can already file your federal tax return online. According to the IRS, almost 120 million people used IRS e-file last year.

Well, OK, but shouldn’t having a user account and years of returns make it easier to file without a return at all?

It could. As you may know, other countries already have “return-free filing,” where a taxpayer can go online, log in and access a pre-populated tax return, see what the government estimates he or she owes, make any necessary adjustments, and file.
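The return-free workflow described above (pre-populate from information the government already receives, estimate, let the taxpayer adjust, then file) can be sketched in a few lines. This is a deliberately simplified illustration: the class names, the single flat tax rate and the idea that reported income is simply summed are hypothetical assumptions, not how any country’s system or the U.S. tax code computes liability.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationReturn:
    """A third-party report the agency already receives, e.g. from an employer."""
    payer: str
    income_cents: int
    withheld_cents: int


@dataclass
class PrepopulatedReturn:
    income_cents: int
    withheld_cents: int
    adjustments_cents: Dict[str, int] = field(default_factory=dict)

    def add_adjustment(self, label: str, delta_cents: int) -> None:
        """Taxpayer correction, e.g. income the agency didn't know about."""
        self.adjustments_cents[label] = delta_cents

    def balance_due_cents(self, flat_rate: float) -> int:
        """Positive means the taxpayer owes; negative means a refund.

        flat_rate is a hypothetical stand-in for a real tax computation.
        """
        taxable = self.income_cents + sum(self.adjustments_cents.values())
        tax = round(max(taxable, 0) * flat_rate)
        return tax - self.withheld_cents


def prepopulate(reports: List[InformationReturn]) -> PrepopulatedReturn:
    """Build the draft return the government would show the taxpayer."""
    return PrepopulatedReturn(
        income_cents=sum(r.income_cents for r in reports),
        withheld_cents=sum(r.withheld_cents for r in reports),
    )


# Illustrative: one employer report and a hypothetical 10% flat rate.
draft = prepopulate([InformationReturn("Employer A", 5_000_000, 600_000)])
print(draft.balance_due_cents(0.10))  # negative means a refund
draft.add_adjustment("freelance income", 1_000_000)
print(draft.balance_due_cents(0.10))
```

The design point is that the taxpayer starts from the government’s numbers and only supplies corrections, instead of re-entering data the agency already holds.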

Wait, that sounds pretty good. Why doesn’t the USA have return-free filing yet?

It does sound good. As ProPublica reported last year, “the concept has been around for decades and has been endorsed by both President Ronald Reagan and a campaigning President Obama.”

ProPublica also reported that both H&R Block and Intuit, the maker of TurboTax, have lobbied against free and simple tax filing in Washington, given that it’s in their economic self-interest to do so:

In its latest annual report filed with the Securities and Exchange Commission, however, Intuit also says that free government tax preparation presents a risk to its business. Roughly 25 million Americans used TurboTax last year, and a recent GAO analysis said the software accounted for more than half of individual returns filed electronically. TurboTax products and services made up 35 percent of Intuit’s $4.2 billion in total revenues last year. Versions of TurboTax for individuals and small businesses range in price from free to $150.

What are the chances return-free filing could be on IRS.gov soon?

Hard to say, but the IRS told me that something that sounds like a precursor to return-free filing is on the table.  According to the agency, “the IRS is considering a number of new proposals that may become a part of the online services roadmap some time in the future. This may include a taxpayer account where up to date status could be securely reviewed by the account owner.”

Creating the ability for people to establish secure access to IRS.gov to review and download tax transcripts is a big step in that direction. Whether the IRS takes any more steps soon is more of a political and policy question than a technical one, although the details of the latter matter.

Is the federal government offering other services like this for other agencies or personal data?

The Obama administration has been steadily modernizing government technology, although progress has been uneven across agencies. While the woes of Healthcare.gov attracted a lot of attention, many federal agencies have improved how they deliver services over the Internet. One of the themes of the administration’s digital government approach is “smart disclosure,” a form of targeted transparency in which people are offered the opportunity to download their own data, or data about them, from government or commercial services. The Blue Button is an example of this approach that has the potential to scale nationally.

PCAST report on big data and privacy emphasizes value of encryption, need for policy

April 4, 2014 meeting of PCAST at National Academy of Sciences

This week, the President’s Council of Advisors on Science and Technology (PCAST) met to discuss and vote to approve a new report on big data and privacy.

UPDATE: The White House published the findings of its review on big data today, including the PCAST review of technologies underpinning big data (PDF), discussed below.

As White House special advisor John Podesta noted in January, the PCAST has been conducting a study “to explore in-depth the technological dimensions of the intersection of big data and privacy.” Earlier this week, the Associated Press interviewed Podesta about the results of the review, reporting that the White House had learned of the potential for discrimination through the use of data aggregation and analysis. These are precisely the privacy concerns that stem from data collection that I wrote about earlier this spring. Here’s the PCAST’s list of “things happening today or very soon” that provide examples of technologies that can have benefits but pose privacy risks:

 Pioneered more than a decade ago, devices mounted on utility poles are able to sense the radio stations
being listened to by passing drivers, with the results sold to advertisers.26
In 2011, automatic license-plate readers were in use by three quarters of local police departments surveyed. Within 5 years, 25% of departments expect to have them installed on all patrol cars, alerting police when a vehicle associated with an outstanding warrant is in view.[27] Meanwhile, civilian uses of license-plate readers are emerging, leveraging cloud platforms and promising multiple ways of using the information collected.[28]
Experts at the Massachusetts Institute of Technology and the Cambridge Police Department have used a machine-learning algorithm to identify which burglaries were likely committed by the same offender, thus aiding police investigators.[29]
Differential pricing (offering different prices to different customers for essentially the same goods) has become familiar in domains such as airline tickets and college costs. Big data may increase the power and prevalence of this practice and may further decrease its transparency.[30]
reSpace offers machine-learning algorithms to the gaming industry that may detect early signs of gambling addiction or other aberrant behavior among online players.[31]
Retailers like CVS and AutoZone analyze their customers’ shopping patterns to improve the layout of their stores and stock the products their customers want in a particular location.[32] By tracking cell phones, RetailNext offers bricks-and-mortar retailers the chance to recognize returning customers, just as cookies allow them to be recognized by online merchants.[33] Similar WiFi tracking technology could detect how many people are in a closed room (and in some cases their identities).
The retailer Target inferred that a teenage customer was pregnant and, by mailing her coupons intended to be useful, unintentionally disclosed this fact to her father.[34]
The author of an anonymous book, magazine article, or web posting is frequently “outed” by informal crowdsourcing, fueled by the natural curiosity of many unrelated individuals.[35]
Social media and public sources of records make it easy for anyone to infer the network of friends and associates of most people who are active on the web, and many who are not.[36]
Marist College in Poughkeepsie, New York, uses predictive modeling to identify college students who are at risk of dropping out, allowing it to target additional support to those in need.[37]
The Durkheim Project, funded by the U.S. Department of Defense, analyzes social-media behavior to detect early signs of suicidal thoughts among veterans.[38]
LendUp, a California-based startup, sought to use nontraditional data sources such as social media to provide credit to underserved individuals. Because of the challenges in ensuring accuracy and fairness, however, they have been unable to proceed.
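To make the social-network item concrete, here is a minimal sketch of how public contact lists let anyone infer a person's associates, including someone with no account of their own. The names and links are hypothetical, and the two-hop breadth-first search is just one simple way to do this kind of inference, not a method described in the report.

```python
from collections import defaultdict

# Hypothetical public records: each pair means one person publicly lists the
# other as a friend, contact or co-author. "dana" has no account of her own;
# she appears only because someone else mentions her.
PUBLIC_LINKS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dana"),
    ("alice", "erin"),
]


def build_graph(links):
    """Treat each public link as an undirected association."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def associates(graph, person, hops=2):
    """Everyone reachable from `person` within `hops` links (breadth-first)."""
    seen = {person}
    frontier = {person}
    for _ in range(hops):
        # Expand one hop outward, skipping anyone already found.
        frontier = {n for p in frontier for n in graph[p]} - seen
        seen |= frontier
    return seen - {person}


graph = build_graph(PUBLIC_LINKS)
print(sorted(associates(graph, "dana")))  # ['bob', 'carol']
```

Even though "dana" published nothing, two hops of other people's disclosures reveal part of her network.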

The PCAST meeting was open to the public through a teleconference line. I called in and took rough notes on the discussion of the forthcoming report as it progressed. My notes on the comments of professors Susan Graham and Bill Press offer enough insight into the forthcoming report, however, that I thought publishing them today was warranted, given the ongoing national debate regarding data collection, analysis, privacy and surveillance. The following should not be considered verbatim or an official transcript. The emphases below are mine, as are the words in [brackets]. For an official record, look for PCAST to make a recording and transcript available online in the future at its archive of past meetings.

Susan Graham: Our charge was to look at the confluence of big data and privacy, to summarize current technology and the way technology is moving in the foreseeable future, including its influence on the way we think about privacy.

The first thing that’s very, very obvious is that personal data in electronic form is pervasive. Traditional data that was in health and financial [paper] records is now electronic and online. Users provide info about themselves in exchange for various services. They use Web browsers and share their interests. They provide information via social media such as Facebook, LinkedIn and Twitter. There is [also] data collected that is invisible, from public cameras, microphones, and sensors.

What is unusual about this environment and big data is the ability to do analysis on huge corpora of that data. We can learn things from the data that allow us to provide a lot of societal benefits. There is an enormous amount of patient data, data about disease, and data about genetics. By putting it together, we can learn about treatment. With enough data, we can look at rare diseases and learn what has been effective. We could not have done this otherwise.

We can analyze more online information about education and learning, not only MOOCs but lots of learning environments. [Analysis] can tell teachers how to present material effectively, to do comparisons about whether one presentation of information works better than another, or analyze how well assessments work with learning styles.
Some visual information is easy to comprehend; some verbal information is hard to understand. Understanding different learning styles [can enable us to] develop customized teaching.

The reason this all works is the profound nature of analysis. This is the idea of data fusion, where you take multiple sources of information and combine them, which provides a much richer picture of some phenomenon. If you look at patterns of human movement on public transport, or pollution measures, or weather, maybe we can predict dynamics caused by human context.
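A minimal sketch of the data-fusion idea, assuming three hypothetical hourly feeds (transit ridership, PM2.5 readings, temperature) that share a common key. Real fusion also has to reconcile clocks, locations and units; this only shows the basic join that produces one combined record per shared key.

```python
# Hypothetical hourly readings from three separate sources, keyed by hour.
transit_riders = {7: 1200, 8: 3400, 9: 2100, 10: 1500}
pm25_ugm3 = {7: 18.0, 8: 31.5, 9: 24.0}
temperature_c = {7: 12.0, 8: 14.5, 9: 17.0}


def fuse(**sources):
    """Join several keyed sources on the keys they all share."""
    shared = set.intersection(*(set(s) for s in sources.values()))
    # One combined row per shared key, with a column per named source.
    return {
        key: {name: source[key] for name, source in sources.items()}
        for key in sorted(shared)
    }


fused = fuse(riders=transit_riders, pm25=pm25_ugm3, temp=temperature_c)
for hour, row in fused.items():
    print(hour, row)
```

Hour 10 drops out because only one source covers it; the fused rows are what a model would then analyze for patterns no single feed shows.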

We can use statistics-based pattern recognition on large amounts of data. One of the things that we understand about this statistics-based approach is that it might not be 100% accurate if mapped down to the individuals providing data in these patterns. We have to be very careful not to make mistakes about individuals because we make [an inference] about a population.
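One well-known way a population-level pattern misleads about individuals is the base-rate effect. The numbers below are hypothetical and not from the report: a model that is right 99% of the time, applied to a trait found in 1% of people, is wrong about roughly half of the individuals it flags.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Share of flagged individuals who truly have the trait (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical model: 99% sensitivity and specificity; trait present in 1% of people.
ppv = positive_predictive_value(0.99, 0.99, 0.01)
print(f"{ppv:.0%} of flagged individuals actually have the trait")  # 50%
```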

How do we think about privacy? We looked at it from the point of view of harms. There are a variety of ways in which results of big data can create harm, including inappropriate disclosures [of personal information], potential discrimination against groups, classes, or individuals, and embarrassment to individuals or groups.

We turned to what technology has to offer in helping to reduce harms. We looked at a number of technologies in use now, and at a bunch coming down the pike. Some of the technologies in use have become less effective because of the pervasiveness [of data] and the depth of analytics.

We traditionally have controlled [data] collection. We have seen some data collection from cameras and sensors that people don’t know about. If you don’t know, it’s hard to control.

Tech creates many concerns. We have looked at methods coming down the pike. Some are more robust and responsive. We have a number of draft recommendations that we are still working out.

Part of privacy is protecting the data using security methods. That needs to continue, and it needs to be used routinely. Security is not the same as privacy, though security helps to protect privacy. There are a number of approaches now applied by hand that, with sufficient research, could be automated and used more reliably, so they scale.

There needs to be more research and education about privacy. Professionals need to understand how to treat privacy concerns any time they deal with personal data. We need to create a large group of professionals in technology who understand privacy and privacy concerns.

Technology alone cannot reduce privacy risks. There has to be a policy as well. It was not our role to say what that policy should be. We need to lead by example by using good privacy protecting practices in what the government does and increasingly what the private sector does.

Bill Press: We tried throughout to think of scenarios and examples. There’s a whole chapter [in the report] devoted explicitly to that.

They range from things being done today with present technology, even if they are not all known to people, to our extrapolations to the outer limits of what might well happen in the next ten years. We tried to balance the examples by showing both their benefits, which are great, and the challenges they raise, including the possibility of new privacy issues.

In another part of the report, Chapter 3, we tried to survey technologies from both sides: those that will bring benefits, those that will protect [people], and also those that will raise concerns.

In our technology survey, we were very much helped by the team at the National Science Foundation. They provided a very clear, detailed outline of where they thought that technology was going.

This was part of our outreach to a large number of experts and members of the public. That doesn’t mean that they agree with our conclusions.

Eric Lander: Can you take everybody through the analysis of encryption? Are people using it much more? What are the limits?

Graham: The idea behind classical encryption is that when data is stored, when it’s sitting around in a database, let’s say, encryption scrambles the representation of the data so that it can’t be read without using a mathematical algorithm and a key to convert a seemingly meaningless set of bits into something readable.

The same technology, converting data to and from meaningless bits, is used when you send data from one place to another. So if someone is scanning traffic on the internet, they can’t read it. Over the years, we’ve developed pretty robust ways of doing encryption.

The weak link is that to use data, you have to read it, so it becomes unencrypted. Security technologists worry about it being read during that short window.

Encryption technology is vulnerable in another way, too: the key that unlocks the data is itself vulnerable to theft, or to the wrong user getting to decrypt the data.

Both problems are active topics of research, including how to use data without being able to read it. There is research on increasing the robustness of encryption, so that if a key is disclosed, you haven’t lost everything and you can still protect some of the data, or future encryption of new data. This reduces risk a great deal and is important to use. Encryption alone doesn’t protect.
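To illustrate the two weak points Graham describes (data must be decrypted to be used, and whoever holds the key can read it), here is a toy one-time pad built on Python's standard `secrets` module. It is a real but impractical cipher chosen only because it fits in a few lines; production systems use vetted algorithms such as AES through maintained libraries, and nothing here reflects a technique the report recommends. The sample message is hypothetical.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"patient 1042: blood type O-"      # hypothetical record
key = secrets.token_bytes(len(message))      # random key, to be used only once

ciphertext = xor_bytes(message, key)         # meaningless bits without the key
recovered = xor_bytes(ciphertext, key)       # to use the data, it must be decrypted
assert recovered == message                  # anyone holding `key` can do the same
```

The same function encrypts and decrypts, which is why protecting the key matters as much as protecting the ciphertext.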

Unknown Speaker: People read about security breaches. I see a different set of privacy issues from big data versus those in security. Can you distinguish them?

Bill Press: Privacy and security are different issues. Security is necessary to have good privacy in the technological sense: if communications are insecure, they clearly can’t be private. Privacy goes beyond that, to cases where parties are authorized, in a security sense, to see the information. Privacy is much closer to values; security is much closer to protocols.

The interesting thing is that this is less about purely technical elements; everyone can agree on the right protocol, eventually. These are things that go beyond that and have to do with values.

With major pharmacies on board, is the Blue Button about to scale nationwide?

The Obama administration announced significant private-sector adoption of the Blue Button today. In a post at the White House Office of Science and Technology Policy blog, Nick Sinai, U.S. deputy chief technology officer, and Adam Dole, a Presidential Innovation Fellow at the U.S. Department of Health and Human Services, listed major pharmacies and retailers joining the Blue Button initiative, which enables people to download a personal health record in an open, machine-readable electronic format:

“These commitments from some of the Nation’s largest retail pharmacy chains and associations promise to provide a growing number of patients with easy and secure access to their own personal pharmacy prescription history and allow them to check their medication history for accuracy, access prescription lists from multiple doctors, and securely share this information with their healthcare providers,” they wrote.

“As companies move towards standard formats and the ability to securely transmit this information electronically, Americans will be able to use their pharmacy records with new innovative software applications and services that can improve medication adherence, reduce dosing errors, prevent adverse drug interactions, and save lives.”

While I referred to the Blue Button obliquely at ReadWrite almost two years ago and in many other stories, I can’t help but wish that I’d finished my feature for Radar a year ago and written up a full analytical report. Extending access to a downloadable personal health record to millions of Americans has been an important, steady shift that has largely gone unappreciated, despite reporting like Ina Fried’s regarding veterans getting downloadable health information. According to the Office of the National Coordinator for Health IT, “more than 5.4 million veterans have now downloaded their Blue Button data and more than 500 companies and organizations in the private-sector have pledged to support it.”

As I’ve said before, data standards are the railway gauges of the 21st century. When they’re agreed upon and built out, remarkable things can happen. This is one of those public-private initiatives that has taken years to bear fruit and that stands to substantially improve the lives of many people. It started with something simple, when the administration gave military veterans the ability to download their own health records from My HealtheVet, and scaled progressively to Medicare beneficiaries through MyMedicare.gov and then to Aetna and other players from there.

There have been bumps and bruises along the way, from issues with the standard to concerns about lost devices, but this news of adoption by places like CVS suggests the Blue Button is about to go mainstream in a big way. According to the White House, “more than 150 million Americans today are able to use Blue Button-enabled tools to access their own health information from a variety of sources including healthcare providers, health insurance companies, medical labs, and state health information networks.”

Notably, HHS has ruled that doctors and clinics that implement the new “BlueButton+” specification will be meeting the requirements of “View, Download, and Transmit (V/D/T)” in Meaningful Use Stage 2 for electronic health records under the HITECH Act, meaning they can apply for reimbursement. According to ONC, that MU program currently includes half of eligible physicians and more than 80 percent of hospitals in the United States. With that carrot, many more Americans should expect to see a Blue Button in the doctor’s office soon.

In the video below, U.S. chief technology officer Todd Park speaks with me about the Blue Button and the work of Dole and other presidential innovation fellows on the project.

Opening IRS e-file data would add innovation and transparency to $1.6 trillion U.S. nonprofit sector

One of the most important open government data efforts in United States history came into being in 1993, when citizen archivist Carl Malamud used a small planning grant from the National Science Foundation to license data from the Securities and Exchange Commission, publish the SEC data on the Internet and operate it for two years. At the end of the grant, the SEC decided to make the EDGAR data available itself, albeit not without some significant prodding, and has continued to do so ever since. You can read the history behind putting periodic reports of public corporations online at Malamud’s website, public.resource.org.

Two decades later, Malamud is working to make the law public, reform copyright, and free up government data again, buying, processing and publishing millions of public tax filings that nonprofits submitted to the Internal Revenue Service. He has made the bulk data from these efforts available to the public and anyone else who wants to use it.

“This is exactly analogous to the SEC and the EDGAR database,” Malamud told me in a phone interview last year. The trouble is that the data has been deliberately dumbed down, he said. “If you make the data available, you will get innovation.”

Making millions of Form 990 returns free online is not a minor public service. Although many nonprofits file their Form 990s electronically, the IRS does not publish the data. Instead, the agency releases images of millions of returns, formatted as .TIFF files on multiple DVDs, to people and companies willing and able to pay thousands of dollars for them. Services like Guidestar, for instance, acquire the data, convert it to PDFs and use it to provide information about nonprofits. (Registered users can view the returns on Guidestar’s website.)

As Sam Roudman reported at TechPresident, Luke Rosiak, a senior watchdog reporter for the Washington Examiner, took the files Malamud published and made them more useful. Specifically, he used processing credits that Amazon donated to participants in the 2013 National Day of Civic Hacking to make the .TIFF files text-searchable. Rosiak then set up CitizenAudit.org, a new website that makes nonprofit transparency easy.

“This is useful information to track lobbying,” Malamud told me. “A state attorney general could just search for all nonprofits that received funds from a donor.”
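Once returns are text-searchable, the kind of donor search Malamud describes reduces to building an index. The sketch below assumes hypothetical, already-extracted records of (organization, donor, amount); which donor fields actually appear on a given public return, and how they are extracted from the images, is outside this sketch.

```python
from collections import defaultdict

# Hypothetical, already-digitized records: (organization, donor, amount in USD).
GRANTS = [
    ("Riverside Food Bank", "Acme Foundation", 50_000),
    ("Eastside Youth League", "Acme Foundation", 12_500),
    ("Riverside Food Bank", "J. Doe Trust", 8_000),
]


def index_by_donor(records):
    """Map each donor to the organizations it funded and the total given to each."""
    index = defaultdict(lambda: defaultdict(int))
    for org, donor, amount in records:
        index[donor][org] += amount
    return index


index = index_by_donor(GRANTS)
print(dict(index["Acme Foundation"]))
# {'Riverside Food Bank': 50000, 'Eastside Youth League': 12500}
```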

Malamud estimates nearly 9% of jobs in the U.S. are in this sector. “This is an issue of capital allocation and market efficiency,” he said. “Who are the most efficient players? This is more than a CEO making too much money; it’s about ensuring that investments in nonprofits get a return.”

Malamud’s open data is acting as a platform for innovation, much as legislation.gov.uk is in the United Kingdom. The difference is that it’s the effort of a citizen that’s providing the open data, not the agency: Form 990 data is not on Data.gov.

Opening Form 990 data should be a no-brainer for an Obama administration that has taken historic steps to open government data. Liberating nonprofit sector data would provide useful transparency into a $1.6 trillion sector of the U.S. economy.

After many letters to the White House and discussions with the IRS, however, Malamud filed suit against the IRS to release Form 990 data online this summer.

“I think inertia is behind the delay,” he told me, in our interview. “These are not the expense accounts of government employees. This is something much more fundamental about a $1.6 trillion dollar marketplace. It’s not about who gave money to a politician.”

When asked for comment, a spokesperson for the White House Office of Management and Budget said that the IRS “has been engaging on this topic with interested stakeholders” and that “the Administration’s Fiscal Year 2014 revenue proposals would let the IRS receive all Form 990 information electronically, allowing us to make all such data available in machine readable format.”

Today, Malamud sent a letter of complaint to Howard Shelanski, administrator of the Office of Information and Regulatory Affairs in the White House Office of Management and Budget, asking for a review of the IRS’s pricing policies after a significant year-over-year increase. Specifically, Malamud wrote that the IRS is violating the requirements of President Obama’s executive order on open data:

The current method of distribution is a clear violation of the President’s instructions to move towards more open data formats, including the requirements of the May 9, 2013 Executive Order making “open and machine readable the new default for government information.”

I believe the current pricing policies do not make any sense for a government information dissemination service in this century, hence my request for your review. There are also significant additional issues that the IRS refuses to address, including substantial privacy problems with their database and a flat-out refusal to even consider release of the Form 990 E-File data, a format that would greatly increase the transparency and effectiveness of our non-profit marketplace and is required by law.

It’s not clear at all whether the continued pressure from Malamud, the obvious utility of CitizenAudit.org or the bipartisan budget deal that President Obama signed in December will push the IRS to freely release open government data about the nonprofit sector.

The furor last summer over the IRS scrutinizing conservative groups that claimed tax-exempt status, however, could carry over into political pressure for reform. If e-file data about tax-exempt political groups were published, auditors, journalists and Congressional investigators could detect patterns. The IRS would need to be careful about scrubbing the data of personal information: last year, the IRS mistakenly exposed thousands of Social Security numbers when it posted 527 forms online, an issue that Malamud, as it turns out, discovered in an audit.

“This data is up there with EDGAR, in terms of its potential,” said Malamud. “There are lots of databases. Few are as vital to government at large. This is not just about jobs. It’s like not releasing patent data.”

If the IRS were to modernize its audit system, inspectors general could use automated predictive data analysis to find aberrations to flag for a human to examine, enabling government watchdogs and investigative journalists to potentially detect similar issues much earlier.
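As a rough illustration of flagging aberrations for human review, the sketch below uses the modified z-score (median and median absolute deviation, with the commonly cited 3.5 cutoff), one standard outlier technique swapped in here for illustration. It is not an IRS or inspector-general method, and the organizations and fundraising shares are hypothetical. A flag only queues a return for a person to examine; it is not a finding.

```python
from statistics import median


def flag_for_review(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return keys whose modified z-score exceeds `threshold`."""
    center = median(values.values())
    # Median absolute deviation: a spread measure that one extreme value can't inflate.
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against
    return [
        name for name, v in values.items()
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical share of total expenses spent on fundraising, by organization.
fundraising_share = {
    "Org A": 0.08, "Org B": 0.11, "Org C": 0.09,
    "Org D": 0.10, "Org E": 0.62, "Org F": 0.12,
}
print(flag_for_review(fundraising_share))  # ['Org E']
```

Using the median rather than the mean keeps a single extreme filing from masking itself by dragging the average toward it.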

That level of data-driven transparency remains in the future. In the meantime, CitizenAudit.org is currently running on a server in Rosiak’s apartment.

Whether the IRS adopts it as the SEC did EDGAR remains to be seen.

[Image Credit: Meals on Wheels]