15 key insights from the Pew Internet and American Life Project on the American public, open data and open government

Today, a new survey released by the Pew Research Center's Internet and American Life Project provided one of the most comprehensive snapshots to date of the American public's attitudes toward open data and open government. In general, the people surveyed are guardedly optimistic about the release of open data and its outcomes, although that optimism varies with their political views, their trust in government, and the specific area in question. (Full disclosure: I was consulted by Pew researchers about which survey questions would be useful to pose.)

[Chart: mixed hopes that open data will improve government]

“Trust in government is the reference that people bring to their answers on open government and open data,” said John Horrigan, the principal researcher on the survey, in an interview. “That’s the frame of reference people bring. A lot of people still aren’t familiar with the notion, and because they don’t have a framework about open data, trust dominates, and you get the response that we got.”

[Chart: people who trust government more see more benefits from open data]

While majorities of the American public use applications and services built on government data, from GPS to weather to transit to health apps, relatively few are aware that data produced and released by government drives them.

“The challenge for activists or advocates in this space will be to try to make the link between government data and service delivery outcomes,” said Horrigan. “If the goals are to make government perform better and maybe reverse the historic tide of lowered trust, then the goal is to make improvements real in delivery. If this is framed just as an argument over data quality, it would go into an irresolvable back and forth over the quality of government data collection. If you can cast it beyond whether unemployment statistics are correct or not, and instead on how government services improved or saved money, you have a chance of speaking to whether government data makes things better.”

The public knowledge gap regarding this connection is one of the most important points for proponents, advocates, journalists and publishers to address if they wish to see funding for open data initiatives maintained or Freedom of Information Act reforms passed.

“I think a key implication of the findings is that – if advocates of government data initiatives hope that data will improve people’s views about government’s efficacy – efforts by intermediaries or governments to tie open data/open government to the government’s collection of data may be worthwhile,” said Horrigan. “Such public awareness efforts might introduce a new ‘mental model’ for the public about what these initiatives are all about. Right now, at least as the data for this report suggests, people do not have a clear sense of government data initiatives. And that means the context for how they think about them has a lot to do with their baseline level of trust in the government – particularly the federal government.”

Horrigan suggested thinking about this using a metaphor familiar to anyone who’s attended a middle school dance.

“Because people do engage with the government online, just through services, it’s like getting them on a big dance floor,” he suggested. “They’re on the floor, where you want them, but they’re on the other part of it. They don’t know that there’s another part of the dance that they’d like to see or be drawn to that they’d want to be in. There’s an opportunity to draw them. The good news is that they’re on the dance floor; the bad news is they don’t know about all of it. Someone might want to go over and talk to them and explain that if you go over here you might have a better experience.”

Following are 13 more key insights about the public’s views regarding the Internet, open data and government. For more, make sure to read the full report on open government data, which is full of useful discussion of its findings.

One additional point worth noting before you dive in: this survey is representative of American adults, not just of people who are online. “The American Trends Panel was recruited to be nationally representative, and is weighted in such a way (as nearly all surveys are) to ensure responses reflect the general population,” said Horrigan. “The overall rate of internet use is a bit higher than we typically record, but within the margin of error. So we are comfortable that the sample is representative of the general population.”

A growing number of American adults are using the Internet to get information and data

While Pew cautions that the questions posed in this survey differ from those in a survey it conducted in 2010, the trend is clear: the way citizens communicate with government now includes the Internet, and the way government communicates with citizens increasingly includes digital channels. That use now includes getting information or data about federal, state and local government.

[Chart: internet use to get information about government]

College-educated Americans and millennials are more hopeful about open data releases

[Chart: college graduates and millennials are more hopeful about open data]

Despite disparities in trust and belief in outcomes, there is no difference in online activities between members of political parties

[Chart: no difference by political party in online activities]

Wealthier Americans are comfortable with open data about real estate transactions but not individual mortgages

[Chart: wealthier Americans are comfortable with real estate transaction data but not mortgage data]

This attitude generally holds across income levels.

[Chart: people are comfortable with data sharing, but not about mortgages]

College graduates, millennials and higher-income adults are more likely to use data to monitor government performance

About a third of college graduates, young adults and wealthier Americans have checked out government performance data or government contracting data, roughly 50% more than older adults, lower-income adults or those without a college degree.

[Chart: college graduates and higher-income adults are more likely to monitor government performance]

The ways American adults interact with government services and data digitally are expanding

[Chart: ways people interact with government information]

But very few American adults think government data sharing is currently very effective:

[Chart: few think government data sharing is effective]

Only a small minority of Americans have a great deal of trust in the federal government:

[Chart: majorities have low trust in government]

In fact, increasing individual use of data isn’t necessarily correlated with belief in positive outcomes:

Pew grouped the 3,212 respondents into four quadrants, seen below, with a vertical axis ranging from optimism to skepticism and a horizontal axis measuring how much they use government data.

“In my mind, you have to get to the part of the story where you show government ran better as a result,” said Horrigan. “You have to get to a position where these stories are being told. Then, at least, while you’re opening up new possibilities for cynicism or skepticism, you’re at least focused on the data as opposed to trust in government.”

[Chart: quadrants of belief in open data]

[Chart: technology profiles of the open data quadrants]

Instead…

Belief in positive outcomes from the release of open data is correlated with a belief that your voice matters in this republic:

[Chart: those who believe their voice matters are more likely to believe open data improves government]

If you trust the federal government, you’re more likely to see the benefit in open data:

[Chart: those who trust the federal government are more likely to see a benefit from open data]

Belief in positive outcomes from the release of open data is also related to political party affiliation:

[Chart: Democrats are more likely to believe open data will pay off]

Put simply, Democrats trust the federal government more, and that relates to how people feel about open data released by that government.

[Chart: Democrats trust the federal government more]


Political party has an impact upon the view of open data in the federal government

One challenge is that every time President Barack Obama says “open data,” he may further associate the release of government data with Democratic policies, despite bipartisan support for open government data in Congress. If a Republican is elected President in November 2016, however, this particular attitude may well shift.

“That’s definitely the historic pattern, tracked over time, dating to 1958,” said Horrigan, citing a Pew study. “If it holds and a Republican wins the White House, you’d expect it to flip. Let’s say that we get a Republican president and he continues some of these initiatives to make government perform better, which I expect to be the case. The Bush administration invested in e-government, and used the tools available to them at the time. The Obama administration picked it up, used the new tools available, and got better. President [X] could say this stuff works.”

[Chart: Democrats have a more upbeat view of open data]

The unresolved question that we won’t know the answer to until well into 2017, if then, is whether today’s era of hyper-partisanship will change this historic pattern.

There’s bipartisan agreement on the need to use government data better in government. Democrats want to improve efficiency and effectiveness; Republicans want to do the same, but often in the context of demonstrating that programs or policies are ineffective and thereby shrinking government. If the country can rise above partisan politics to innovate in government, awareness of the utility of data releases will grow, and support for open data will grow along with it.

“Many Americans are not much attuned to government data initiatives, which is why they think about them (in the attitudinal questions) through the lens of whether they trust government,” said Horrigan. “Even the positive part of the attitudinal questions (i.e., the data initiatives can improve accountability) has a dollop of concern, in that even the positive findings can be seen as people saying: ‘These government data initiatives might be good because they will shine more light on government – which really needs it because government doesn’t perform well enough.’ That is an opportunity of course – especially for intermediaries that might, through use of data, help the public understand how/whether government is being accountable to citizens.”

That opportunity is cause for hope.

“Whether it is ‘traditional’ online access for doing transactions/info searches with respect to government, or using mobile apps that rely on government data, people engage with government online,” said Horrigan. “That creates the opportunity for advocates of government data initiatives to draw citizens further down the path of understanding (and perhaps better appreciating) the possible impacts of such initiatives.”

17 million tax transcripts downloaded through IRS website, reducing offline requests by 40%

According to a post on the White House blog, 17 million tax transcripts have been downloaded over the Internet since the feature launched in January 2014. The interesting outcome is that, according to the post, offline requests are down by 40%.

The post did not offer a clear return on investment, in terms of what providing this online service saved taxpayers, but if we assume that sending transcripts through the mail involves processing costs and that the Internet service scales once it is online, that’s a good result, as is enabling instant electronic access to something that used to take 5-10 business days to arrive in print form.

Of note: it looks like Americans can expect more online services from the IRS in the near future, according to the authors of the White House blog post, U.S. Deputy Chief Technology Officer Nick Sinai and Rajive Mathur, director of Online Services at the Internal Revenue Service:

“Building on the initial success of Get Transcript, there are more exciting improvements to IRS services in the pipeline. For instance, millions of taxpayers contact the IRS every year to ask about their tax status, whether their filing was received, if their refund was processed, or if their payment posted. In the future, taxpayers will be able to answer these types of questions independently by signing in to a mobile-friendly, personalized online account to conduct transactions and see all of their tax information in one place. Users will be able to view account history and balance, make payments or see payment status, or even authorize their tax preparer to view or make changes to their tax return. This will also include the ability to download personal tax information in an easy to use and machine-readable format so that taxpayers can share with trusted recipients if desired.”

Promising. I hope that the leadership of the IRS explores how the agency could act as a platform to enable more, much-needed innovation around personal data access and digital services in the years to come, enabling a modern ecosystem of tax software based on a standardized application programming interface.

Improving online self-service could have an enormous impact on every single American taxpayer, from saving tax dollars on the government side to saving time and gray hairs year round at office desks and kitchen tables. Per Sinai and Mathur, the IRS currently receives over 80 million phone calls per year, sends out almost 200 million paper notices every year, and receives over 50 million unique visitors to its website each month during filing season.

More context and FAQ on how to download your tax transcript here.

Data journalism and the changing landscape for policy making in the age of networked transparency

This morning, I gave a short talk on data journalism and the changing landscape for policy making in the age of networked transparency at the Woodrow Wilson Center in DC, hosted by the Commons Lab.

Video from the event is online at the Wilson Center website. Unfortunately, I found that I didn’t edit my presentation down enough for my allotted time. I made it to slide 84 of 98 in 20 minutes and had to skip the 14 predictions and recommendations section. While many of the themes I describe in those 14 slides came out during the roundtable question and answer period, they’re worth resharing here, in the presentation I’ve embedded below:

[REPORT] On data journalism, democracy, open government and press freedom

On May 30, I gave a keynote talk on my research on the art and science of data journalism at the first Tow Center research conference at Columbia Journalism School in New York City. I’ve embedded the video below:

My presentation is embedded below, if you want to follow along or visit the sites and services I described.

Here’s an observation drawn from an extensive section on open government that should be of interest to readers of this blog:

“Proactive, selective open data initiatives by government focused on services that are not balanced by support for press freedoms and improved access can fairly be criticized as ‘openwashing’ or ‘fauxpen government.’

Data journalists who are frequently faced with heavily redacted document releases or reams of blurry PDFs are particularly well placed to make those critiques.”

My contribution was only one part of the proceedings for “Quantifying Journalism: Metrics, Data and Computation,” which you can catch up on through the Tow Center’s live blog or TechPresident’s coverage of measuring the impact of journalism.

[FAQ] How do I download a tax transcript from IRS.gov?

UPDATE: This service was taken offline after IRS security was compromised.

In January 2014, the IRS quietly introduced a new feature at IRS.gov that enabled Americans to download their tax transcript over the Internet. Previously, filers could request a copy of the transcript (not the full return) but had to wait 5-10 business days to receive it in the mail. For people who needed more rapid access for applications, the delay could be critical.

What’s a tax transcript?

It’s a list of the line items that you entered onto your federal tax return (Form 1040), as it was originally filed to the IRS.

Wait, we couldn’t already download a transcript like this before 2014?

Nope. Previously, filers could request a copy of the transcript (not the full return) but they would have to wait 5-10 business days to receive it in the mail.

Why did this happen now?

The introduction of the IRS feature coincided with a major Department of Education event focused on opening up such data. A U.S. Treasury official said that the administration was doing that to make it “easier for student borrowers to access tax records he or she might need to submit loan applications or grant applications.”

Why would someone want their tax transcript?

As the IRS itself says, “IRS transcripts are often used to validate income and tax filing status for mortgage applications, student and small business loan applications, and during tax preparation.” It’s pretty useful.

OK, so what do I do to download my transcript?

Visit “get transcript” and register online. You’ll find that the process is very similar to setting up online access for a bank account. You’ll need to choose a pass phrase, pass image and security questions, and then answer a series of questions about your life, like where you’ve lived. If you write your answers down, store them somewhere safe and secure offline, perhaps with your birth certificate and other sensitive documents.

Wait, what? That sounds like a lot of private information.

True, but remember: the IRS already has a lot of private data about you. These questions are designed to prevent someone else from setting up a fake account in your name and stealing your information. If you’re uncomfortable answering these questions, you can request a print version of your transcript. To do so, you’ll need to enter your Social Security number, date of birth and street address online. If you’re still uncomfortable doing so, you can visit or contact the IRS in person.

So is this safe?

It’s probably about as safe as doing online banking. Virtually nothing you do online is without risk. Make sure you 1) go to the right website, 2) connect securely, and 3) protect the transcript, just as you would paper tax records. Here’s what the IRS told me about their online security:

“The IRS has made good progress on oversight and enhanced security controls in the area of information technology. With state-of-the-art technology as the foundation for our portal (e.g. irs.gov), we continue to focus on protecting the PII of all taxpayers when communicating with the IRS.

However, security is a two-way street with both the IRS and users needing to take steps for a secure experience. On our end, our security is comparable to leaders in private industry.

Our IRS2Go app has successfully completed a security assessment and received approval to launch by our cybersecurity organization after being scanned for weaknesses and vulnerabilities.

Any personally identifiable information (PII) or sensitive information transmitted to the IRS through IRS2Go for refund status or tax record requests uses secure communication channels that meet or exceed federal requirements for encryption. No PII is passed back to the taxpayer through IRS2Go and no PII is stored on the smartphone by the application.

When using our popular “Where’s My Refund?” application, taxpayers may notice just a few of our security measures. The URL for Where’s My Refund? begins with https. Just like in private industry, the “s” is a key indicator that a web user should notice indicating you are in a “secure session.” Taxpayers may also notice our message that we recommend they close their browser when finished accessing their refund status.

As we become a more mobile society and able to link to the internet while we’re on the go, we remind taxpayers to take precautions to protect themselves from being victimized, including using secure networks, firewalls, virus protection and other safeguards.

We always recommend taxpayers check with the Federal Trade Commission for the latest on reporting incidents of identity theft. You can find more information on our website, including tips if you believe you have become the victim of identity theft.”

What do I do with the transcript?

If you download tax transcripts or personal health information to a mobile device, laptop, tablet or desktop, set passcodes and enable full disk encryption, where available, on every machine it’s on. Leaving your files unprotected on computers connected to the Internet is like leaving the door to your house unlocked with your tax returns and medical records on the kitchen table.

I got an email from the IRS that asks me to email them personal information to access my transcript. Is this OK?

Nope! Don’t do it: it’s not them. The new functionality will likely inspire criminals to create mockups of the government website that look similar and then send phishing emails to consumers, urging them to “log in” to fake websites. You should know that IRS “does not send out unsolicited e-mails asking for personal information.” If you receive such an email, consider reporting the phishing to the IRS. Start at www.irs.gov/Individuals/Get-Transcript every time.
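The practical core of this advice is checking where a link really leads before typing anything into the page it opens. As an illustration only, here is a minimal Python sketch of such a check, assuming that a legitimate link uses HTTPS and a host of irs.gov or one of its subdomains. The function name is hypothetical, and a heuristic like this cannot catch every phishing trick; typing the address yourself remains the safer habit.

```python
from urllib.parse import urlsplit


def looks_like_official_irs_link(url: str) -> bool:
    """Heuristic: HTTPS plus a host that is irs.gov or a subdomain of it.

    Hypothetical helper for illustration. It rejects lookalike hosts such as
    "www.irs.gov.example.com" and "www.irs.gov@example.com", but it cannot
    detect every phishing technique.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    # .hostname drops any "user@" prefix and port, and lowercases the host,
    # so "https://www.irs.gov@example.com/" yields "example.com".
    host = parts.hostname or ""
    is_irs_host = host == "irs.gov" or host.endswith(".irs.gov")
    return parts.scheme == "https" and is_irs_host


if __name__ == "__main__":
    for link in (
        "https://www.irs.gov/Individuals/Get-Transcript",
        "http://www.irs.gov/Individuals/Get-Transcript",
        "https://www.irs.gov.example.com/Get-Transcript",
        "https://www.irs.gov@example.com/Get-Transcript",
    ):
        print(looks_like_official_irs_link(link), link)
```

Only the first link passes: the second is not encrypted, and the last two merely contain "irs.gov" somewhere in the address while actually pointing at another host.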

I tried to download my transcript but it didn’t work. What the heck?

You’re not alone. I had trouble using an Apple computer. Others have had technical issues as well.

Here’s what the IRS told me: “As a web application Get Transcript is supported on most modern OS/browser combinations. While there may be intermittent issues due to certain end-user configurations, IRS has not implemented any restrictions against certain browsers or operating systems. We are continuing to work open issues as they are identified and validated.”

“(A side note: For the best user experience, taxpayers may want to try up-to-date versions of Internet Explorer and a supported version of Microsoft Windows; however, that is certainly not a requirement.)”

What does that mean in practice? Not all modern OS/browser combinations are supported, potentially including OS X and Android. The IRS digital staff knows it and is working to improve, although it isn’t telling IRS.gov users which versions of IE, Windows or other browsers and operating systems are presently supported and which are not.

Unfortunately, ongoing security issues with Internet Explorer mean that in 2014, we have the uncomfortable situation where the Department of Homeland Security is recommending that people avoid using Internet Explorer while the IRS recommends that its customers choose it for the “best experience.”

Given the comments from frustrated users, the IRS could and should do better on all counts.

Will I be able to file my tax return directly to the government through IRS.gov now?

You can already file your federal tax return online. According to the IRS, almost 120 million people used IRS e-file last year.

Well, OK, but shouldn’t having a user account and years of returns make it easier to file without a return at all?

It could. As you may know, other countries already have “return-free filing,” where a taxpayer can go online, log in and access a pre-populated tax return, see what the government estimates he or she owes, make any necessary adjustments, and file.

Wait, that sounds pretty good. Why doesn’t the USA have return-free filing yet?

It does sound good. As ProPublica reported last year, “the concept has been around for decades and has been endorsed by both President Ronald Reagan and a campaigning President Obama.”

ProPublica also reported that both H&R Block and Intuit, the maker of TurboTax, have lobbied against free and simple tax filing in Washington, given that it’s in their economic self-interest to do so:

In its latest annual report filed with the Securities and Exchange Commission, however, Intuit also says that free government tax preparation presents a risk to its business. Roughly 25 million Americans used TurboTax last year, and a recent GAO analysis said the software accounted for more than half of individual returns filed electronically. TurboTax products and services made up 35 percent of Intuit’s $4.2 billion in total revenues last year. Versions of TurboTax for individuals and small businesses range in price from free to $150.

What are the chances return-free filing could be on IRS.gov soon?

Hard to say, but the IRS told me that something that sounds like a precursor to return-free filing is on the table. According to the agency, “the IRS is considering a number of new proposals that may become a part of the online services roadmap some time in the future. This may include a taxpayer account where up to date status could be securely reviewed by the account owner.”

Creating the ability for people to establish secure access to IRS.gov to review and download tax transcripts is a big step in that direction. Whether the IRS takes any more steps soon is more of a political and policy question than a technical one, although the details of the latter matter.

Is the federal government offering other services like this for other agencies or personal data?

The Obama administration has been steadily modernizing government technology, although progress has been uneven across agencies. While the woes of Healthcare.gov attracted a lot of attention, many federal agencies have improved how they deliver services over the Internet. One of the themes of the administration’s digital government approach is “smart disclosure,” a form of targeted transparency in which people are offered the opportunity to download their own data, or data about them, from government or commercial services. The Blue Button is an example of this approach that has the potential to scale nationally.

PCAST report on big data and privacy emphasizes value of encryption, need for policy

April 4, 2014 meeting of PCAST at National Academy of Sciences

This week, the President’s Council of Advisors on Science and Technology (PCAST) met to discuss and vote to approve a new report on big data and privacy.

UPDATE: The White House published the findings of its review on big data today, including the PCAST review of technologies underpinning big data (PDF), discussed below.

As White House special advisor John Podesta noted in January, the PCAST has been conducting a study “to explore in-depth the technological dimensions of the intersection of big data and privacy.” Earlier this week, the Associated Press interviewed Podesta about the results of the review, reporting that the White House had learned of the potential for discrimination through the use of data aggregation and analysis. These are precisely the privacy concerns that stem from data collection that I wrote about earlier this spring. Here’s the PCAST’s list of “things happening today or very soon” that provide examples of technologies that can have benefits but pose privacy risks:

• Pioneered more than a decade ago, devices mounted on utility poles are able to sense the radio stations being listened to by passing drivers, with the results sold to advertisers.[26]
• In 2011, automatic license-plate readers were in use by three quarters of local police departments surveyed. Within 5 years, 25% of departments expect to have them installed on all patrol cars, alerting police when a vehicle associated with an outstanding warrant is in view.[27] Meanwhile, civilian uses of license-plate readers are emerging, leveraging cloud platforms and promising multiple ways of using the information collected.[28]
• Experts at the Massachusetts Institute of Technology and the Cambridge Police Department have used a machine-learning algorithm to identify which burglaries likely were committed by the same offender, thus aiding police investigators.[29]
• Differential pricing (offering different prices to different customers for essentially the same goods) has become familiar in domains such as airline tickets and college costs. Big data may increase the power and prevalence of this practice and may also decrease even further its transparency.[30]
• reSpace offers machine-learning algorithms to the gaming industry that may detect early signs of gambling addiction or other aberrant behavior among online players.[31]
• Retailers like CVS and AutoZone analyze their customers’ shopping patterns to improve the layout of their stores and stock the products their customers want in a particular location.[32] By tracking cell phones, RetailNext offers bricks-and-mortar retailers the chance to recognize returning customers, just as cookies allow them to be recognized by on-line merchants.[33] Similar WiFi tracking technology could detect how many people are in a closed room (and in some cases their identities).
• The retailer Target inferred that a teenage customer was pregnant and, by mailing her coupons intended to be useful, unintentionally disclosed this fact to her father.[34]
• The author of an anonymous book, magazine article, or web posting is frequently “outed” by informal crowd sourcing, fueled by the natural curiosity of many unrelated individuals.[35]
• Social media and public sources of records make it easy for anyone to infer the network of friends and associates of most people who are active on the web, and many who are not.[36]
• Marist College in Poughkeepsie, New York, uses predictive modeling to identify college students who are at risk of dropping out, allowing it to target additional support to those in need.[37]
• The Durkheim Project, funded by the U.S. Department of Defense, analyzes social-media behavior to detect early signs of suicidal thoughts among veterans.[38]
• LendUp, a California-based startup, sought to use nontraditional data sources such as social media to provide credit to underserved individuals. Because of the challenges in ensuring accuracy and fairness, however, they have been unable to proceed.

The PCAST meeting was open to the public through a teleconference line. I called in and took rough notes on the discussion of the forthcoming report as it progressed. My notes on the comments of professors Susan Graham and Bill Press offer sufficient insight into the forthcoming report, however, that I thought publishing them today was warranted, given the ongoing national debate regarding data collection, analysis, privacy and surveillance. The following should not be considered verbatim or an official transcript; for that, look for the PCAST to make a recording and transcript available online in the future, at its archive of past meetings. The emphases below are mine, as are the words in [brackets].



Susan Graham: Our charge was to look at the confluence of big data and privacy, to summarize current tech and the way technology is moving in the foreseeable future, including its influence on the way we think about privacy.

The first thing that’s very very obvious is that personal data in electronic form is pervasive. Traditional data that was in health and financial [paper] records is now electronic and online. Users provide info about themselves in exchange for various services. They use Web browsers and share their interests. They provide information via social media, Facebook, LinkedIn, Twitter. There is [also] data collected that is invisible, from public cameras, microphones, and sensors.

What is unusual about this environment and big data is the ability to do analysis on huge corpora of that data. We can learn things from the data that allow us to provide a lot of societal benefits. There is an enormous amount of patient data, data about disease, and data about genetics. By putting it together, we can learn about treatment. With enough data, we can look at rare diseases and learn what has been effective. We could not have done this otherwise.

We can analyze more online information about education and learning, not only MOOCs but lots of learning environments. [Analysis] can tell teachers how to present material effectively, compare whether one presentation of information works better than another, or analyze how well assessments work with learning styles. Certain visual information is comprehensible; certain verbal information is hard to understand. Understanding different learning styles [can help] develop customized teaching.

The reason this all works is the profound nature of analysis. This is the idea of data fusion, where you take multiple sources of information and combine them, which provides a much richer picture of some phenomenon. If you look at patterns of human movement on public transport, or pollution measures, or weather, maybe we can predict dynamics caused by human context.
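To make Graham’s description of data fusion a little more concrete, here is a minimal Python sketch. The three datasets (daily transit ridership, a pollution reading and rainfall) and all of their values are hypothetical, invented for illustration. The only technique shown is joining independent sources on a shared key, here the date, so that each fused record carries more context than any single source does.

```python
from dataclasses import dataclass


@dataclass
class FusedRecord:
    date: str
    riders: int       # hypothetical daily transit ridership
    pm25: float       # hypothetical pollution reading (µg/m³)
    rain_mm: float    # hypothetical rainfall in millimetres


def fuse(transit: dict, pollution: dict, weather: dict) -> list:
    """Join three date-keyed datasets, keeping only dates present in all three."""
    shared_dates = sorted(set(transit) & set(pollution) & set(weather))
    return [FusedRecord(d, transit[d], pollution[d], weather[d]) for d in shared_dates]


# Hypothetical inputs from three separate sources.
transit = {"2014-01-06": 52000, "2014-01-07": 38000, "2014-01-08": 51000}
pollution = {"2014-01-06": 12.5, "2014-01-07": 9.1, "2014-01-08": 14.0}
weather = {"2014-01-06": 0.0, "2014-01-07": 22.4}  # no reading for 2014-01-08

fused = fuse(transit, pollution, weather)
for rec in fused:
    print(rec)
```

Once the sources sit side by side, questions no single dataset could answer, such as whether ridership drops on rainy days, become simple comparisons across fields.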

We can use statistics-based pattern recognition on large amounts of data. One of the things that we understand about this statistics-based approach is that it might not be 100% accurate if mapped down to the individuals providing data in these patterns. We have to be very careful not to make mistakes about individuals because we make [an inference] about a population.
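A toy example with made-up numbers shows why a population-level pattern can mislead about individuals. If 70% of a hypothetical group shares some trait, predicting that trait for every member is accurate about the group as a whole but wrong about 30 of its 100 people.

```python
# Hypothetical group of 100 people: 70 share a trait, 30 do not.
population = [True] * 70 + [False] * 30

rate = sum(population) / len(population)
print(f"population rate: {rate:.0%}")

# Naively applying the group-level pattern to every individual.
predicted = [rate >= 0.5] * len(population)
wrong = sum(p != actual for p, actual in zip(predicted, population))
print(f"individuals misclassified: {wrong} of {len(population)}")
```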

How do we think about privacy? We looked at it from the point of view of harms. There are a variety of ways in which results of big data can create harm, including inappropriate disclosures [of personal information], potential discrimination against groups, classes, or individuals, and embarrassment to individuals or groups.

We turned to what tech has to offer in helping to reduce harms. We looked at a number of technologies in use now. We looked at a bunch coming down the pike. We looked at several technologies in use, some of which become less effective because of the pervasiveness [of data] and the depth of analytics.

We traditionally have controlled [data] collection. We have seen some data collection from cameras and sensors that people don’t know about. If you don’t know, it’s hard to control.

Tech creates many concerns. We have looked at methods coming down the pike. Some are more robust and responsive. We have a number of draft recommendations that we are still working out.

Part of privacy is protecting the data using security methods. That needs to continue. It needs to be used routinely. Security is not the same as privacy, though security helps to protect privacy. There are a number of approaches that are now used by hand that, with sufficient research, could be automated and used more reliably, so they scale.

There needs to be more research and education about privacy. Professionals need to understand how to treat privacy concerns anytime they deal with personal data. We need to create a large group of professionals who understand privacy, and privacy concerns, in tech.

Technology alone cannot reduce privacy risks. There has to be a policy as well. It was not our role to say what that policy should be. We need to lead by example by using good privacy protecting practices in what the government does and increasingly what the private sector does.

Bill Press: We tried throughout to think of scenarios and examples. There’s a whole chapter [in the report] devoted explicitly to that.

They range from things being done today with present technology, even though they are not all known to people, to our extrapolations to the outer limits of what might well happen in the next ten years. We tried to balance the examples by showing both their benefits, which are great, and the challenges they raise, including the possibility of new privacy issues.

In another aspect, in Chapter 3, we tried to survey technologies from both sides: those that are going to bring benefits or protect [people], and those that will raise concerns.

In our technology survey, we were very much helped by the team at the National Science Foundation. They provided a very clear, detailed outline of where they thought that technology was going.

This was part of our outreach to a large number of experts and members of the public. That doesn’t mean that they agree with our conclusions.

Eric Lander: Can you take everybody through the analysis of encryption? Are people using it much more? What are the limits?

Graham: The idea behind classical encryption is that when data is stored, when it’s sitting around in a database, let’s say, encryption scrambles the representation of the data so that it can’t be read without using a mathematical algorithm and a key to convert a seemingly meaningless set of bits into something readable.

The same technology, where you convert data to and from meaningless bits, is used when you send data from one place to another. So, if someone is scanning traffic on the Internet, they can’t read it. Over the years, we’ve developed pretty robust ways of doing encryption.

The weak link is that to use data, you have to read it, and it becomes unencrypted. Security technologists worry about it being read during that short window.

Encryption technology is also vulnerable in another way: the key that unlocks the data is itself vulnerable to theft, or to being used by the wrong user to decrypt the data.

Both problems of encryption are active topics of research, including how to use data without being able to read it. There is research on increasing the robustness of encryption, so that if a key is disclosed, you haven’t lost everything and you can protect some of the data or future encryption of new data. This reduces risk a great deal and is important to use. Encryption alone doesn’t protect [privacy], however.
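For readers who want to see the mechanics Graham describes, the sketch below uses a one-time pad: each byte of a hypothetical record is XORed with a random key of the same length. This is a teaching example, not how production systems encrypt data; real deployments use vetted ciphers such as AES through audited libraries. It does show the two weak points she names: the data has to be decrypted to be used, and anyone holding the key can decrypt it.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte; the same call encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


record = b"patient 1042: allergic to penicillin"  # hypothetical stored record
key = secrets.token_bytes(len(record))            # random key, to be used only once

ciphertext = xor_bytes(record, key)  # what sits "at rest" in the database
print(ciphertext.hex())              # a seemingly meaningless set of bits

# Weak point 1: to use the data it must be decrypted, so the plaintext
# exists in memory at this moment.
plaintext = xor_bytes(ciphertext, key)
assert plaintext == record

# Weak point 2: anyone who obtains `key` can run the same line above.
```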

Unknown Speaker: People read of security breaches. I see a different set of privacy issues from big data vs. those in security. Can you distinguish them?

Bill Press: Privacy and security are different issues. Security is necessary to have good privacy in the technological sense: if communications are insecure, they clearly can’t be private. Privacy goes beyond that, to parties that are authorized, in a security sense, to see the information. Privacy is much closer to values. Security is much closer to protocols.

The interesting thing is that this is less about purely technical elements; everyone can agree on the right protocol, eventually. These things go beyond that and have to do with values.

With major pharmacies on board, is the Blue Button about to scale nation-wide?

The Obama administration announced significant adoption of the Blue Button in the private sector today. In a post at the White House Office of Science and Technology Policy blog, Nick Sinai, U.S. deputy chief technology officer, and Adam Dole, a Presidential Innovation Fellow at the U.S. Department of Health and Human Services, listed major pharmacies and retailers joining the Blue Button initiative, which enables people to download a personal health record in an open, machine-readable electronic format:

“These commitments from some of the Nation’s largest retail pharmacy chains and associations promise to provide a growing number of patients with easy and secure access to their own personal pharmacy prescription history and allow them to check their medication history for accuracy, access prescription lists from multiple doctors, and securely share this information with their healthcare providers,” they wrote.

“As companies move towards standard formats and the ability to securely transmit this information electronically, Americans will be able to use their pharmacy records with new innovative software applications and services that can improve medication adherence, reduce dosing errors, prevent adverse drug interactions, and save lives.”

While I referred to the Blue Button obliquely at ReadWrite almost two years ago and in many other stories, I can’t help but wish that I’d finished my feature for Radar a year ago and written up a full analytical report. Extending access to a downloadable personal health record to millions of Americans has been an important, steady shift that has largely gone unappreciated, despite reporting like Ina Fried’s regarding veterans getting downloadable health information. According to the Office of the National Coordinator for Health IT, “more than 5.4 million veterans have now downloaded their Blue Button data and more than 500 companies and organizations in the private-sector have pledged to support it.”

As I’ve said before, data standards are the railway gauges of the 21st century. When they’re agreed upon and built out, remarkable things can happen. This is one of those public-private initiatives that has taken years to bear fruit and that stands to substantially improve the lives of so many people. This one started with something simple, when the administration gave military veterans the ability to download their own health records from MyMedicare.gov and My HealtheVet, and then scaled progressively to Medicare recipients and on to Aetna and other players from there.

There have been bumps and bruises along the way, from issues with the standard to concerns about lost devices, but this news of adoption by places like CVS suggests the Blue Button is about to go mainstream in a big way. According to the White House, “more than 150 million Americans today are able to use Blue Button-enabled tools to access their own health information from a variety of sources including healthcare providers, health insurance companies, medical labs, and state health information networks.”

Notably, HHS has ruled that doctors and clinics that implement the new “BlueButton+” specification will be meeting the requirements of “View, Download, and Transmit (V/D/T)” in Meaningful Use Stage 2 for electronic health records under the HITECH Act, meaning they can apply for reimbursement. According to ONC, that MU program currently includes half of eligible physicians and more than 80 percent of hospitals in the United States. With that carrot, many more Americans should expect to see a Blue Button in the doctor’s office soon.

In the video below, U.S. chief technology officer Todd Park speaks with me about the Blue Button and the work of Dole and other presidential innovation fellows on the project.

Opening IRS e-file data would add innovation and transparency to $1.6 trillion U.S. nonprofit sector

One of the most important open government data efforts in United States history came into being in 1993, when citizen archivist Carl Malamud used a small planning grant from the National Science Foundation to license data from the Securities and Exchange Commission, published the SEC data on the Internet and then operated it for two years. At the end of the grant, the SEC decided to make the EDGAR data available itself — albeit not without some significant prodding — and has continued to do so ever since. You can read the history behind putting periodic reports of public corporations online at Malamud’s website, public.resource.org.


Two decades later, Malamud is working to make the law public, reform copyright, and free up government data again, buying, processing and publishing millions of public tax filings from nonprofits to the Internal Revenue Service. He has made the bulk data from these efforts available to the public and anyone else who wants to use it.

“This is exactly analogous to the SEC and the EDGAR database,” Malamud told me, in a phone interview last year. The trouble is that data has been deliberately dumbed down, he said. “If you make the data available, you will get innovation.”

Making millions of Form 990 returns free online is not a minor public service. Although many nonprofits file their Form 990s electronically, the IRS does not publish the data. Rather, the government agency releases images of millions of returns formatted as .TIFF files onto multiple DVDs to people and companies willing and able to pay thousands of dollars for them. Services like Guidestar, for instance, acquire the data, convert it to PDFs and use it to provide information about nonprofits. (Registered users view the returns on their website.)

As Sam Roudman reported at TechPresident, Luke Rosiak, a senior watchdog reporter for the Washington Examiner, took the files Malamud published and made them more useful. Specifically, he used credits for processing that Amazon donated to participants in the 2013 National Day of Civic Hacking to make the .TIFF files text-searchable. Rosiak then set up CitizenAudit.org, a new website that makes nonprofit transparency easy.

“This is useful information to track lobbying,” Malamud told me. “A state attorney general could just search for all nonprofits that received funds from a donor.”

Malamud estimates that nearly 9% of jobs in the U.S. are in this sector. “This is an issue of capital allocation and market efficiency,” he said. “Who are the most efficient players? This is more than a CEO making too much money; it’s about ensuring that investments in nonprofits get a return.”

Malamud’s open data is acting as a platform for innovation, much as legislation.gov.uk is in the United Kingdom. The difference is that it’s the effort of a citizen that’s providing the open data, not the agency: Form 990 data is not on Data.gov.

Opening Form 990 data should be a no-brainer for an Obama administration that has taken historic steps to open government data. Liberating nonprofit sector data would provide useful transparency into a $1.6 trillion sector of the U.S. economy.

After many letters to the White House and discussions with the IRS, however, Malamud filed suit against the IRS this summer, seeking the release of Form 990 data online.

“I think inertia is behind the delay,” he told me, in our interview. “These are not the expense accounts of government employees. This is something much more fundamental about a $1.6 trillion dollar marketplace. It’s not about who gave money to a politician.”

When asked for comment, a spokesperson for the White House Office of Management and Budget said that the IRS “has been engaging on this topic with interested stakeholders” and that “the Administration’s Fiscal Year 2014 revenue proposals would let the IRS receive all Form 990 information electronically, allowing us to make all such data available in machine readable format.”

Today, Malamud sent a letter of complaint to Howard Shelanski, administrator of the Office of Information and Regulatory Affairs in the White House Office of Management and Budget, asking for a review of the IRS’s pricing policies after a significant year-over-year price increase. Specifically, Malamud wrote that the IRS is violating the requirements of President Obama’s executive order on open data:

The current method of distribution is a clear violation of the President’s instructions to move towards more open data formats, including the requirements of the May 9, 2013 Executive Order making “open and machine readable the new default for government information.”

I believe the current pricing policies do not make any sense for a government information dissemination service in this century, hence my request for your review. There are also significant additional issues that the IRS refuses to address, including substantial privacy problems with their database and a flat-out refusal to even consider release of the Form 990 E-File data, a format that would greatly increase the transparency and effectiveness of our non-profit marketplace and is required by law.

It’s not clear at all whether the continued pressure from Malamud, the obvious utility of CitizenAudit.org or the bipartisan budget deal that President Obama signed in December will push the IRS to freely release open government data about the nonprofit sector.

The furor last summer over the IRS scrutinizing conservative groups that claimed tax-exempt status, however, could carry over into political pressure to reform. If political groups were tax-exempt and nonprofit e-file data were published about them, it would be possible for auditors, journalists and Congressional investigators to detect patterns. The IRS would need to be careful about scrubbing the data of personal information: last year, the IRS mistakenly exposed thousands of Social Security numbers when it posted 527 forms online, an issue that Malamud, as it turns out, discovered in an audit.

“This data is up there with EDGAR, in terms of its potential,” said Malamud. “There are lots of databases. Few are as vital to government at large. This is not just about jobs. It’s like not releasing patent data.”

If the IRS were to modernize its audit system, inspectors general could use automated predictive data analysis to find aberrations and flag them for a human to examine, enabling government watchdogs and investigative journalists to potentially detect similar issues much earlier.
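As a rough illustration of what flagging aberrations for a human could look like, the sketch below computes a simple z-score over one hypothetical metric per organization and flags anything more than two standard deviations from the mean. The organizations, the metric (share of spending on fundraising) and the threshold are all assumptions for illustration; this is not an IRS method, and any real audit system would use richer features and models.

```python
from statistics import mean, stdev


def flag_outliers(values: dict, threshold: float = 2.0) -> list:
    """Return names whose value is more than `threshold` sample standard deviations from the mean."""
    data = list(values.values())
    mu = mean(data)
    sigma = stdev(data)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []        # all values identical, so nothing stands out
    return [name for name, v in values.items() if abs(v - mu) / sigma > threshold]


# Hypothetical share of spending that goes to fundraising, per organization.
fundraising_share = {
    "Org A": 0.10, "Org B": 0.12, "Org C": 0.11, "Org D": 0.13, "Org E": 0.09,
    "Org F": 0.14, "Org G": 0.12, "Org H": 0.10, "Org I": 0.11, "Org J": 0.60,
}

flagged = flag_outliers(fundraising_share)
print(flagged)  # a list of candidates for human review, not proof of wrongdoing
```

The design point is the division of labor: the code only narrows thousands of filings to a short list, and a person still decides whether anything is actually wrong.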

That level of data-driven transparency remains in the future. In the meantime, CitizenAudit.org is currently running on a server in Rosiak’s apartment.

Whether the IRS adopts it as the SEC did EDGAR remains to be seen.

[Image Credit: Meals on Wheels]

IRS enables Americans to download their tax transcripts over the Internet

UPDATE: This service was taken offline after IRS security was compromised.

UPDATE: Learn how to download your tax transcript from IRS.gov.


Earlier today, at the White House Education Datapalooza, an official from the United States Department of the Treasury informed a packed theater and livestream that students, parents and citizens would finally be able to do something simple and profoundly useful over the Internet: download a transcript of their tax return from the Internal Revenue Service.

“I am very excited to announce that the IRS has just launched, this week, a transcript application which will give taxpayers the ability to view, print, and download tax transcripts,” said Katherine Sydor, a policy advisor in the Office of Consumer Policy of the Treasury, “making it easier for student borrowers to access tax records he or she might need to submit loan applications or grant applications.” [VIDEO]

Previously, filers could request a copy of the transcript (not the full return) but would have to wait 5-10 business days to receive it in the mail. For people who needed more rapid access for applications, the delay could be critical. A White House fact sheet subsequently confirmed the news, under the rubric of “streamlining application paperwork,” and a quick follow-up with an official secured the correct URL for the new IRS Web application to get a tax transcript.


I created an account, which involved jumping through the hoops familiar from setting up online access to bank accounts (choosing a passphrase, a pass image and security questions) and then answered a number of questions that made it pretty clear that the IRS knew exactly who I was and where I had lived. (From the consumer side, it’s not clear whether the IRS holds this information itself or used a credit bureau.)

When I tried to actually download the transcript, though, I ran into some issues: first, a browser error in Chrome — “This XML file does not appear to have any style information associated with it. The document tree is shown below.” Using Firefox, however, I was able to at least get the page where I could choose from various years of transcripts.


Unfortunately, clicking any of the links delivered a file that my MacBook was unable to parse. I was, however, able to log into IRS.gov and easily download last year’s tax transcript to my iPhone with one click. Success!

While the technical problems I ran into suggest that Apple computer users might run into some issues, I have a funny feeling that people running Internet Explorer on a Windows machine (the vast majority) will fare better.

The fact that American citizens could not access their own tax returns online in 2014 might seem jarring but, until this week, that was the status quo. This advance represents the sort of somewhat mundane but important shift that the Obama administration’s approach to digital government has enabled over the past five years.

While the troubles behind the botched launch of Healthcare.gov have shaken many citizens’ confidence in the capacity of this administration to deliver effective digital services, and months of headlines about digital surveillance by the National Security Agency have diminished trust in government overall, the ability of the “tech surge” to fix the site and the success of the technology team at the Consumer Financial Protection Bureau offer a guide for how to avoid similar issues. They also highlight a less salacious, more boring reality that will generate neither headlines nor heated rhetoric on cable news shows: most public officials and civil servants are quietly working to deliver better customer service for citizens.

Being able to download a tax transcript online is not, however, without risks. The Internal Revenue Service will need to continue to be vigilant about security. The new functionality will almost certainly inspire fraudsters to create mockups of the government website that look similar and then send phishing emails to consumers, urging them to “log in” to fake websites.
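One basic defense against that kind of phishing is checking where a link actually points before logging in. Here is a minimal sketch, assuming the genuine site lives at irs.gov or one of its subdomains: it accepts only https URLs whose hostname is exactly irs.gov or ends in .irs.gov, which rejects look-alike hosts that merely contain “irs.gov” somewhere in the name. It is one heuristic, not a complete phishing filter.

```python
from urllib.parse import urlsplit

OFFICIAL_DOMAIN = "irs.gov"  # assumption: the genuine site is irs.gov or a subdomain of it


def looks_official(url: str) -> bool:
    """True only for https URLs whose hostname is irs.gov or a subdomain of it."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    is_official_host = host == OFFICIAL_DOMAIN or host.endswith("." + OFFICIAL_DOMAIN)
    return parts.scheme == "https" and is_official_host


for link in [
    "https://www.irs.gov/",                          # genuine host over https
    "http://www.irs.gov/",                           # genuine host, but not encrypted
    "https://irs.gov.account-verify.example/login",  # look-alike: irs.gov is only a prefix
    "https://www.irs-gov.example/",                  # look-alike: a different domain
]:
    print(looks_official(link), link)
```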

Perhaps most problematically, people will download tax transcripts to mobile devices and laptops and then not take steps to protect them with encryption. If you do download your transcripts or personal health information, make sure to also install full disk encryption on every machine you own. Leaving your files unprotected there is like leaving the door to your house unlocked with your tax returns and medical records on the kitchen table.

I have asked the IRS for comment on the new feature, on which browsers and operating systems it supports, and on security guidance, and will update this post if and when I receive any.

Update: Comment from the IRS follows.

How much time and technical resources did the IRS invest in deploying the feature? Has the IRS increased the capacity of the website for more demand?

From establishing the business case and receiving funding plus approval to start the work to implementation took approximately one year. Additional time was spent in ideation, innovation, and confirming requirements of the product prior to receiving approval.

I had trouble downloading my transcript on an Apple computer using Chrome and Firefox. (I was able to get it through my iPhone.) What browsers and operating systems does the new function officially support?

As a web application, Get Transcript is supported on most modern OS/browser combinations. While there may be intermittent issues due to certain end-user configurations, IRS has not implemented any restrictions against certain browsers or operating systems. We are continuing to work open issues as they are identified and validated.

(A side note: For the best user experience, taxpayers may want to try up-to-date versions of Internet Explorer and a supported version of Microsoft Windows; however, that is certainly not a requirement.)

Does the IRS have any guidance for ensuring that Americans connect securely to the website and then protect tax returns on their home computers once they have downloaded them?

The IRS has made good progress on oversight and enhanced security controls in the area of information technology. With state-of-the-art technology as the foundation for our portal (e.g. irs.gov), we continue to focus on protecting the PII of all taxpayers when communicating with the IRS.

However, security is a two-way street with both the IRS and users needing to take steps for a secure experience. On our end, our security is comparable to leaders in private industry.

Our IRS2GO app has successfully completed a security assessment and received approval to launch by our cybersecurity organization after being scanned for weaknesses and vulnerabilities.

Any personally identifiable information (PII) or sensitive information transmitted to the IRS through IRS2Go for refund status or tax record requests uses secure communication channels that meet or exceed federal requirements for encryption. No PII is passed back to the taxpayer through IRS2GO and no PII is stored on the smartphone by the application.

When using our popular Where’s My Refund? application, taxpayers may notice just a few of our security measures. The URL for Where’s My Refund? begins with https. Just like in private industry, the “s” is a key indicator that a web user should notice indicating you are in a “secure session.” Taxpayers may also notice our message that we recommend they close their browser when finished accessing your refund status.

As we become a more mobile society and able to link to the internet while we’re on the go, we remind taxpayers to take precautions to protect themselves from being victimized, including using secure networks, firewalls, virus protection and other safeguards.

We always recommend taxpayers check with the Federal Trade Commission for the latest on reporting incidents of identity theft. You can find more information on our website, including tips if you believe you have become the victim of identity theft.

Does the IRS have any plans to provide Americans with access or insight to estimated tax returns online in the future? Now that we have the ability to establish user accounts, would it ever be possible, for instance, for people with simple taxes (1040EZ, etc) to log in, review an estimated return, make any required edits, and then e-file it on IRS.gov?

IRS: The IRS is considering a number of new proposals that may become a part of the online services roadmap some time in the future. This may include a taxpayer account where up to date status could be securely reviewed by the account owner.

Note: This post has been updated throughout to make it clear that the IRS has provided online access to tax transcripts, not the entire return. You can read up on the difference between a tax transcript and tax return here.

United Kingdom looks to put 50 million health records online and increase patient data rights

This Monday, Member of Parliament Jeremy Hunt, the United Kingdom’s Secretary of State for Health, delivered a keynote address at the fourth annual Health Datapalooza in Washington, DC. In a rhetorical turn that would be anathema for any national conservative politician on this side of the Atlantic, Hunt commended the United States for taking steps towards providing universal health insurance to its people.

Hunt outlined three major elements in a strategy to improve health care in the UK: 1) applying data more effectively, 2) improving transactional capabilities and 3) putting patients in the “driver’s seat” of their own health care. He pointed to several initiatives that support that strategy, from extending electronic health records to 50 million people to sequencing the genomes of 100,000 people and developing telemedicine capabilities for 3 million patients. Given the focus of the datapalooza, however, perhaps his most interesting statement came with respect to personal data ownership:

After the keynote, I interviewed Secretary Hunt and Tim Kelsey, the first national director for patients and information in the National Health Service. Our discussion, lightly edited, follows.

What substantive steps has the UK taken to actually put health data in the hands of patients?

Hunt: Basically, I have given an instruction that everyone should be able to access their own health record online before the next general election, which means that I will be accountable for delivering that promise. There’s no wiggle room for me. That’s a big change, and it’s also a big change for the system because, basically, it means that every hospital and every general practitioner has to get used to the idea that the data they write about patients will be able to be accessed by patients. It’s a small but very significant first step.

There’s sometimes a disconnect between what politicians direct and what systems actually do. What’s happening with the UK’s long-delayed EHR system?

Hunt: I’ve given a pretty accountable timeframe for this: May 2015. I’ll be facing a general election campaign then. If we don’t deliver, then my head’s going to be on the block. I think it is a valid question, because of course once you set these objectives, then you start to look underneath it. One of the questions that we have to ask ourselves is how many have actually used this. We want everyone to be able to use this, but in practice, if the way they use it is they’re going to have to go into their GP, they’ve got to sign a consent form, there’s some complex procedure, then actually it’s not going to change people’s lives. The next question is about take-up, and that’s what we’re exploring at the moment.

Are there any aspects of the U.S. healthcare system that you think might be worth adopting and bringing back to the U.K.? Or vice versa?

Well, it’s quite interesting. We just had a really good meeting with [US CTO] Todd Park. I don’t think the differences are so great. I mean, on one level, yes, hospitals here are private or charitable, so they can’t be mandated by the government to do anything. And yet, they’ve succeeded in getting 80% of them to adopt EHRs through setting a standard and a certain amount of financial incentive. We can tell our hospitals to do things, but actually, as you said earlier, that’s not the same as them actually doing it.

I think in the end, in both countries, what you have to do is make it so that it’s in the hospitals’ own interest. In our case, the way that we’re doing that is trying to demonstrate that sensibly embracing the technology agenda has a massive effect on reducing mortality rates and improving clinical outcomes. By publishing all of the data about those outcomes, we’re creating competition between hospitals. That, I hope, will drive this agenda.

At the same time, we need to change public awareness. This is the big challenge – this sense that you can actually be in charge of your own health is just, surprisingly, absent in large numbers of people. There’s a very strong sense that lots of people have that “health is something that’s done to me” by the NHS.

The U.S. has released data on the disparities in pricing for hospital procedures and comparisons of hospital quality, but you still need to go to places that take your health insurance. In the NHS, is that as much of an issue?

Hunt: That’s a really good question to ask because, in the U.K., for virtually any procedure, you have the right to have it done in any hospital in the country, and yet very few people avail themselves of that right. So, by publishing surgical survival rates, we’re hoping to create pressure, where people actually say “I’m going to have this heart operation, and I’m not going to go to my local hospital, I’m going to go to this one a bit farther away that has higher success rates.” At the moment, people don’t actually do that; they tend to go where they’re recommended to. That’s where this information revolution can take hold.

What is the most unexpected thing that has happened since the U.K. began releasing more open data about health?

Tim Kelsey: I don’t know if this is unexpected or not, but the most startling thing is that we’ve moved from having one of the worst heart surgery survival rates in Europe to being the best. Heart surgery is the only speciality where we’ve published comparative data by heart surgeons across the whole country.

Do you think that’s an accident?

Tim Kelsey: No, I don’t think it’s an accident at all. Within that data, if you look at what has actually happened, the assumption of the geniuses who actually pioneered the program was that the gap between the best surgeons and the worst surgeons would narrow, because the weaker surgeons would raise their game. That didn’t happen. What happened was that the best surgeons got even better, and the underperforming surgeons also raised their game. The truth is that they want to be the winner, and open data has had a massive impact in driving outcomes and standards.

What are the most important principles or substantive steps that you’re applying at the NHS to mitigate risks or harms from privacy breaches?

Hunt: We have to carry the public with us. We have a very strong free press, as you do, and we’re very proud of that. If they believe that people’s data is going to be used to infringe their privacy, then public confidence in the huge revolution that the datapalooza is all about will be shaken, and that will have a massive impact. I think that there’s a very simple way that you maintain public confidence, which is by making it absolutely clear that you own that data. You can choose: if you don’t want that data to be used, even in an anonymized form, you can say, ‘I’m not going to share my data.’ I think once you do that, you create a discipline in the system to make sure that the anonymization of data is credible, because people can withdraw their consent if they don’t believe it.

Also, you put people in the driver’s seat, because I think people’s motives are different. You and I, as young and hopefully healthy individuals, are thinking about privacy. If somebody’s got terrible cancer, they’re actually thinking, ‘Well, I would really like my data to be used for the benefit of humanity.’ They’re actually very, very happy to have their data shared. They have a different set of concerns.

I don’t think you’ll have any trouble, for example, getting 100,000 people to consent to have their genome sequenced. These will be people who have cancer, and once you have cancer, you think, ‘what can I do to help future generations conquer cancer?’ The mentality changes. We have to maintain people’s confidence.

I think the best analogy, though, is banking. Perhaps the second thing people care about most after their health is their money, and the banks have been able to maintain people’s confidence. They’re actually doing banking online, so that you can access your bank account from any PC, anywhere in the world. It’s something you can do with confidence. They’ve done that because they’ve thought through the procedures.

In the U.S., you’re entitled to access a free copy of your credit report once a year. Consumers, however, still don’t have access to their own data across much of the private sector. Will the British government support “rights to data” for its citizens?

Hunt: We are hoping to preempt the worry about that by instructing the NHS that everyone has a right of veto over the use of their own data. You own your own medical record. If you don’t want that shared, then that’s your decision, and you’re able to do that. If we didn’t do that, I think the courts might make us do that.

Kelsey: Just to clarify that point: The Data Protection Act, which is effectively a European piece of legislation, says that people have the right to object to data being shared, in any context, private sector or health or otherwise, or to opt out. We’ve said that, because of the priority we’re giving to patients’ rights, patients are the de facto owners of the data. That is different from the American situation so far.

We’re setting a global standard here, which will be an interesting experiment for the rest of the world to watch: people will have the right to say “I don’t want my data shared,” and people will respect that. Now, at the moment that is not a legal right; it is a de facto right that will be expected. It may well be that we’ll need to simply write down in law that this is an individual’s data and that rights flow from that. At the moment, there’s no law that gives an individual patient the right to their own data or to opt out of its sharing.