Digging in open data dirt, Climate Corporation finds nearly $1 billion in acquisition

“Like the weather information, the data on soils was free for the taking. The hard and expensive part is turning the data into a product.” – Quentin Hardy, in 2011, in a blog post about “big data in the dirt.”


The Climate Corporation, acquired by Monsanto for $930 million on Wednesday, was founded using 30 years of government data from the National Weather Service, 60 years of crop yield data and 14 terabytes of information on soil types, covering every two square miles of the United States, from the Department of Agriculture, according to David Friedberg, chief executive of the Climate Corporation.

Howsabout that for government “data as infrastructure” and a platform for businesses?

As it happens, not everyone is thrilled to hear about that angle or the acquirer. At VentureBeat, Rebecca Grant takes the opportunity to knock “the world’s most evil corporation” for the effects of Monsanto’s genetically modified crops, and, writing for Salon, Andrew Leonard takes the position that the Climate Corporation’s use of government data constitutes a huge “taxpayer ripoff.”

Most observers, however, are more bullish. Hamish MacKenzie hails the acquisition as confirmation that “software is eating the world,” signifying an inflection point in data analysis transforming industries far from Silicon Valley. Liz Gannes also highlights the application of data-driven analysis to an offline industry. Ashlee Vance focuses on the value of the Climate Corporation’s approach to scoring risk for insurance in agribusiness. Stacey Higginbotham posits that the acquisition could be a boon to startups that specialize in creating data on soil and climate through sensors.

[Image Credit: Climate Corporation, via NPR]

Hedge fund use of government data for business intelligence shows where the value is

This week, I read and shared a notable Wall Street Journal article on the value of government data that bears highlighting: hedge funds are paying for market intelligence, using open government laws as an acquisition vehicle.

Here’s a key excerpt from the story: “Finance professionals have been pulling every lever they can these days to extract information from the government. Many have discovered that the biggest lever of all is the one available to everyone—the Freedom of Information Act—conceived by advocates of open government to shine light on how officials make decisions. FOIA is part of an array of techniques sophisticated investors are using to try to obtain potentially market-moving information about products, legislation, regulation and government economic statistics.”

What the reporting leaves unclear is this: if there is 1) strong interest in the data and 2) deep-pocketed hedge funds or well-financed startups paying for it, why aren’t agencies releasing it proactively?

Notably, the relevant law provides for this, as the WSJ reported:

“The only way investors can get most reports is to send an open-records request to the FDA. Under a 1996 law, when the agency gets frequent requests for the same records—generally more than three—it has to make them public on its website. But there isn’t any specific deadline for doing so, says Christopher Kelly, an FDA spokesman. That means first requesters can get records days or even months before they are posted online.”

Tracking inbound FOIA requests from industry and responding to this market indicator as a means of valuing “high value data” is a strategy that has been glaringly obvious for years. Unfortunately, it’s an area in which the Obama administration’s open data policies look to have failed over the last four years, at least as viewed through the prism of this article.

If data sets that are requested multiple times are not being proactively posted on Data.gov and tracked there, there’s a disconnect between the actual market for government data and officials’ perception of it. As the Obama administration and agencies prepare to roll out enterprise data inventories later this fall as part of the open data policies, here’s hoping agency CIOs are also taking steps to track who’s paying for data and which data sets are requested.
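To make that concrete, here’s a minimal sketch of what such tracking might look like, in Python. It’s purely illustrative: the request log, dataset identifiers and field names are invented, and the three-request threshold comes from the FDA example quoted above, not from any agency’s actual tracking system.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FoiaRequest:
    dataset: str      # identifier for the records requested
    requester: str    # e.g. a hedge fund, startup, or journalist

# Hypothetical inbound FOIA log; a real agency would pull this
# from its request-tracking system.
requests = [
    FoiaRequest("fda-establishment-inspection-reports", "Hedge Fund A"),
    FoiaRequest("fda-establishment-inspection-reports", "Hedge Fund B"),
    FoiaRequest("fda-establishment-inspection-reports", "Startup C"),
    FoiaRequest("fda-establishment-inspection-reports", "Hedge Fund D"),
    FoiaRequest("usda-soil-survey", "Startup E"),
]

# Per the WSJ story: under the 1996 law, "generally more than three"
# requests for the same records trigger mandatory public posting.
FREQUENT_REQUEST_THRESHOLD = 3

counts = Counter(r.dataset for r in requests)

# Rank datasets by request volume: a rough market signal of value.
for dataset, n in counts.most_common():
    action = "POST PROACTIVELY" if n > FREQUENT_REQUEST_THRESHOLD else "monitor"
    print(f"{dataset}: {n} requests -> {action}")
```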

If one of the express goals of the federal government is to generate an economic return on investment from data releases, agencies should focus on open data with business value. It’s just common sense.

[Image Credit: Open Knowledge Foundation, Follow the Money]

Intelligence executive David Bray to become new FCC CIO

David Bray, a seasoned national intelligence executive (CV), will be the next chief information officer of the Federal Communications Commission. He’s expected to finish his work in the intelligence community at the Office of the Director of National Intelligence and commence work at the FCC in August.

“As the next FCC CIO, I look forward [to] aiding the FCC’s strong workforce in pioneering new IT solutions for spectrum auctions, next-gen cybersecurity, mobile workforce options, real-time enterprise analytics, enhanced open data, and several other vital public-private initiatives,” wrote Bray, in an email sent to staff and partners Monday night.

Bray holds a PhD in information systems, an MSPH in public health informatics, and a BSCI in computer science and biology from Emory University, along with a visiting associateship at the University of Oxford’s Oxford Internet Institute and two post-doctoral associateships with MIT’s Center for Collective Intelligence and the Harvard Kennedy School. He has also served as a visiting associate with the National Defense University. Bray’s career also includes deployments to Afghanistan, projects at the Department of Energy and work at the Centers for Disease Control and Prevention.

Bray will inherit many IT challenges from former FCC CIO Robert Naylor, who announced in December 2012 that he’d be stepping down. His background in the intelligence community will serve him well on network security issues, but he’ll need to continue transitioning an agency that has traditionally outsourced much of its technology toward 21st-century computing standards and approaches to building infrastructure and meeting increasing demand for services.

Bray’s past work in collective intelligence, informatics, public health and data science suggests that he’ll have no shortage of vision to bring to the role. His challenge, as is true for every federal CIO these days, will be to work within limited budgets and under intense scrutiny to deliver on the promise.

To get a sense of Bray, watch his 2013 talk on “21st century social institutions” at a brunch for Emory University scholars:

New York City moves to apply predictive data analytics to preventing fires

“The total embedding of analytics in New York City has just really passed the tipping point,” related Michael Flowers in an email last week. Flowers, the Big Apple’s first “chief analytics officer,” discussed with me last year how predictive data analytics were saving lives and taxpayer dollars. In the months since, he has continued to apply data science to regulatory data in the public sector. The work of “Mayor Bloomberg’s Geek Squad” finally drew major media notice in March, when the New York Times featured its accomplishments.

Last week, New York City went further into its data-driven future when it announced plans to reduce deaths from fires by applying a new risk-based fire inspection system. Essentially, NYC is applying the same predictive data analytics to assess and prioritize the buildings that firefighters inspect every year.

“Uniformed firefighters currently perform 50,000 full-building fire safety inspections every year and until now, fire officers had very limited information about how to prioritize buildings for inspection in the districts they protect,” said Mayor Bloomberg, in a statement. “Our new system changes that. Drawing on building information from many sources, the Risk Based Inspection System enables fire companies to prioritize the buildings that pose the greatest fire risk—and that means we’ll stop more fires before they can start.”

Instead of cyclical inspections, the new NYC Fire Department system “tracks, scores, prioritizes, and then automatically schedules a building for inspection.”
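The FDNY hasn’t published the model itself, so what follows is only a schematic sketch of the approach described above: score each building on a handful of risk factors, rank the results, and inspect the highest-risk buildings first. The building attributes and weights are assumptions for illustration, not the city’s actual criteria.

```python
from dataclasses import dataclass

@dataclass
class Building:
    address: str
    years_since_inspection: int
    prior_violations: int
    has_sprinklers: bool

def risk_score(b: Building) -> float:
    # Illustrative weights only; the actual system draws on building
    # information from many city sources and its model is not public.
    score = 2.0 * b.years_since_inspection + 5.0 * b.prior_violations
    if not b.has_sprinklers:
        score += 10.0
    return score

buildings = [
    Building("123 Example St", 4, prior_violations=2, has_sprinklers=False),
    Building("456 Sample Ave", 1, prior_violations=0, has_sprinklers=True),
    Building("789 Placeholder Blvd", 7, prior_violations=1, has_sprinklers=True),
]

# Prioritize: the highest-risk buildings go to the top of the
# inspection queue, replacing a fixed cyclical schedule.
for b in sorted(buildings, key=risk_score, reverse=True):
    print(f"{b.address}: risk {risk_score(b):.1f}")
```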

While this kind of algorithmic regulation may set off warning bells for some observers, the system uses technology only to score risk; crucially, it still sends trained human beings to investigate.

Flowers pointed out other areas where this kind of complementary action matters.

“To me, the Hurricane Sandy Administration Action Plan released [last week] is the most powerful expression of what’s happened here in the last 18 months or so,” he said. “We essentially served as the primary intelligence center for Sandy response and recovery. It speaks to things we are doing internally or externally with regards to data leveraging, synthesis, analysis and sharing to get to the most critical need the fastest. It shows how quickly we’ve rooted the concept into how the city does business.”

New Yorkers should expect more of this approach to governance in the future — and to gain more insight as the city’s developers and media analyze datasets released to the public.

“Up next is to roll out the platform to the rest of the city, pushing all this data dynamically to the open data portal,” related Flowers, “which itself is being redone to reflect curated data, a development portal, and risk-based resource allocation over the Departments of Buildings, Fire, Finance, Housing and a few others.”

In the video below, recorded at the Strata Conference in NYC last year, Flowers talks more about his work.

Election 2012: A #SocialElection Driven By The Data

Social media was a bigger part of the 2012 election season than ever before, from the enormous volume of Facebook updates and tweets, to memes during the presidential debates, to popular awareness of what the campaigns were doing there. Facebook may even have boosted President Obama’s vote tally.

While it’s too early to say if any of the plethora of platforms played any sort of determinative role in 2012, strong interest in what social media meant in this election season led me to participate in two panels in the past two weeks: one during DC Week 2012 and another at the National Press Club, earlier today. Storifies of the online conversations during each one are embedded below.

[View the story “Social media and Election 2012 at DC Week 2012” on Storify: http://storify.com/digiphile/social-media-and-election-2012-at-dc-week-2012]

[View the story “Election 2012: The #SocialElection?” on Storify: http://storify.com/digiphile/election-2012-the-socialelection]

The big tech story of this campaign, however, was not social media. As Micah Sifry presciently observed last year, it wasn’t (just) about Facebook: “it’s the data, stupid.” When it came to building the re-election campaign like an Internet company, the digital infrastructure that the Obama campaign’s team of engineers built helped deliver the 2012 election.

Do newspapers need to adopt data science to cover campaigns?

Last October, New York Times elections developer Derek Willis was worried about what we don’t know about elections:

While campaigns have a public presence that is mostly recorded and observed, the stuff that goes on behind the scenes is so much more sophisticated than it has been. In 2008 we were fascinated by the Obama campaign’s use of iPhones for data collection; now we’re entering an age where campaigns don’t just collect information by hand, but harvest it and learn from it. An “information arms race,” as GOP consultant Alex Gage puts it.

For most news organizations, the standard approach to campaign coverage is tantamount to bringing a knife to a gun fight. How many data scientists work for news organizations? We are falling behind, and we risk not being able to explain to our readers and users how their representatives get elected or defeated.

Writing for the New York Times today, Slate columnist Sasha Issenberg revisited that theme, arguing that campaign reporters are behind the curve in understanding, analyzing or being able to capably replicate what political campaigns are now doing with data. Whether you’re new to the reality of the role of big data in this campaign or fascinated by it, a recent online conference on the data-driven politics of 2012 will be of interest. I’ve embedded it below:

Issenberg’s post has stirred online debate amongst journalists, academics and at least one open government technologist. I’ve embedded a Storify of the discussion below.

[View the story “Should newspapers adopt data science to cover political campaigns?” on Storify: http://storify.com/digiphile/should-newspapers-adopt-data-science-to-cover-poli]

Social citizenship: CNN and Facebook to partner on “I’m Voting” app in 2012 election

Two years ago, I wondered whether “social voting” on Foursquare would increase voter participation.

That experiment is about to be writ much larger. In a release today, first reported (as far as I can tell) by Mike Allen in Politico Playbook, CNN and Facebook announced that they will be partnering on an “I’m Voting” Facebook app that will display commitments to vote on timelines, news feeds and the “real-time ticker” in Facebook.

“Each campaign cycle brings new technologies that enhance the way that important connections between citizens and their elected representatives are made. Though the mediums have changed, the critical linkages between candidates and voters­ remain,” said Joel Kaplan, Facebook Vice President-U.S. Public Policy, in a prepared statement. “Innovations like Facebook can help transform this informational experience into a social one for the American people.”

“By allowing citizens to connect in an authentic and meaningful way with presidential candidates and discuss critical issues facing the country, we hope more voters than ever will get involved with issues that matter most to them,” said Joe Lockhart, Facebook Vice President Corporate Communications, in a prepared statement. “Facebook is pleased to partner with CNN on this uniquely participatory experience.”

“We fundamentally changed the way people consume live event coverage, setting a record for the most-watched live video event in Internet history, when we teamed up with Facebook for the 2009 Inauguration of President Obama,” said KC Estenson, SVP CNN Digital, in a prepared statement. “By again harnessing the power of the Facebook platform and coupling it with the best of our journalism, we will redefine how people engage in the democratic process and advance the way a news organization covers a national election.”

“This partnership doubles down on CNN’s mission to provide the most engaging coverage of the 2012 election season,” said Sam Feist, CNN Washington bureau chief, in a prepared statement. “CNN’s unparalleled political reporting combined with Facebook’s social connectivity will empower more American voters in this critical election season.”

What will ‘social citizenship’ mean?

There’s also a larger question about the effect of these technologies on society: Will social networks encouraging people to share their voting behavior lead to more engagement throughout the year? After all, people are citizens 365 days a year, not just every two years on election day. Will “social citizenship” play a role in Election 2012?

In 2010, Foursquare founder Dennis Crowley said yes. As has often been the case (Dodgeball, anyone?), Crowley may well have been ahead of his time.

“One of the things that we’re finding is that when people send their Foursquare checkins out to Twitter and to Facebook, it can drive behaviors,” said Crowley in 2010. “If I check into a coffee shop all the time, my friends are going to be like, hey, I want to go to that coffee shop. We’re thinking the same thing could happen en masse if you start checking into these polling stations, if you start broadcasting that you voted, it may encourage other friends to go out there and do something.”

The early evidence, at least from healthcare in 2010, was that social sharing can raise awareness and promote health. Whether civic health improves, at least as measured in voter participation, is another matter. How you voted used to be a question that each registered citizen could choose to keep to himself or herself. In 2012, in the age of social media, that social norm may be shifting.

One clear winner in Election 2012, however, will almost certainly be Facebook, which will be collecting a lot of data about users who participate in this app and associated surveys — and that data will be of great interest to political scientists and future campaigns alike.

“Since both CNN and [Facebook] are commercial entities, and since data collection/tracking practices in these apps are increasingly invasive, I am curious to see how these developments impact the evolution of the currently outdated US privacy regime,” commented Vivian Tero, an IDC analyst focused on governance, risk and compliance.

UPDATE: The Poynter Institute picked up this story and connected it in a tweet with a recent AdWeek interview with CNN digital senior vice president and general manager KC Estenson on “CNN’s digital power play.”

Estenson, whose network has been suffering from lower ratings of late, notes that online, CNN is now “regularly getting 60 million unique users,” with an “average 20 million minutes a month across the platforms” and CNN Digital generating 110 million video streams per month.

That kind of traffic could power a lot of Likes.

The full release from Facebook is posted on its U.S. Politics page on Facebook.

This post has been updated as more information became available, via Facebook spokesman Andrew Noyes.

U.S. cities form working group to share predictive data analytics skills

Yesterday, I published an interview with Michael Flowers, New York City’s director of analytics for the Office of Policy and Strategic Planning in Mayor Bloomberg’s office. In the interview, “Predictive data analytics is saving lives and taxpayer dollars in New York City,” Flowers talks about how his team of five is applying data analysis on behalf of citizens to improve the efficiency of processes and more effectively detect crimes, from financial fraud to cigarette bootlegging.

After our interview, Flowers followed up over email to tell me about a new working group on data analytics between New York City, Boston, Chicago and Philadelphia. The working group, which recently launched a website at www.g-analytics.org, is sharing methodologies, ideas and strategies.

“Ultimately we want the group to grow and support as many cities interested in pursuing this approach as possible,” wrote Flowers. “It can get pretty lonely when you pursue something asymmetrical or untraditional in the government space, so we felt it was important to make it as simple as possible for like-minded cities to get started. There’s a great guy I work closely with out in Chicago on this effort – [Chicago chief data officer] Brett Goldstein; we talk at least twice a week.”

What is smart government?

Last month, I traveled to Moldova to speak at a “smart society” summit hosted by the Moldovan national e-government center and the World Bank. I talked about what I’ve been seeing and reporting on around the world and some broad principles for “smart government.” It was one of the first keynote talks I’ve ever given and, from what I gather, it went well: the Moldovan government asked me to give a reprise to their cabinet and prime minister the next day.

I’ve embedded the entirety of the morning session above, including my talk (which is about half an hour long). I was preceded by professor Beth Noveck, the former deputy CTO for open government at The White House. If you watch the entire program, you’ll hear from:

  • Victor Bodiu, General Secretary, Government of the Republic of Moldova, National Coordinator, Governance e-Transformation Agenda
  • Dona Scola, Deputy Minister, Ministry of Information Technology and Communication
  • Andrew Stott, UK Transparency Board, former UK Government Director for Transparency and Digital Engagement
  • Arcadie Barbarosie, Executive Director, Institute of Public Policy, Moldova

Without planning on it, I managed to deliver a one-liner that morning that’s worth rephrasing and reiterating here: Smart government should not just serve citizens with smartphones.

I look forward to your thoughts and comments, for those of you who make it through the whole keynote.

Startup Weekend DC kickoff highlights open data, startups and disruptive innovation

On Friday night, a packed room of eager potential entrepreneurs, developers and curious citizens watched US CTO Todd Park and Bill Eggers kick off Startup Weekend DC in Microsoft’s offices in Chevy Chase, Maryland.

Park brought his customary energy and geeky humor to his short talk, pitching the assembled crowd on using open government data in their ideas.

Park wants to inject open data as a “fuel” into the economy. After talking about the success of the Health Data Initiative and the Health Datapalooza, he shared a series of websites where aspiring entrepreneurs could find data to use.

Park also made an “ask” of the attendees of Startup Weekend DC that I haven’t heard from many government officials: he asked that they let him know if they A) use the data and/or B) run into any trouble accessing it.

“If you had a hard time or found a particular RESTful API slow-moving, let me know,” he said. “It helps us improve our performance.” And then he gave out his email address at the White House Executive Office of the President, as he did at SXSW Interactive in Austin in March of this year. Asking the public — particularly entrepreneurs and developers — for feedback on data quality, and providing contact information to do so, is, to put it bluntly, something every city and state official who has stood up an open data platform could and should be doing. In this context, the US CTO has set a notable example for the country.
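In that spirit, here’s a small sketch of the kind of check a developer might run before sending Park feedback: time a request to an open data endpoint and flag failures or slow responses. The URL below is a placeholder, not a real government API.

```python
import time
import urllib.request

# Placeholder endpoint; substitute the actual open data API under test.
API_URL = "https://api.example.gov/datasets/health.json"

def check_endpoint(url: str, slow_threshold_s: float = 2.0) -> None:
    """Fetch the URL once and report failures or slow responses."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            elapsed = time.monotonic() - start
            status = resp.status
    except OSError as err:  # URLError is a subclass of OSError
        reason = getattr(err, "reason", err)
        print(f"{url}: FAILED ({reason}) -- worth reporting")
        return
    verdict = "slow -- worth reporting" if elapsed > slow_threshold_s else "ok"
    print(f"{url}: HTTP {status} in {elapsed:.2f}s ({verdict})")

check_endpoint(API_URL)
```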

Examples of startups, gap filling and civic innovation

Following Park, author and Deloitte consultant Bill Eggers talked about innovative startups and the public sector. I’ve embedded video of his talk below:

Eggers cited three different startups in his talk: Recycle Bank, Avego and Kaggle.

1) Recycle Bank’s gamification drove a 19-fold increase in recycling in some cities, said Eggers. The startup now has 3 million members and is setting its sights on New York City.

2) The real-time ridesharing provided by Avego holds the promise to hugely reduce traffic congestion, said Eggers. According to the stats he cited, 80% of people on the road are currently driving in cars by themselves. Avego has raised tens of millions of dollars to try to better optimize transportation.

3) Anthony Goldbloom found a hole in the big data market with Kaggle, said Eggers: the startup matches data challenges with data scientists. There are now some 19,000 registered data scientists in the Kaggle database.

Eggers cited the success of a Kaggle competition to map dark matter, a problem on which millions had already been spent. The results of this open innovation were better than anything scientists had achieved before the competition. Kaggle has created a market out of writing better algorithms.

After Eggers spoke, the organizers of Startup Weekend explained how the rest of the weekend would proceed and asked attendees to pitch their ideas. One particular idea, for this correspondent, stood out, primarily because of the young fellows pitching it: