One of the first and best examples of how data could make a difference to ordinary people’s lives was the inspirational PlanningAlerts.com, built by Richard Pope, Mikel Maron, Sam Smith, Duncan Parkes, Tom Hughes and Andy Armstrong.
In doing one simple thing – allowing ordinary people to subscribe to an email alert when there was a planning application near them, regardless of council boundaries – it showed that data mattered and, more than that, had the power to improve the interaction between government and the community.
It did so many revolutionary things and fought so many important battles that everyone in the open data world (and not just the UK) owes all those who built it a massive debt of gratitude. Richard Pope and Duncan Parkes in particular put in masses of hours writing scrapers, fighting the battle to open postcodes and providing a simple but powerful user experience.
However, over the past year it had become increasingly difficult to keep the site going, with many of the scrapers falling into disrepair (aka scraper rot). Add to that the demands of a day job, and the cost of running a server, and it’s a tribute to both Richard and Duncan that they kept PlanningAlerts going for as long as they did.
So when Richard reached out to OpenlyLocal and asked if we were interested in taking over PlanningAlerts we were both flattered and delighted. Flattered and delighted, but also a little nervous. Could we take this on in a sustainable manner, and do as good a job as they had done?
Well after going through the figures, and looking at how we might architect it, we decided we could – there were parts of the problem that were similar to what we were already doing with OpenlyLocal – but we’d need to make sustainability a core goal right from the get-go. That would mean a business plan, and also a way for the community to help out.
Both of those had been given thought by us and by Richard, and we’d come to pretty much identical ideas: using a freemium model to generate income, and ScraperWiki to allow the community to help with writing scrapers, especially for those councils that didn’t use one of the common systems. But we also knew that we’d need to accelerate this process using a bounty model, such as the one that’s been so successful for OpenCorporates.
Now all we needed was the finance to kick-start the whole thing, and we contacted Nesta to see if they were interested in providing seed funding by way of a grant. I’ve been quite critical of Nesta’s processes in the past, but to their credit they didn’t hold this against us, and more than that showed themselves willing and able to work in a fast, lightweight & agile way.
We didn’t quite manage to get the funding or do the transition before Richard’s server rental ran out, but we did save all the existing data, and are now hard at work building PlanningAlerts into OpenlyLocal, and gratifyingly making good progress. The PlanningAlerts.com domain is also in the middle of being transferred, and this should be completed in the next day or so.
We expect to start displaying the original scraped planning applications over the next few weeks, and have already started work on scrapers for the main systems used by councils. We’ll post here, and on the OpenlyLocal and PlanningAlert twitter accounts as we progress.
We’re also liaising with PlanningAlerts Australia, who were originally inspired by PlanningAlerts UK, but have since considerably raised the bar. In particular we’ll be aiming to share a common data structure with them, making it easy to build applications based on planning applications from either source.
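To make the idea of a shared data structure concrete, here is a minimal sketch of what a common planning-application record might look like. The field names and values are purely illustrative – they are not the actual schema agreed with PlanningAlerts Australia:

```python
import json

# A hypothetical shared record for a planning application. Every field name
# here is an assumption for illustration, not the real agreed schema.
application = {
    "council_reference": "11/01234/FUL",   # the council's own application ID
    "address": "1 High Street, Anytown",
    "description": "Two-storey rear extension",
    "date_received": "2011-06-01",
    "info_url": "http://example.gov.uk/planning/11/01234/FUL",
    "lat": 51.5074,
    "lng": -0.1278,
}

# Serialising to JSON gives a format either site could publish and consume.
payload = json.dumps(application, sort_keys=True)
print(payload)
```

The point of agreeing even a simple flat structure like this is that an application built against one source works unchanged against the other.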
As I feared back when it was first announced, the proposed UK Public Data Corporation has got nothing to do with open data, and everything to do with protecting the interests of a few civil servants, turning back the open data clock to the dark ages of derived data and privileged access for the few.
However, the issue I’d like to focus on here, having last week attended a workshop on the PDC consultation, is governance. [It’s worth mentioning that I was the only one at the workshop without a stake in the existing public sector information structure – telling in itself.] And far from being a dry, academic, wonkish subject, it is critical to the future of public data in the UK.
The reason this is so contentious is twofold:
- The consultation on the PDC has been drawn very narrowly, trying to get respondents to choose between a set of options that are all bad for open data, and ultimately democracy. “So, open data, would you like a bullet to the back of the head, or to be slowly drained of blood?”
- There are clear conflicts of interest between the wider interests of society, and those of the Shareholder Executive – the trading funds such as the Ordnance Survey and Land Registry who are the very roadblock that open data is supposed to clear, but yet who crucially seem to be driving the PDC.
Now, from their perspective, I can see the appeal of keeping everything cosy and tight, particularly if there’s a chance of the organisations being floated off, and with that considerable personal enrichment. But public policy shouldn’t be driven by the personal interests of civil servants; it should be driven by what is in the interests of society as a whole.
In fact, the governance of the Public Data Corporation, and the rules by which it operates, were the one thing that everyone at the workshop I attended agreed upon. More than that, it was agreed that the delivery of its duties should be separate both from the principles by which it operates (which should be for the benefit of society) and from the independent body that needs to ensure it sticks to those principles.
But here’s the kicker, the Transition Board for the PDC (which will oversee its membership, structure and governance) is, I understand, meeting on October 25, two days before the consultation ends.
When I asked about this meeting, and whether the consultation was therefore a done deal, I was told, “The governance of the PDC is not being consulted on.”
This is both shocking and shameful, and for me it means there’s only one viable option if the UK is serious about open data: to send the whole PDC concept back to the drawing board, and this time to come up with a solution that is focused not on civil servants’ narrow personal interests, but on building a ‘more open, more fair and more prosperous’ society (to quote the Chancellor).
Today, I submitted my response to the UK Cabinet Office’s Open Data Consultation, “Making Open Data Real”, and this is it.
I have been dealing at the sharp end of open data for a couple of years now, co-founding OpenCorporates and founding OpenlyLocal, both of which have massively increased the availability of company and UK local data respectively, and, I hope, in some tiny way have helped give the UK its worldwide reputation of leading the way in open data.
Through sitting on the Local Public Data Panel and countless other government programmes and meetings, I’ve also encountered local and central government bureaucracy in the raw. I’ve seen in detail how too often the bureaucracy subverts complex rules drawn up with the best of intentions to stifle innovation, exclude the most important ‘stakeholders’ of all (the people), and reward those behind big, multimillion-pound projects with promotion and further contracts.
All this experience has, I think, led me to a fairly comprehensive understanding of the issues, the blockages, the hype and the potential of open data. And it is with this understanding that I am responding to the consultation.
The truth is, like it or not, we now live in a ‘Big Data’ world, where our lives are not just governed by data but are data, from bank accounts to loyalty cards, smart phones to smart meters, televisions to travel cards. Even those who have never been on the internet are producing bucketfuls of data as they shop, watch, or catch the bus using free travel cards for the elderly and disabled.
Yet their access to data – both the data they produce and the data produced on their behalf by government and the public sector – is fundamentally restricted. Not only do they have no access to many of the datasets that affect their lives, but those who are innovating to help them make sense of it are fatally hobbled by the lack of open access to the core public datasets which underlie our modern world – for example, geographic data, company data, health data, and democratic & electoral data.
Public sector data is still being treated as an asset to be sold, rather than an underlying infrastructure of a modern democratic society, and with this approach people and the innovators who seek to empower them are marginalised and disenfranchised.
That is why the risk here is not of making changes, but of making no changes, and why what is needed is not a set of rules to be gamed and worked around by the existing ‘stakeholders’ (who after all have a stake in preserving their existing, out-of-date business models), but a core set of principles.
Open data is no silver bullet, and won’t on its own solve these problems, but it is an essential requirement for a ‘more open, more fair and more prosperous’ society.
Fortunately the consultation provides such a set in Annex 2 (The Public Sector Data Principles). These should be issued to every government department, quango, health authority and public sector body (including the PDC), with the order to follow them in letter and spirit. Backing these up, an independent body needs to be appointed with the power and resources to enforce them. With these two things – good public principles, and an effective enforcer – we have a chance to achieve the innovation and fairer society we need.
Chris Taggart
CEO & Co-Founder, OpenCorporates
Founder, OpenlyLocal
Member of the Local Public Data Panel
Member of the Mayor of London’s Digital Advisor
This is my presentation to the superb OKCON2011 conference in Berlin last week. It’s obviously openly licensed (CC-BY), so feel free to distribute widely. Comments also welcome.
When the amazing Emer Coleman first approached me a year and a half ago to get feedback on the plans for the London datastore, I told her that the gold standard for such datastores was that run by the District of Columbia, in the US. It wasn’t just the breadth of the data; it was that DC seemed to have integrated the principles of open data right into its very DNA.
And we had this commitment in mind when we were thinking about which US jurisdictions we’d scrape first for OpenCorporates, whose simple (but huge) goal is to build an open global database of every registered company in the world.
While there were no doubt many things that the DC company registry could be criticised for (perhaps it was difficult for the IT department to manage, or problematic for the company registry staff), for the visitors who wanted to get the information it worked pretty well.
What do I mean by ‘worked well’? Despite being quite basic – or perhaps because of it – the site let you use any browser (or screenreader, for those with accessibility issues) to search for a company and to get the information about it.
It also had a simple, plain structure, with permanent URLs for each company, meaning search engines could easily find the data, so that if you search for a company name on Google there’s a pretty good chance you’ll get a link to the right page. This also means other websites can ‘deep-link’ to the specific company, and that links could be shared by people, in social networking, emails, whatever.
Finally, it meant that it was easy to get the information out of the register, by browsing or by scraping (we even used the scraper we wrote on ScraperWiki as an example of how to scrape a straightforward company register as part of our innovative bounty program).
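To show why a straightforward register like this is so easy to scrape, here is a minimal sketch of the kind of scraper we’d write on ScraperWiki. The HTML below is a made-up fragment standing in for a results page – a real scraper would fetch the live pages and follow each company’s permanent URL:

```python
from html.parser import HTMLParser

# Invented sample markup, standing in for a company-register results page.
SAMPLE_PAGE = """
<table>
  <tr><td class="name">ACME WIDGETS LLC</td><td class="number">123456</td></tr>
  <tr><td class="name">EXAMPLE HOLDINGS INC</td><td class="number">654321</td></tr>
</table>
"""

class RegisterParser(HTMLParser):
    """Collects one dict per table row, keyed by each cell's class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []           # accumulated {"name": ..., "number": ...} dicts
        self.current_field = None
        self.current_row = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_field = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current_field and data.strip():
            self.current_row[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_field = None
        elif tag == "tr" and self.current_row:
            self.rows.append(self.current_row)
            self.current_row = {}

parser = RegisterParser()
parser.feed(SAMPLE_PAGE)
print(parser.rows)
```

A few dozen lines like this were enough for a register with clean, permanent per-company pages – which is exactly why the redesign described below was such a step backwards.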
It was, for the most part, what a public register should be, with the exception of providing a daily dump of the data under an open licence.
So it was a surprise a couple of weeks ago to find that they had redone the website, and taken a massive step back, essentially closing the site down to half the users of the web, and to those search engines and scrapers that wanted to get the information in order to make it more widely available.
In short it went from being pretty much open, to downright closed. How did they do this? First they introduced a registration system. Now, admittedly, it’s a pretty simple registration process, and doesn’t require you to submit any personal details. I registered as ‘Bob’ with a password of ‘password’ just fine. But as well as adding friction to the user experience, it also puts everything behind the signup out of the reach of search engines. Really dumb. Here’s the Google search you get now (a few weeks ago there were hundreds of thousands of results):
The other key point about adding a registration system is that the sole reason is to be able to restrict access to certain users. Let me repeat that, because it goes to the heart of the issue about openness and transparency, and why this is a step back from both by the District of Columbia: it allows them to say who can and can’t see the information.
If open data and transparency is about anything, it’s about giving equal access to information no matter who you are.
The second thing they did was build a site that doesn’t work for those who don’t use Internet Explorer 7 or above, including those who use screenreaders. That’s right: in the year 2011, when even Microsoft are embracing web standards, they decided to ditch them, and with them nearly half the web’s users, and all those who use screenreaders. (Is this even allowed? Isn’t it covered by the Americans with Disabilities Act?)
In the past couple of weeks, I’ve been in an email dialogue with the people in the District of Columbia behind the site, to try to get to the bottom of this, and the bottom seems to be that the accessibility of the site, the ability for search engines to index it, and the ability for people to reuse the data aren’t a priority.
In particular it isn’t a priority compared with satisfying the needs of their ‘customers’, meaning those companies that file their information (and perhaps more subtly those companies whose business models depend on the data being closed). Apparently some of the companies had complained that they were being listed, contacted and/or solicited without their approval.
That’s right, the companies on the public register were complaining that their details were public. Presumably they’d really rather nobody had this information. We’re talking about companies here, remember, who are supposed to thrive or fail in the brutal world of the free market, not vulnerable individuals.
It’s worth mentioning here that this tendency to think that the stakeholders (hate that word) are those you deal with day-to-day is a pervasive problem in government in all countries, and is one of the reasons why they are failing to benefit from open data the way they should and failing too to retool and restructure for the modern world.
Sure, we can work around these restrictions and probably figure out a way to scrape the data, but it’s a sad day to see one of the pioneers of openness and transparency take such a regressive step. What’s next? Will the DC datastore take down its list of business licence holders, or maybe the DC purchase order data, all of which could be used for making unsolicited requests to these oversensitive and easily upset businesses?
p.s. Apparently this change was in response to an audit report, which I’ve asked for a copy of but which hasn’t yet been sent to me. Any sleuthing or FOI requests gratefully received.
p.p.s. I also understand there’s new DC legislation that’s recently been passed that requires further changes to the website, although again the details weren’t given to me, and I haven’t had time to search the DC website for them.
As a bit of an outsider, reading the government’s pronouncements on open data feels rather like reading official Kremlin statements during the Cold War. Sometimes it’s not what they’re saying, it’s who’s saying it that’s important.
And so it is, I think, with George Osborne’s speech yesterday morning at Google Zeitgeist, at which he stated, “Our ambition is to become the world leader in open data, and accelerate the accountability revolution that the internet age has unleashed”, and “The benefits are immense. Not just in terms of spotting waste and driving down costs, although that consequence of spending transparency is already being felt across the public sector. No, if anything, the social and economic benefits of open data are even greater.”
This is strong, good stuff, and it’s significant that it comes from Osborne, who has not previously taken a high-profile position on open data and open government, leaving that variously to the Cabinet Office Minister, Francis Maude, Nick Clegg and even David Cameron himself.
It’s also intriguing that it comes in the apparent burying of the Public Data Corporation, which got just a holding statement in the budget, and no mention at all in Osborne’s speech.
But more than that it shows the Treasury taking a serious interest for the first time, and that’s both to be welcomed, and feared. Welcomed, because with open data you’re talking about sacrificing the narrow interests of small short-term fiefdoms (e.g. some of the Trading Funds in the Shareholder Executive) for the wider interest; you’re also talking about building the essential foundations for the 21st century. And both of these require muscle and money.
The Treasury also oversees a number of datasets which have hitherto been very much closed data, particularly the financial data overseen by the Financial Services Authority, the Bank of England and perhaps even some HMRC data. I’ve started the ball rolling by scraping the FSA’s Register of Mutuals, which we’ve just imported into OpenCorporates, tying the entries to the associated entries in the UK Register of Companies.
Feared, because the Treasury is not known for taking prisoners, still less working with the community. And the fear is that rather than leverage the potential that open data allows for a multitude of small distributed projects (many of which will necessarily and desirably fail), rather than use the wealth of expertise the UK has built up in open data, they will go for big, highly centralised projects.
I have no doubt, the good intentions are there, but let’s hope they don’t do a Team America here (and this isn’t meant as a back-handed reference to Beth Noveck, who I have a huge amount of respect for, and who’s been recruited by Osborne), and destroy the very thing they’re trying to save.
Like buses, you wait ages for local councils to publish their spending data, then a whole load come at once… and consequently OpenlyLocal has been importing the data pretty much non-stop for the past month or so.
We’ve now imported spending data for over 140 councils with more being added each day, and now have over a million and a half payments to suppliers, totalling over £10 billion. I think it’s worth repeating that figure: Ten Billion Pounds, as it’s a decent chunk of change, by anybody’s measure (although it’s still only a fraction of all spending by councils in the country).
Along with that we’ve also made loads of improvements to the analysis and data, some visible, others not so much (we’ve made loads of much-needed back-end improvements now that we’ve got so much more data), and to mark breaking the £10bn figure I thought it was worth starting a series of posts looking at the spending dataset.
Let’s start by having a look at those headline figures (we’ll be delving deeper into the data for some more heavyweight data-driven journalism over the next few weeks):
144 councils. That’s about 40% of the 354 councils in England (including the GLA). Some of the others we just haven’t yet imported (we’re adding them at about 2 a day); others have problems with the CSV files they are publishing (corrupted or invalid files, or where there’s some query about the data itself), and where there’s a contact email we’ve notified them of this.
The rest are refusing to publish the CSV files specified in the guidelines, making it difficult to import automatically by publishing an Excel file or, worse, a PDF (and here I’d like to single out Birmingham council, the biggest in the UK, which shamefully is publishing its spending only as a PDF, and even then with almost no detail at all. One wonders what they are hiding).
£10,184,169,404 in 1,512,691 transactions. That’s an average transaction value of £6,732 per payment. However this is not uniform across councils, varying from an average transaction value of £669 for Poole to £46,466 for Barnsley. (In future posts, I’ll perhaps have a look at using the R statistical language to do some histograms on the data, although I’d be more than happy if someone beat me to that).
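The per-council averages are just grouped totals divided by counts. Here’s a sketch of the calculation in Python – note the individual payment amounts below are invented purely so the arithmetic reproduces the quoted averages; they are not real transactions:

```python
from collections import defaultdict

# Made-up payments, chosen so each council's average matches the figures
# quoted above (Poole £669, Barnsley £46,466).
payments = [
    ("Poole", 500), ("Poole", 838),
    ("Barnsley", 40000), ("Barnsley", 52932),
]

totals = defaultdict(lambda: [0, 0])  # council -> [total value, payment count]
for council, amount in payments:
    totals[council][0] += amount
    totals[council][1] += 1

for council, (value, count) in totals.items():
    print(f"{council}: average £{value / count:,.0f} over {count} payments")
```

The same grouping, run over all 1.5 million rows, is how the headline £6,732 overall average falls out of the data.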
194,128 suppliers. What does this mean? To be accurate, this is the total number of supplying relationships between the councils and the companies/people/things they are paying.
Sometimes a council may have (or appear to have) several supplier relationships with the same company (charity/council/police authority), using different names or supplier IDs. This is sometimes down to a mistake in keying in the data, or for internal reasons, but either way it means several supplier records are created. It’s also worth noting that redacted payments are often grouped together as a single ‘supplier’, as the council may not have given any identifier to show that a redacted payment of £50,000 to a company (and in general there’s little reason to redact such payments) is to a different recipient than a redacted payment of £800 to a foster parent, for example.
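A lot of these apparent duplicates can be spotted by normalising supplier names before comparing them. The rules below (lowercase, strip punctuation, drop a few common company suffixes) are an illustrative sketch, not the actual matching logic OpenlyLocal uses:

```python
import re

# Common company-name suffixes to ignore when comparing supplier names.
SUFFIXES = {"ltd", "limited", "plc", "llp"}

def normalise(name):
    """Reduce a supplier name to a comparable key: lowercase, no
    punctuation, common suffixes dropped."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    tokens = [t for t in tokens if t not in SUFFIXES]
    return " ".join(tokens)

# Two differently-keyed supplier records that are really the same company.
assert normalise("Capita Business Services Ltd.") == normalise(
    "CAPITA BUSINESS SERVICES LIMITED"
)
print(normalise("Capita Business Services Ltd."))  # capita business services
```

Normalisation like this gets you candidate matches cheaply; confirming that a candidate really is the same legal entity still needs a registry lookup or a human eye, which is where the user matching described below comes in.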
However, using some clever matching and with the help of the increasing number of users who are matching suppliers to companies/charities and other entities on OpenlyLocal (just click on ‘add info’ when you’re looking at a supplier you think you can match to a company or charity), we’ve matched about 40% of these to real-world organisations such as companies and charities.
While that might not seem very high, a good proportion of the rest will be sole-traders, individuals, or organisations we’ve not yet got a complete list of (Parish and Town councils, for example). And what it does mean is we can start to get a first draft of who supplies local government. And this is what we’ve got:
66,165 companies, with total payments of £3,884,271,203 (£3.88 billion), 38.1% of the total £10bn, in 579,518 transactions, making an average payment of £6,702.
8,236 charities, with total payments of £415,878,177, 4.1% of the total, in 55,370 transactions, making an average payment of £7,511.
Next time, we’ll look at the company suppliers in a little more detail, and later on the charities too, but for the moment, as you can see we’re listing the top 20 matched individual companies and charities that supply local government. Bear in mind a company like Capita does business with councils through a variety of different companies, and there’s no public dataset of the relationships between the companies, but that’s another story.
Finally, the whole dataset is available to download as open data under the same share-alike attribution licence as the rest of OpenlyLocal, including the matches to companies/charities that are receiving the money (the link is at the bottom of the Council Spending Data Dashboard). Be warned, however, it’s a very big file (there’s a row for every transaction), and so is too big for Excel (or even Google Fusion tables for that matter), so it’s of most use to those using a database, or doing academic research.
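If the file is too big for Excel, streaming it row by row into SQLite is a simple alternative. The sketch below uses an invented two-row sample and guessed column names – the real download’s header will differ – but the pattern (a `DictReader` feeding `executemany`) works unchanged on the full file:

```python
import csv
import io
import sqlite3

# Invented sample rows; the column names are assumptions, not the actual
# header of the OpenlyLocal download.
SAMPLE_CSV = """council,supplier,amount,date
Poole,ACME WIDGETS LTD,500.00,2011-03-01
Barnsley,EXAMPLE SERVICES PLC,40000.00,2011-03-02
"""

db = sqlite3.connect(":memory:")  # use a file path for the real dataset
db.execute(
    "CREATE TABLE payments (council TEXT, supplier TEXT, amount REAL, date TEXT)"
)

# For the real file, replace io.StringIO(...) with open('spending.csv').
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
db.executemany(
    "INSERT INTO payments VALUES (:council, :supplier, :amount, :date)",
    reader,
)
db.commit()

total, count = db.execute("SELECT SUM(amount), COUNT(*) FROM payments").fetchone()
print(total, count)
```

Because the rows are streamed rather than loaded into memory, this approach handles a million-and-a-half-row file on an ordinary laptop, and once it’s in SQLite the per-council and per-supplier aggregates are a single SQL query away.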
* Note: there are inevitably loads of caveats to this data, including that councils are (despite the guidance) publishing the data in different ways, including, occasionally, aggregating payments, and using over-aggressive redaction. It’s also, obviously, only 40% of the councils in England, although that’s a pretty big sample size. Finally there may be errors both in the data as published, and in the importing of it. Please do let us know at firstname.lastname@example.org if you see any errors, or figures that just look wrong.