countculture

Open data and all that

New feature: one-click FoI requests for spending payments

with 4 comments

Thanks to the incredible work of Francis Irving at WhatDoTheyKnow, we’ve now added a feature I’ve wanted on OpenlyLocal since we started importing the local spending data: one-click Freedom of Information requests on individual spending items, especially those large ones.

This further lowers the barriers to armchair auditors wanting to understand where the money goes, and the request even includes all the usual ‘boilerplate’ to help avoid specious refusals. I’ve started it off with one to Wandsworth, whose poor-quality spending data I discussed last week.

And this is the result, the whole process having taken less than a minute:

The requests are also being tagged. This means that in the near future you’ll be able to see on a transaction page if any requests have already been made about it, and the status of those requests (we’re just waiting for WDTK to implement search by tags), which will be the beginning of a highly interconnected transparency ecosystem.

In the meantime it’s worth checking the transaction hasn’t been requested before confirming your request on the WDTK page (there’s a link to recent requests for the council on the WDTK form you get to after pressing the button).

I’m also trusting the community will use this responsibly, digging out information on the big stuff, rather than firing off multiple requests to the same council for hundreds of individual items (which would in any case probably be deemed vexatious under the terms of the FoI Act). At the moment the feature’s only enabled on transactions over £10,000.

Good places to start would be those multi-million-pound monthly payments which indicate big outsourcing deals, or large redacted payments (Birmingham’s got a few). Have a look at the spending dashboard for your council and see if there are any such payments.

A simple demand: let us record council meetings

with 16 comments

A couple of months ago we had the ridiculous situation of a local council hauling one of their councillors up in front of a disciplinary hearing for posting videos of the council meeting on YouTube.

The video originated from the council’s own webcasts, and the complaint by Councillor Kemble was that in posting these videos on YouTube, another councillor, Jason Kitcat

(i) had failed to treat his fellow councillors with respect, by posting the clips without the prior knowledge or express permission of Councillor Theobald or Councillor Mears; and
(ii) had abused council facilities by infringing the copyright in the webcast images

and in doing so had breached the Members Code of Conduct.

Astonishingly, the standards committee found against Kitcat and ruled that he should be suspended for up to six months if he does not write an apology to Cllr Theobald and submit to re-training on the roles and responsibilities of being a councillor. It is only the fact that he is appealing to the First-Tier Tribunal (which the council has apparently decided to fight by hiring outside counsel) that has allowed him to continue.

It’s worth reading the investigator’s report (PDF, of course) in full for a fairly good example of just how petty and ridiculous these issues become, particularly when the investigator writes things such as:

I consider that Cllr Kitcat did use the council’s IT facilities improperly for political purposes. Most of the clips are about communal bins, a politically contentious issue at the time. The clips are about Cllr Kitcat holding the administration politically to account for the way the bins were introduced, and were intended to highlight what he believed were the administration’s deficiencies in that regard, based on feedback from certain residents.
Most tellingly, clip no. 5 shows the Cabinet Member responsible for communal bins in an unflattering and politically unfavourable light, and it is hard to avoid the conclusion that this highly abridged clip was selected and posted for political gain.

The ‘using IT facilities’ refers, by the way, not to using the council’s own computers to upload or edit the videos (it seems agreed by all that he used his own computer for this), but to the fact that the webcasts were made and published on the web using the council’s equipment (or at least that of its supplier, Public-i). Presumably if he’d taken an extract from the minutes of a meeting published on the council’s website, that would also have been using the council’s IT resources.

However, let’s step back a bit. This, ultimately, is not about councillors not understanding the web, failing to get new technology and the ways it can open up debate. This is not even about the somewhat restrictive webcasting system, which apparently only has the past six months’ meetings and is somewhat unpleasant to use (particularly if you use a Mac, or Linux — see a debate of the issues here).

This is about councillors failing to understand democracy: the ability to take the same material, make up your own mind, and, critically, try to persuade others of that view.

In fact the investigator’s statement above, taking “a politically contentious issue at the time… holding the administration politically to account for the way the bins were introduced… to highlight what he believed were the administration’s deficiencies in that regard”, is surely a pretty good benchmark for a democracy.

So here’s a simple suggestion for those drawing up the local government legislation at the moment. No, let’s make that a demand, since that’s what it should be in a democracy (not a subservient request to your ‘betters’):

Give the public the right to record any council meeting using any device: Flip cams, tape recorders, frankly any darned thing they like, as long as it doesn’t disrupt the meeting.

Not only would this open up council meetings and their obscure committees to wider scrutiny, it would also be a boost to hyperlocal sites that are beginning to take the place of the local media.

And if councils want to go to the expense of webcasting their meetings, then require them to make the webcasts available to download under an open licence. That way people can share them, convert them into open formats that don’t require proprietary software, subtitle them, and yes, even post them on YouTube.

I can already hear local politicians saying it will reduce the quality of political discourse, that people may use it in ways they don’t like and can’t control.

Does this seem familiar? It should. It’s the same arguments being given against publishing raw data. The public won’t understand. There may be different interpretations. How will people use it?

Well, folks, that’s the point of a democracy. And that’s the point of a data democracy. We can use it in any way we damn well please. The public record is not there to make incumbent councillors or senior staff members look good. It’s there to allow them to be held to account. And to allow people to make up their own minds. Stop that, and you’re stopping democracy.

Links: For more posts relating to this case, see also Jason Kitcat’s own blog posts, the Brighton Argus post, and posts from Mark Pack at Liberal Democrat Voice, Jim Killock, Conservative Home, and even a tweet from Local Government minister Grant Shapps.

Written by countculture

September 27, 2010 at 12:46 pm

Making OpenCharities even better… more features, more data, more charities

with 8 comments

I had a fantastic response to the launch of OpenCharities — my little side project to open up the Charity Commission’s Register of Charities — from individuals, from organisations representing the third sector, and from charities themselves.

There were also a few questions:

  • Could we pull out and expose via the API more info about the charities, especially the financial history?
  • How often would OpenCharities be updated and what about new charities added after we’d scraped the register?
  • Was there any possibility that we could add additional information from sources other than the Charity Register?

So, over the past week or so, we’ve been busy trying to answer those questions as best we could, mainly by just trying to get on and solve them.

First, additional info. After a terrifically illuminating meeting with Karl and David from NCVO, I had a much better idea of how the charity sector is structured, and what sort of information would be useful to people.

So the first thing I did was to rewrite the scraper and parser to pull in a lot more information, particularly the past 5 years’ income and spending and, for bigger charities, the breakdown of that income and spending. (I also pulled in the remaining charities that had been missed the first time around, including removed charities.) Here’s what the NSPCC’s entry, for example, looks like now:

Example of financial info for charity

We are also now getting the list of trustees, and links to the accounts and Summary Information Returns, as there’s all sorts of goodness locked up in those PDFs.

However, while we were running through all these charities, we wondered if any of them had social networking info easily available (i.e. on their front page). It turns out some of the bigger ones did, and so we visited their sites and pulled out that info (it’s fairly easy to look for links to twitter/facebook/youtube etc on a home page). Here’s an example of the social networking info, again for the NSPCC.
Social networking info for charities
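Spotting those links is about as simple as the post suggests. Here’s a minimal sketch of the idea (the domain list and function name are mine for illustration, not OpenCharities’ actual code):

```python
import re

# Domains we treat as "social networking" links; the post mentions
# Twitter, Facebook and YouTube, so start with those.
SOCIAL_DOMAINS = ("twitter.com", "facebook.com", "youtube.com")

def social_links(html):
    """Return a dict mapping social domain -> first matching URL found
    in a charity's home page HTML. A rough sketch, not a full parser."""
    links = {}
    for url in re.findall(r'href=["\'](https?://[^"\']+)["\']', html):
        for domain in SOCIAL_DOMAINS:
            if domain in url and domain not in links:
                links[domain] = url
    return links
```

In practice you’d want a real HTML parser and some handling of relative links, but for front pages that link their profiles directly, this crude pass gets most of the way there.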

[Incidentally, doing this threw up some errors in the Charity Register, most commonly websites that are listed as http://http://some.charity.org.uk, which in itself shows the benefit of opening up the data. All we need now is a way of communicating that to the Charity Commission.]
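That particular class of error is also easy to clean up programmatically. A rough sketch of the kind of normalisation we mean (illustrative only; note it assumes falling back to plain http is acceptable):

```python
import re

def clean_url(url):
    """Collapse repeated schemes like 'http://http://example.org'
    (a common error in the Register) down to a single one."""
    # Strip any number of leading scheme prefixes, then re-add one.
    stripped = re.sub(r'^(https?://)+', '', url.strip())
    return 'http://' + stripped if stripped else None
```
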

We also (after way too many hours wasted messing around with cookies and hidden form fields) figured out how to get the list of charities recently added, with the result that we can check every night for new charities added in the past 24 hours, and add those to the database.

Latest charities added to register
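Once you can fetch the ‘recently added’ list, the nightly job reduces to filtering it down to the past 24 hours before inserting into the database. Something like this sketch (field names are hypothetical):

```python
from datetime import datetime, timedelta

def added_since(charities, hours=24, now=None):
    """Filter a 'recently added' list down to the last day's worth,
    ready to be inserted into the database. The 'registered' field
    name is illustrative, not the scraper's actual schema."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(hours=hours)
    return [c for c in charities if c["registered"] >= cutoff]
```
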

This means not only can we keep OpenCharities up to date, it also means we can offer an RSS feed of the latest charities. And if that’s updated a bit too frequently for you (some days there are over 20 charities added), you can always restrict it to a given search term, e.g. http://OpenCharities/charities.rss?term=children for those charities with ‘children’ in the title.

Finally, we’ve been looking at what other datasets we could link with the register, and I thought a good one might be the list of grants given out by the various National Lottery funding bodies (which fortunately had already been scraped by the very talented Julian Todd using ScraperWiki).

Then it was a fairly simple matter of tying together the recipients with the register, and voila, you have something like this:

Example of National Lottery grant info for a charity

Note, at the time of writing, the import and match of the data is still going on, but should be finished by the end of today.

We’ll also add some simple functionality to show payments from local councils that’s being published in the local council spending data. The information’s already in the database (and is actually shown on the OpenlyLocal page for the charity); I just haven’t got around to displaying it on OpenCharities yet. Expect that to appear in the next day or so.

C

p.s. Big thanks to @ldodds and @pigsonthewing for helping with the RDF and microformats respectively

Written by countculture

September 23, 2010 at 2:40 pm

Drawing up the Local Spending Data guidelines… and how Google Docs saved the day

with 2 comments

Last Thursday, the Local Public Data Panel on which I sit approved the final draft of the guidelines for councils publishing their spending over £500 (version 1.0, if you like). These started back in June, with a document Will Perrin and I drew up in response to a request from Camden council, and attracted a huge number of really helpful comments.

Since then, things have moved on a bit. The loose guidelines were fine as a starting point, especially as at that time we were talking theoretically, and hadn’t really had any concrete situations or data to deal with. But from speaking to councils, and actually using the data, it became clear that something much firmer was needed.

What followed then was the usual public sector drafting nightmare, with various Word documents being emailed around, people getting very territorial, offline conversations, and frankly something that wasn’t getting very far.

However, a week beforehand I’d successfully used a shared Google Spreadsheet to free up a similar problem. In that case there were a bunch of organisations (including OpenlyLocal, the Local Government Association and the Department for Communities and Local Government) that needed an up-to-date list of councils publishing spending data, together with the licence, URL and whether it was machine-readable (basically what Adrian Short was doing here at one time; I’d asked him if he wanted to do it, but he didn’t have the time to keep his up to date). In addition, it was clear that we each knew about councils the others didn’t.

The answer could have been a dedicated web app, or a Word document that was added to and emailed around (actually that’s what started to happen). In the end, it was something much simpler: a Google spreadsheet with edit access given to multiple people. I used the OpenlyLocal API to populate the basic structure (including OpenlyLocal URLs, which meant that anyone getting the data via the API, or as a CSV, would have a place they could query for more data), and bingo, it was sorted.

So given this success, Jonathan Evans from the LGA and I agreed to use the Google Docs approach with the spending guidelines. There are multiple advantages to this, but some are particularly relevant for tackling such a problem:

  • We can all work on the document at the same time, messaging each other as we go, avoiding the delays, arguments and territoriality of the document-emailing approach.
  • The version tracking means that all your changes, not just those of the saved version, are visible to all participants (and to people who subsequently become participants). This seems to lead to a spirit of collaboration rather than position-taking, and at least on this occasion avoided edit-wars.
  • The world can see the product of your work, without having to separately publish it (though see note below)

You can also automatically get the information as data, either through the Google Docs API or, more likely in the case of a spreadsheet, as a CSV file. Construct it with this in mind (i.e. 1 header row), and you’ve got something that can be instantly used in mashups and visualisations.
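For instance, here’s a rough sketch of pulling a published spreadsheet down as CSV and turning each row into a dict. The export URL shape is an assumption based on how published spreadsheets worked at the time; the real URL may differ:

```python
import csv
import io
import urllib.request

# Hypothetical: a published Google spreadsheet could be fetched as CSV
# via an export URL of roughly this shape (details may vary).
EXPORT_URL = "https://spreadsheets.google.com/pub?key=%s&output=csv"

def fetch_csv(key):
    """Download the published spreadsheet as CSV text."""
    with urllib.request.urlopen(EXPORT_URL % key) as resp:
        return resp.read().decode("utf-8")

def parse_rows(csv_text):
    """With a single header row, each data row becomes a dict,
    ready for mashups and visualisations."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

The one-header-row convention is what makes the second step trivial; nested or merged headers would need hand-written parsing.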

    Important note 1: The biggest problem with this approach in central government is Internet Explorer 6, which the Department of Communities & Local Government are stuck on and have no plans to upgrade. This means the approach only works when people are prepared to make the additions at home, or some other place that has a browser less than 9 years old.

    Important note 2: Despite having put together the spending scoreboard spreadsheet, we were hopeless at telling the wider world about it, meaning that Simon Rogers at the Guardian ended up duplicating much of the work. Interestingly he was missing some that we knew about, and vice versa, and I’ve offered him edit access to the main spreadsheet so we can all work together on the same one.

    Important note 3: A smaller but nevertheless irritating problem with Google Documents (and this seems to be true of Word and OpenOffice too) is that when they contain tables you get a mess of inaccessible HTML, with the result that when the spending guidance was put on the Local Public Data Panel website, the HTML had to be largely rewritten from scratch (by one of the data.gov.uk stars late at night). So Google, if you’re listening, please allow an option to export as accessible HTML.

Written by countculture

September 13, 2010 at 8:24 am

Introducing OpenCharities: Opening up the Charities Register

with 75 comments

A couple of weeks ago I needed a list of all the charities in the UK and their registration numbers so that I could try to match them up to the local council spending data OpenlyLocal is aggregating and trying to make sense of. A fairly simple request, you’d think, especially in this new world of transparency and open data, and for a dataset that’s uncontentious.

Well, you’d be wrong. There’s nothing at data.gov.uk, nothing at CKAN and nothing on the Charity Commission website, and in fact you can’t even see the whole register on the website, just the first 500 results of any search/category. Here’s what the Charity Commission says on their website (NB: extract below is truncated):

The Commission can provide an electronic copy in discharge of its duty to provide a legible copy of publicly available information if the person requesting the copy is happy to receive it in that form. There is no obligation on the Commission to provide a copy in this form…

The Commission will not provide an electronic copy of any material subject to Crown copyright or to Crown database right unless it is satisfied… that the Requestor intends to re-use the information in an appropriate manner.

Hmmm. Time for Twitter to come to the rescue to check that some other independently minded person hasn’t already solved the problem. Nothing, but I did get pointed to this request for the data to be unlocked, with the very recent response by the Charity Commission, essentially saying, “Nope, we ain’t going to release it”:

For resource reasons we are not able to display the entire Register of Charities. Searches are therefore limited to 500 results… We cannot allow full access to all the data, held on the register, as there are limitations on the use of data extracted from the Register… However, we are happy to consider granting access to our records on receipt of a written request to the Departmental Record Officer

OK, so it seems as though they have no intention of making this data available anytime soon (I actually don’t buy that there are Intellectual Property or Data Privacy issues with making basic information about charities available, and if there really are, this needs to be changed, pronto), so time for some screen-scraping. It turns out it’s a pretty difficult website to scrape, because it requires both cookies and JavaScript to work properly.

Try turning off both in your browser and see how far you get; you’ll also get an idea of how difficult it is to use if you have accessibility issues – and check out their poor excuse for an accessibility statement, i.e. tough luck.

Still, there’s usually a way, even if it does mean some pretty tortuous routes, and like the similarly inaccessible Birmingham City Council website, this is just the sort of challenge that stubborn so-and-sos like me won’t give up on.

And the way to get the info seems to be through the geographical search (other routes relied upon JavaScript), and although it was still problematic, it was doable. So, now we have an open data register of charities, incorporated into OpenlyLocal, and tied in to the spending data being published by councils.
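For the curious, the cookie-handling part of such a scraper can be sketched with nothing more than the standard library. The page-parameter name below is a guess for illustration, not the Commission’s actual one:

```python
import http.cookiejar
import urllib.request

def make_session():
    """An opener that keeps the site's session cookies between
    requests, which this site requires to serve results at all."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "OpenCharities scraper (sketch)")]
    return jar, opener

def result_pages(base_url, last_page):
    """Build URLs for each page of a geographical search. The query
    parameter name here is hypothetical; the real site's differs."""
    return [f"{base_url}?page={n}" for n in range(1, last_page + 1)]
```

Each page would then be fetched via `opener.open(url)`, with the jar silently carrying the session cookie forward between requests.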

Charity supplier to Local authority

And because this sort of thing is so easy once you’ve got it in a database (Charity Commission take note), there are a couple of bonuses.

First, it was relatively easy to knock up a quick and very simple Sinatra application, OpenCharities:

Open Charities :: Opening up the UK Charities Register

If there’s any interest, I’ll add more features to it, but for now, it’s just the simplest of things: a web application with a unique URL for every charity based on its charity number, and with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.

This is very much an alpha application: the design is very basic and it’s possible that there are a few charities missing, for two reasons. One: the Charity Commission kept timing out (I think I managed to pick up all of those, and any stragglers should get picked up when I periodically run the scraper); and two: there appears to be a bug in the Charity Commission website, so that when there are between 10 and 13 entries, only 10 are shown, with no way of seeing the additional ones. As a benchmark, there are currently 150,422 charities in the OpenCharities database.

It’s also worth mentioning that due to inconsistencies with the page structure, the income/spending data for some of the biggest charities is not yet in the system. I’ve worked out a fix, and the entries will be gradually updated, but only as they are re-scraped.

The second bonus is that the entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB for the compressed version, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it up in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.

Now, back to that spending data.

Written by countculture

September 6, 2010 at 1:15 pm

A Local Spending Data wish… granted

with 25 comments

The very wonderful Stuart Harrison (aka pezholio), webmaster at Lichfield District Council, blogged yesterday with some thoughts about the publication of spending data following a local spending data workshop in Birmingham. Sadly I wasn’t able to attend this, but Stuart gives a very comprehensive account, and like all his posts it’s well worth reading.

In it he made an important observation about those at the workshop who were pushing for linked data from the beginning, and wished there was a solution. First the observation:

There did seem to be a bit of resistance to the linked data approach, mainly because agreeing standards seems to be a long, drawn out process, which is counter to the JFDI approach of publishing local data… I also recognise that there are difficulties in both publishing the data and also working with it… As we learned from the local elections project, often local authorities don’t even have people who are competent in HTML, let alone RDF, SPARQL etc.

He’s not wrong there. As someone who’s been publishing linked data for some time, and who conceived and ran the Open Election Data project Stuart refers to, working with numerous councils to help them publish linked data, I’m probably as aware of the issues as anyone (ironically, and I think significantly, none of the councils involved in the local government e-standards body, now pushing so hard for linked data, has actually published any linked data themselves).

That’s not to knock linked data – just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown). To expect all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents), and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.

So what’s to be done? I think the sensible thing, particularly in these straitened times, is to concentrate on getting the raw data out, and as much of it as possible, and to come down hard on those councils who publish it badly (e.g. by locking it up in PDFs or giving it a closed licence), or who wilfully ignore the guidance (it’s worrying how many councils publishing data at the moment don’t even include the transaction ID or date of the transaction, never mind supplier details).

Beyond that we should take the approach the web has always taken, and which is the reason for its success: a decentralised, messy variety of implementations and solutions that allows a rich ecosystem to develop, with government helping solve bottlenecks and structural problems rather than trying to impose highly centralised solutions to problems that are already being solved elsewhere.

Yes, I’d love it if the councils were able to publish the data fully marked up, in a variety of forms (not just linked data, but also XML and JSON), but the ugly truth is that not a single council has so far even published their list of categories, never mind matched it up to a recognised standard (CIPFA BVACOP, COFOG or that used in their submissions to the CLG), still less done anything like linked data. So there’s a long way to go, and in the meantime we’re going to need some tools and cheap commodity services to bridge the gap.

[In a perfect world, maybe councils would develop some open-source tools to help them publish the data, perhaps using something like Adrian Short’s Armchair Auditor code as the basis (this is a project that took a single council, Windsor & Maidenhead, and added a web interface to the figures). However, when many councils don’t even have competent HTML skills in-house (having outsourced much of it), this is only going to happen at a handful of councils at best, unless considerable investment is made.]

Stuart had been thinking along similar lines, and made a suggestion, almost a wish in fact:

I think the way forward is a centralised approach, with authorities publishing CSVs in a standard format on their website and some kind of system picking up these CSVs (say, on a monthly basis) and converting this data to a linked data format (as well as publishing in vanilla XML, JSON and CSV format).

He then expanded on the idea, talking about a single URL for each transaction, standard identifiers, “a human-readable summary of the data, together with links to the actual data in RDF, XML, CSV and JSON”. I’m a bit iffy about that ‘centralised approach’ phrase (the web is all about decentralisation), but I do think there’s an opportunity to help both the community and councils by solving some of these problems.

And that’s exactly what we’ve done at OpenlyLocal: adding the data from all the councils who’ve published their spending data, acting as a central repository, generating the URLs, and connecting the data to other datasets and identifiers (councils with SNAC IDs, companies with Companies House numbers). We’ve even extracted data from those councils who unhelpfully try to lock up their data in PDFs.

There are, at the time of writing, 52,443 financial transactions from 9 councils in the OpenlyLocal database. And that’s not all; there are also the following features:

• Each transaction is tied to a supplier record for the council, and increasingly these are linked to company info (including their company number), or other councils (there’s a lot of money being transferred between councils), and users can add information about the supplier if we haven’t matched it up.
• Every transaction, supplier and company has a permanent unique URL and is available as XML and JSON.
• We’ve sorted out some of the date issues (adding a date fuzziness field for those councils who don’t specify when in the month or quarter a transaction relates to).
• Transactions are linked to the URL from which the file was downloaded (and usually the line number too, though obviously this is not possible if we’ve had to extract it from a PDF), meaning anyone else can recreate the dataset should they want to.
• There’s an increasing amount of analysis, showing ordinary users spending by month, biggest suppliers and transactions, for example.
• The whole spending dataset is available as a single, zipped CSV file to download for anyone else to use.
• It’s all open data.
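The supplier-to-company matching in the first bullet can be sketched roughly as follows. The normalisation rules here are illustrative only, not OpenlyLocal’s actual matching code:

```python
import re

# Suffixes ignored when comparing supplier names to registered company
# names; the list is illustrative, not exhaustive.
SUFFIXES = r"\b(ltd|limited|plc|llp)\.?$"

def normalise(name):
    """Reduce a supplier name to a crude matching key."""
    key = name.lower().strip()
    key = re.sub(SUFFIXES, "", key).strip(" .,")
    return re.sub(r"\s+", " ", key)

def match(supplier, companies):
    """Return the company whose normalised name equals the supplier's,
    or None. A first pass only, before any fuzzy matching."""
    wanted = normalise(supplier)
    for company in companies:
        if normalise(company["name"]) == wanted:
            return company
    return None
```

Exact matching on a normalised key catches the easy cases (differing case, ‘Ltd’ vs ‘Limited’); everything else needs human review, which is why users can submit corrections.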

There are a couple of features Stuart mentions that we haven’t yet implemented, for good reasons.

First, we’re not yet publishing it as linked data, for the simple reason that the vocabulary hasn’t yet been defined, nor even the standards on which it will be based. When this is done, we’ll add this as a representation.

And although we use standard identifiers such as SNAC IDs for councils (and wards) on OpenlyLocal, the URL structure Stuart mentions is not yet practical, in part because SNAC IDs don’t cover all authorities (they don’t include the GLA, or other public bodies, for example), and only a tiny fraction of councils are publishing their internal transaction IDs.

Also, we haven’t yet implemented comments on the transactions, for the simple reason that distributed comment systems such as Disqus are javascript-based and thus problematic for those with accessibility issues, and site-specific ones don’t allow the conversation to be carried on elsewhere (we think we might have a solution to this, but it’s at an early stage, and we’d be interested to hear other ideas).

But all in all, we reckon we’re pretty much there with Stuart’s wish list, and would hope that councils can get on with extracting the raw data, publishing it in an open, machine-readable format (such as CSV), and then move to linked data as their resources allow.

Written by countculture

August 3, 2010 at 7:45 am

Local Spending in OpenlyLocal: what features would you like to see?

with 2 comments

As I mentioned in a previous post, OpenlyLocal has now started importing council local spending data to make it comparable across councils and linkable to suppliers. We’ve now added some more councils, and some more features, with some interesting results.

As well as the original set of the Greater London Authority, Windsor & Maidenhead and Richmond upon Thames, we’ve added data from Uttlesford, King’s Lynn & West Norfolk and Surrey County Council (incidentally, given the size of Uttlesford and of King’s Lynn & West Norfolk, if they can publish this data, any council should be able to).

We’ve also added a basic Spending Dashboard, to give an overview of the data we’ve imported so far:

Of course the data provided is of variable quality and in various formats. Some, like King’s Lynn & West Norfolk’s, are in simple, clean CSV files. Uttlesford have done it as a spreadsheet with each payment broken down to the relevant service, which is a bit messy to import but adds greater granularity than pretty much any other council.

Others, like Surrey, have taken the data that should be in a CSV file and for no apparent reason have put it in a PDF, which can be converted, but which is a bit of a pain to do, and means manual intervention in what should be a largely automatic process (challenge for journos/dirt-hunters: is there anything in the data that they’d want to hide, or is it just pig-headedness?).

    But now we’ve got all that information in there we can start to analyse it, play with it, and ask questions about it, and we’ve started off by showing a basic dashboard for each council.

    For each council, it’s got total spend, spend per month, number of suppliers & transactions, biggest suppliers and biggest transactions. It’s also got the spend per month (where a figure is given for a quarter, or two-month period, we’ve averaged it out over the relevant months). Here, for example, is the one for the Greater London Authority:

    Lots of interesting questions here, from getting to understand all those leasing costs paid via the Amas Ltd Common Receipts Account, to what the £4m paid to Jack Morton Worldwide (which describes itself as a ‘global brand experience agency’) was for. Of course you can click on the supplier name for details of the transactions and any info that we’ve got on them (in this case it’s been matched to a company – but you can now submit info about a company if we haven’t matched it up).

    You can then click on the transaction to find out more about it, if that info was published; either way, it’s perhaps the start of an FoI request:

    It’s also worth looking at the Spend By Month, as a raw sanity-check. Here’s the dashboard for Windsor & Maidenhead:

    See that big gap for July & August 09? My first thought was that there was an error importing the data, which is perfectly possible, especially when the formatting changes as frequently as it does in W&M’s data files. But looking at the actual file, there appear to be no entries for July & August 09 (I’ve notified them, and hopefully we’ll get corrected data published soon). This, for me, is one of the advantages of visualizations: being able to easily spot anomalies in the data that looking at tables or databases wouldn’t show.
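The same gap-spotting can of course be automated as a sanity check on import. A minimal sketch, assuming monthly totals keyed by 'YYYY-MM' strings:

```python
def missing_months(month_totals, start, end):
    """Return the months in [start, end] with no recorded spend.
    month_totals maps 'YYYY-MM' -> total; start/end are 'YYYY-MM'."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    gaps = []
    while (year, month) <= (end_year, end_month):
        key = f"{year:04d}-{month:02d}"
        if not month_totals.get(key):  # absent or zero: flag it
            gaps.append(key)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps

totals = {"2009-06": 120000, "2009-09": 98000, "2009-10": 110000}
print(missing_months(totals, "2009-06", "2009-10"))
```

Run over the W&M data, a check like this would have flagged the July & August 09 gap without anyone needing to eyeball the chart.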

    So what further analyses would you like out of the box: average transaction size, number of transactions over £1m, percentage of transactions for a round number (i.e. with a zero at the end), more visualizations? We’d love your suggestions – please leave them in the comments or tweet me.
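For the record, each of the suggested statistics is a one-liner over the list of transaction amounts. A sketch (with ‘round number’ read as a multiple of ten pounds, as suggested above):

```python
def summary_stats(amounts):
    """amounts: transaction values in pounds.
    Returns (average size, count over £1m, round-number percentage)."""
    n = len(amounts)
    average = sum(amounts) / n
    over_1m = sum(1 for a in amounts if a > 1_000_000)
    # 'round number' taken to mean ending in a zero, i.e. a multiple of £10
    round_pct = 100 * sum(1 for a in amounts if a % 10 == 0) / n
    return average, over_1m, round_pct

avg, big, pct = summary_stats([500, 1230, 2_000_000, 75000, 999])
print(avg, big, pct)
```

An unusually high round-number percentage can itself be a useful flag: genuine invoices tend not to cluster on round figures, whereas estimates and lump-sum payments do.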

    Written by countculture

    July 26, 2010 at 9:44 am

    Some progress on the Local Spending/Spikes Cavell issue

    with 5 comments

    Yesterday I was invited to a meeting at the Department for Communities and Local Government with the key players in the local spending/Spikes Cavell issue that I’ve written about previously (see The open data that isn’t and Update on the local spending data scandal… the empire strikes back).

    The meeting included Luke Spikes from Spikes Cavell and Andrew Larner from IESE (the Regional Improvement and Efficiency Partnership for the South East), which helped set up the deal, along with myself and Nigel Shadbolt, who chairs the Local Public Data Panel and sits on the government’s Transparency Board. I won’t go into all the details, but the meeting was cordial and constructive, produced a lot of information about how the deal works, and also potentially made progress in solving some of the key issues.

    We can now, for example, start to understand the deal – it’s called the Transform project – which as I understand it is a package deal to take raw information from the councils’ accounts and other systems (e.g. purchase & procurement systems) to SC’s specification, clean up and depersonalise the data, then analyse it to show the councils potential savings/improvements, and finally to publish a cut of this information on the Spotlight on Spend website. Essentially we have this:

    There are still some details missing from this picture – we haven’t yet seen the Memorandum of Understanding which frames the deal, nor the specification of the raw information that is provided to Spikes Cavell, but we have been promised both of these imminently. This last one in particular will be very useful as it will allow us to refine the advice we are giving councils about the data they should be publishing in order to make the spending information useful and comparable (it’s not been suggested previously, for example, that it would be useful to include details from the council’s procurement systems, though in hindsight this makes a lot of sense).

    Crucially, it was also agreed that all the input data into Spikes Cavell’s proprietary systems (the ‘Cleaned-up but non-proprietary data’ in the diagram above) would be published, so the wider community would be on the same footing as Spikes Cavell as far as access to the raw data goes. This is worth repeating: it means that anyone else will have access to the same base data as Spikes Cavell, and the playing field is therefore pretty much level.

    There are still issues to be sorted out, the chief of which is that while Spikes Cavell is happy to publish the raw data under a completely open licence, they will require the OK of the council to do so. (However, armed with this knowledge it will be easy to identify those councils that refuse, and then possible to tackle them either through persuasion or ultimately legislation.)

    The other issues are, briefly: liability for depersonalising the data; where the data is published (I think it should be on the council’s own website or on data.gov.uk, or for London councils the London Datastore, not on the Spotlight On Spend website); whether the Spotlight On Spend website itself is necessary and cost-effective (it’s impossible to know how much it costs as it’s bundled in with the whole deal); and whether the data-cleansing should be stripped out from the rest of the deal.

    However, it’s worth saying that this agreement applies not just to the member councils of IESE, but to all councils that use a similar agreement in the future (obviously it’s ultimately up to them, but certainly this was the wish of everyone at the meeting).

    Finally, I’d like to thank Andrew Larner at IESE for his open approach, and Spikes Cavell for their willingness to engage. What we have here isn’t perfect (and I still fundamentally believe that councils should be doing the cleansing and publishing of the data themselves, exchanging that knowledge with other councils and using it to improve their own data processes), but it’s a big step forward in genuinely opening up raw council data.

    Update: The official notes of the meeting have now been published on the Local Public Data panel blog: http://data.gov.uk/blog/local-public-data-panel-%E2%80%93-sub-group-meeting-spotlight-spend-20-july-2010

    Written by countculture

    July 20, 2010 at 10:57 pm

    Update on the local spending data scandal… the empire strikes back

    with 5 comments

    My blog post on Friday about the local spending information, the open data that isn’t, and the agreements that some councils seem to have struck with Spikes Cavell prompted a flurry of tweets and emails, and a reassuringly fast response from the government’s Transparency Board.

    It also, I’m told, generated a huge number of emails among the main protagonists – local and central government bureaucrats and private companies, who spent much of Friday and the weekend shoring up their position and planning counter attacks against those working for open data, and thus threatening the status quo.

    It’s a dangerous game they’re playing, working against the ‘right to data’ policy of the government, but I guess they think their jobs are more important than that, and no doubt there will be further plotting at the LGA conference this week.

    There was also a response from Spikes Cavell itself on the Information Age website. Adrian Short deals with most of the issues in the comments (his deconstruction is well worth reading), but here I wanted to widen the topic just a little, to expand on why this agreement and others like it are so contrary to the principles of open data.

    Photo: chrisspurgeon on Flickr

    The problem with the Spikes Cavell deal comes not from any sort of fundamentalist approach to open data (e.g. you must use this method to publish, and only this method), but that such agreements go against the central point of open data – the openness and the benefits that can bring.

    Because if open public data is about anything, it’s about equal access to the data for everyone – to allow them to draw their own conclusions from it and to use it to make new websites and mashups. And, yes, to build businesses too.

    However, the important distinction here is that such businesses are based solely on what value they can add to the data, not on whether they have privileged access to it. Sadly it seems as if Spikes Cavell’s business model is based on restricting access, not building on open data.

    Lest we forget, Spikes Cavell is not an agent for change here, not part of those pushing to open public data, but in fact has a business model which seems to be predicated on data being closed, and the maintenance of the existing procurement model which has served us so badly.

    Not convinced? How about this quote from the website of a similar company, BiP, who I mentioned in a post about public data held in private databases:

    Public procurement is highly process orientated, and is subject to a wide range of legal requirements and best practice guidelines.

    Our team of skilled and experienced consultants can help ensure public sector buyers become more effective when procuring and can also assist suppliers by providing in-depth advice on how to win public sector tenders.

    So they are making a margin on both sides, and in the process preventing proper scrutiny. Why on earth would companies like this want open data?

    And why on earth would the quangos that depend on those processes want open data? Because that transparency threatens the cosy system that pays their wages, and has eaten up so many of the resources thrown at public services by the previous government.

    So here we have an opportunity for the new administration to follow through on its promises to reform the way the public sector does business, starting by putting a stop to deals like this one.

    Written by countculture

    July 4, 2010 at 2:53 pm

    The open spending data that isn’t… this is not good

    with 28 comments

    Since the coalition announced that councils would have to publish all spending over £500 by January next year, there’s been a palpable excitement in the open data and transparency community at the thought of what could be done with it (not least understanding and improving the balance of councils’ relationships with suppliers).

    Secretary of State for Communities & Local Government Eric Pickles followed this up with a letter to councils saying, “I don’t expect everyone to do it right first time, but I do expect everyone to do it.” Great. Raw Data Now, in the words of Tim Berners-Lee.

    Now, however, with the ink barely dry, the reality is looking not just a bit messy, a bit of a first attempt (which would be fine and understandable given the timescale), but Not Open At All.

    As a member of the Local Public Data Panel, I’ve worked with other members and councils to draw up some clear and pragmatic draft guidelines for publishing the local spending data. We’ve had a great response in the comments and in conversations, and together with some lessons learned from importing the existing data, I think these will allow us to produce a second draft soon.

    One thing we weren’t explicit about in that first draft – because we took it for granted – was that the data had to be open, and free for reuse by all. Equality of access for all is essential.

    So I’ve been watching the activities of Spikes Cavell’s SpotlightOnSpend with some wariness, and now those fears seem to have been borne out, as the company seems to have set out not to consume the open data that councils are publishing, but to control it.

    The idea seems to be that councils give Spikes Cavell privileged access to their detailed invoice information; the company then adds this to its proprietary and definitely non-open database, and publishes an extract of the information on the SpotlightOnSpend website. Exactly what information they get, and under what terms, isn’t disclosed anywhere.

    The website’s got most of the buzzwords: transparency, accessible, efficiency. It’s even got a friendly .org.uk domain. If that’s not enough to convince councils, liberally sprinkled around the site is an apparent endorsement from the Secretary of State himself:

    I’m really excited about the opportunities of transparency and it’s something this government is utterly committed to. spotlightonspend demonstrates that, when innovative businesses work with far-sighted public bodies, we can inform the public, reduce costs and improve democracy both locally and nationally.

    Eric Pickles
    Secretary of State
    Communities and Local Government

    However, when you go to the data and click on the download link this is what you get:

    Note the “This data is for your personal use only” (not to mention that the use of a captcha to screen out machines downloading the data means, er, you can’t use machines to automatically download the data, which is sort of the point of publishing the data in a machine-readable way).

    Never mind, surely you can just head over to the council’s website and download the data from there? No chance. This is what you get on the Guildford website:

    You can search and view this financial data using a new Spotlight on Spend national website. Just follow the link found in the offsite links section of this page.

    What about Mole Valley Council:

    This data is now available on the spotlight on spend website. You can look at categories and individual suppliers to see how much has been spent in each area or you can download all the data to see individual transactions.

    But what about Windsor & Maidenhead, who are closely affiliated with the project, and who are publishing data on their website? Well, download the data from SpotlightOnSpend and it’s rather different from the published data: it’s missing core data that is in W&M’s published data (e.g. categories), and it includes data that isn’t in the published data (e.g. data from 2008).

    So the upshot seems to be this: councils hand over all their valuable financial data to a company which aggregates it for its own purposes and, er, doesn’t open up the data, shooting down all those goals of mashing up the data and using the community to analyse it, and undermining much of the good work that’s been done.

    It’s worth linking here to the Open Knowledge Foundation’s draft guidelines on reporting of Government Finances (disclosure: I helped draw them up), of which the first point is ‘Make data openly available using an explicit license’. And let me be absolutely clear here: this is not open data and not a desirable approach; it will not achieve the goals of transparency or equality of access, and it is not good for the public sector.

    I’m hoping this is a matter of councils and the Secretary of State not understanding the process and implications of giving this data to Spikes Cavell on a privileged basis. If not, perhaps it could be the first test case for the newly set-up Public Sector Transparency Board to rule on.

    Update: With lightning-fast speed, the Transparency Board has issued a statement about this issue reiterating the open data principles, and saying that measures are being taken to rectify the problem.

    There are many questions remaining, not least the nature of the relationship with Spikes Cavell and the undesirability of their privileged access to the information, but the Board should be congratulated for their quick reaction to the situation, which bodes well for the future issues that will undoubtedly come up.

    In the meantime, I’ll keep on the case, and update by blog/tweet as I get more information.

    Written by countculture

    July 2, 2010 at 9:57 am