countculture

Open data and all that

Posts Tagged ‘Councils’

A Local Spending Data wish… granted


The very wonderful Stuart Harrison (aka pezholio), webmaster at Lichfield District Council, blogged yesterday with some thoughts about the publication of spending data following a local spending data workshop in Birmingham. Sadly I wasn’t able to attend this, but Stuart gives a very comprehensive account, and like all his posts it’s well worth reading.

In it he made an important observation about those at the workshop who were pushing for linked data from the beginning, and wished there was a solution. First the observation:

There did seem to be a bit of resistance to the linked data approach, mainly because agreeing standards seems to be a long, drawn out process, which is counter to the JFDI approach of publishing local data… I also recognise that there are difficulties in both publishing the data and also working with it… As we learned from the local elections project, often local authorities don’t even have people who are competent in HTML, let alone RDF, SPARQL etc.

He’s not wrong there. As someone who’s been publishing linked data for some time, and who conceived and ran the Open Election Data project Stuart refers to, working with numerous councils to help them publish linked data, I’m probably as aware of the issues as anyone. (Ironically, and I think significantly, none of the councils involved in the local government e-standards body and now pushing so hard for linked data has actually published any linked data itself.)

That’s not to knock linked data, just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown). Expecting all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents) and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.

So what’s to be done? I think the sensible thing, particularly in these straitened times, is to concentrate on getting the raw data out, as much of it as possible, and to come down hard on those councils who publish it badly (e.g. by locking it up in PDFs or giving it a closed licence), or who wilfully ignore the guidance (it’s worrying how many councils publishing data at the moment don’t even include the transaction ID or the date of the transaction, never mind supplier details).
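For illustration only, here’s a minimal Python sketch of the kind of check that guidance implies: flag a published spending CSV whose header row is missing core fields. The column names (`TransactionID`, `Date`, `Supplier`, `Amount`) are hypothetical, not taken from the guidance itself.

```python
import csv
import io

# Hypothetical column names; the actual guidance may name these differently.
REQUIRED_COLUMNS = ["TransactionID", "Date", "Supplier", "Amount"]


def missing_columns(csv_text: str) -> list:
    """Return the required columns absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file has no header at all
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


sample = "Supplier,Amount\nAcme Ltd,1200.00\n"
print(missing_columns(sample))  # ['TransactionID', 'Date']
```

A check like this is cheap to run on every file as it’s picked up, and gives a council a concrete list of what to fix rather than a vague complaint.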

Beyond that we should take the approach the web has always taken, and which is the reason for its success: a decentralised, messy variety of implementations and solutions that allows a rich ecosystem to develop, with government helping to solve bottlenecks and structural problems rather than trying to impose highly centralised solutions to problems that are already being solved elsewhere.

Yes, I’d love it if the councils were able to publish the data fully marked up, in a variety of forms (not just linked data, but also XML and JSON), but the ugly truth is that not a single council has so far even published its list of categories, never mind matched it up to a recognised standard (CIPFA BVACOP, COFOG or the one used in their submissions to the CLG), still less done anything like linked data. So there’s a long way to go, and in the meantime we’re going to need some tools and cheap commodity services to bridge the gap.

[In a perfect world, maybe councils would develop some open-source tools to help them publish the data, perhaps using something like Adrian Short’s Armchair Auditor code as the basis (this is a project that took a single council, Windsor & Maidenhead, and added a web interface to the figures). However, when many councils don’t even have competent HTML skills in-house (having outsourced much of that work), this is only going to happen at a handful of councils at best, unless considerable investment is made.]

Stuart had been thinking along similar lines, and made a suggestion, almost a wish in fact:

I think the way forward is a centralised approach, with authorities publishing CSVs in a standard format on their website and some kind of system picking up these CSVs (say, on a monthly basis) and converting this data to a linked data format (as well as publishing in vanilla XML, JSON and CSV format).

He then expanded on the idea, talking about a single URL for each transaction, standard identifiers, “a human-readable summary of the data, together with links to the actual data in RDF, XML, CSV and JSON”. I’m a bit iffy about that ‘centralised approach’ phrase (the web is all about decentralisation), but I do think there’s an opportunity to help both the community and councils by solving some of these problems.
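As a rough illustration of the conversion step Stuart describes, here’s a minimal Python sketch that reads a council’s spending CSV and republishes each row as JSON with its own URL. The column names and the `https://example.org/transactions/` base URL are made up for the example, it assumes every row carries a transaction ID, and the linked data part is left out entirely.

```python
import csv
import io
import json

# Hypothetical base URL for per-transaction pages; not a real service.
BASE_URL = "https://example.org/transactions/"


def csv_to_json(csv_text: str, council_id: str) -> str:
    """Convert spending CSV rows into a JSON list, giving each row a URL."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        # One URL per transaction, built from the council and transaction ID.
        record["url"] = f"{BASE_URL}{council_id}/{row['TransactionID']}"
        records.append(record)
    return json.dumps(records, indent=2)


sample = "TransactionID,Date,Supplier,Amount\n1001,2010-06-14,Acme Ltd,1200.00\n"
print(csv_to_json(sample, "example-council"))
```

The same list of records could just as easily be written back out as XML or CSV; JSON is used here simply because the standard library handles it directly.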

And that’s exactly what we’ve done at OpenlyLocal: adding the data from all the councils who’ve published their spending data, acting as a central repository, generating the URLs, and connecting the data to other datasets and identifiers (councils with SNAC IDs, companies with Companies House numbers). We’ve even extracted data from those councils who unhelpfully try to lock up their data in PDFs.

At the time of writing there are 52,443 financial transactions from 9 councils in the OpenlyLocal database. And that’s not all; there are also the following features:

  • Each transaction is tied to a supplier record for the council, and increasingly these are linked to company info (including their company number), or other councils (there’s a lot of money being transferred between councils), and users can add information about the supplier if we haven’t matched it up.
  • Every transaction, supplier and company has a permanent unique URL and is available as XML and JSON.
  • We’ve sorted out some of the date issues (adding a date fuzziness field for those councils who don’t specify when in the month or quarter a transaction relates to).
  • Transactions are linked to the URL from which the file was downloaded (and usually the line number too, though obviously this is not possible if we’ve had to extract it from a PDF), meaning anyone else can recreate the dataset should they want to.
  • There’s an increasing amount of analysis, showing ordinary users spending by month, biggest suppliers and transactions, for example.
  • The whole spending dataset is available as a single, zipped CSV file to download for anyone else to use.
  • It’s all open data.
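To make the date fuzziness and provenance ideas a bit more concrete, here’s a minimal Python sketch of a transaction record. The field names and the fuzziness convention (the real date falls within `date_fuzziness` days after `earliest_date`) are assumptions for illustration, not OpenlyLocal’s actual schema, and quarters here are calendar quarters rather than council financial quarters.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transaction:
    supplier: str
    amount: float
    earliest_date: date   # earliest possible date of the transaction
    date_fuzziness: int   # days after earliest_date the real date may fall (0 = exact)
    source_url: str       # file the row was downloaded from
    source_line: Optional[int] = None  # None when extracted from a PDF


def month_period(year: int, month: int):
    """Earliest date and fuzziness for a figure known only to the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, 1), days_in_month - 1


def quarter_period(year: int, quarter: int):
    """Earliest date and fuzziness for a figure known only to a calendar quarter."""
    start = date(year, 3 * (quarter - 1) + 1, 1)
    end_month = 3 * quarter
    end = date(year, end_month, calendar.monthrange(year, end_month)[1])
    return start, (end - start).days


start, fuzz = month_period(2010, 6)
t = Transaction("Acme Ltd", 1200.0, start, fuzz,
                "https://example.org/spending-2010-06.csv", 42)
print(t.earliest_date, t.date_fuzziness)  # 2010-06-01 29
```

Keeping the source URL and line number on every record is what lets anyone else rebuild the dataset from the original files.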

There are a couple of features Stuart mentions that we haven’t yet implemented, for good reason.

First, we’re not yet publishing it as linked data, for the simple reason that the vocabulary hasn’t yet been defined, nor even the standards on which it will be based. When this is done, we’ll add this as a representation.

And although we use standard identifiers such as SNAC IDs for councils (and wards) on OpenlyLocal, the URL structure Stuart mentions is not yet practical, partly because SNAC IDs don’t cover all authorities (they don’t include the GLA or other public bodies, for example), and partly because only a tiny fraction of councils are publishing their internal transaction IDs.

We also haven’t yet implemented comments on the transactions, for the simple reason that distributed comment systems such as Disqus are JavaScript-based and thus problematic for those with accessibility issues, while site-specific ones don’t allow the conversation to be carried on elsewhere (we think we might have a solution to this, but it’s at an early stage, and we’d be interested to hear other ideas).

But all in all, we reckon we’re pretty much there with Stuart’s wish list, and would hope that councils can get on with extracting the raw data, publishing it in an open, machine-readable format (such as CSV), and then move to linked data as their resources allow.

Written by countculture

August 3, 2010 at 7:45 am

Local Spending in OpenlyLocal: what features would you like to see?


As I mentioned in a previous post, OpenlyLocal has now started importing council spending data to make it comparable across councils and linkable to suppliers. We’ve now added some more councils, and some more features, with some interesting results.

As well as the original set of Greater London Authority, Windsor & Maidenhead and Richmond upon Thames, we’ve added data from Uttlesford, King’s Lynn & West Norfolk and Surrey County Council (incidentally, given the size of Uttlesford and of King’s Lynn & West Norfolk, if they can publish this data, any council should be able to).

We’ve also added a basic Spending Dashboard, to give an overview of the data we’ve imported so far:

Of course the data provided is of variable quality and in various formats. Some councils, like King’s Lynn & West Norfolk, publish simple, clean CSV files. Uttlesford have done it as a spreadsheet with each payment broken down by the relevant service, which is a bit messy to import but gives greater granularity than pretty much any other council.

Others, like Surrey, have taken data that should be in a CSV file and, for no apparent reason, put it in a PDF, which can be converted but is a bit of a pain to do, and means manual intervention in what should be a largely automatic process (a challenge for journos/dirt-hunters: is there anything in the data that they’d want to hide, or is it just pig-headedness?).

But now we’ve got all that information in there we can start to analyse it, play with it, and ask questions about it, and we’ve started off by showing a basic dashboard for each council.

For each council, the dashboard shows total spend, number of suppliers & transactions, biggest suppliers and biggest transactions, as well as spend per month (where a figure is given for a quarter or two-month period, we’ve averaged it out over the relevant months). Here, for example, is the one for the Greater London Authority:
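The averaging for quarterly or two-monthly figures is simple arithmetic. A minimal Python sketch, assuming a hypothetical input format where each figure lists the (year, month) pairs it covers, and using made-up amounts, might look like this:

```python
from collections import defaultdict


def spend_per_month(entries):
    """Sum spending by (year, month), averaging multi-month figures evenly.

    Each entry is (amount, [(year, month), ...]) listing the months the
    figure covers: one month for a monthly figure, three for a quarter.
    """
    totals = defaultdict(float)
    for amount, months in entries:
        share = amount / len(months)  # even split across the period
        for ym in months:
            totals[ym] += share
    return dict(totals)


entries = [
    (3000.0, [(2010, 4), (2010, 5), (2010, 6)]),  # a quarterly figure
    (500.0, [(2010, 5)]),                          # a monthly figure
]
print(spend_per_month(entries))
# {(2010, 4): 1000.0, (2010, 5): 1500.0, (2010, 6): 1000.0}
```

An even split is the simplest assumption; it smooths out the month-by-month chart but obviously can’t recover when in the quarter the money was actually spent.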

Lots of interesting questions here, from understanding all those leasing costs paid via the Amas Ltd Common Receipts Account, to what the £4m paid to Jack Morton Worldwide (which describes itself as a ‘global brand experience agency’) was for. Of course you can click on the supplier name for details of the transactions and any info that we’ve got on them (in this case it’s been matched to a company, but you can now submit info about a company if we haven’t matched it up).

You can then click on a transaction to find out more about it, if that information was published; either way, it’s perhaps the start of an FoI request:

It’s also worth looking at the Spend By Month, as a raw sanity-check. Here’s the dashboard for Windsor & Maidenhead:

See that big gap for July & August 09? My first thought was that there was an error importing the data, which is perfectly possible, especially when the formatting changes as frequently as it does in W&M’s data files. But looking at the actual file, there appear to be no entries for July & August 09 (I’ve notified them and hopefully we’ll get corrected data published soon). This, for me, is one of the advantages of visualizations: being able to easily spot anomalies in the data that looking at tables or databases wouldn’t reveal.
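Gaps like this can also be flagged automatically before anyone looks at a chart. A minimal Python sketch, using made-up data that mimics the W&M gap (the function name and input format are hypothetical):

```python
def missing_months(months_with_data, start, end):
    """List (year, month) pairs from start to end inclusive that have no data."""
    present = set(months_with_data)
    gaps = []
    year, month = start
    while (year, month) <= end:  # tuples compare year first, then month
        if (year, month) not in present:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Made-up data: April to December 2009 with July and August absent.
observed = [(2009, m) for m in range(4, 13) if m not in (7, 8)]
print(missing_months(observed, (2009, 4), (2009, 12)))
# [(2009, 7), (2009, 8)]
```

This only catches months with no rows at all; a month with suspiciously little spending still needs the eyeball test on the chart.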

So what further analyses would you like out of the box: average transaction size, number of transactions over £1m, percentage of transactions for a round number (i.e. with a zero at the end), more visualizations? We’d love your suggestions, so please leave them in the comments or tweet me.
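For anyone wanting to experiment, here’s a minimal Python sketch of those three candidate statistics. The definition of a round number used here (a whole-pound amount that’s a multiple of £10) is just one reading of “a zero at the end”, and the amounts are made up.

```python
from decimal import Decimal


def summarise(amounts):
    """Compute a few candidate dashboard statistics for a list of amounts."""
    n = len(amounts)
    total = sum(amounts, Decimal(0))
    over_1m = sum(1 for a in amounts if a > Decimal(1_000_000))
    # One possible definition of "round": whole pounds ending in a zero.
    round_count = sum(1 for a in amounts if a % 10 == 0)
    return {
        "average": total / n if n else Decimal(0),
        "over_1m": over_1m,
        "round_pct": 100 * round_count / n if n else 0.0,
    }


amounts = [Decimal("1200.00"), Decimal("1234.50"),
           Decimal("1500000.00"), Decimal("45.99")]
stats = summarise(amounts)
print(stats["over_1m"], stats["round_pct"])  # 1 50.0
```

`Decimal` is used rather than floats so that pounds-and-pence amounts are compared exactly when testing for round numbers.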

Written by countculture

July 26, 2010 at 9:44 am

Update on the local spending data scandal… the empire strikes back


My blog post on Friday about the local spending information, the open data that isn’t, and the agreements that some councils seem to have struck with Spikes Cavell raised a flurry of tweets, emails, and a reassuringly fast response from the government’s Transparency Board.

It also, I’m told, generated a huge number of emails among the main protagonists – local and central government bureaucrats and private companies, who spent much of Friday and the weekend shoring up their position and planning counter attacks against those working for open data, and thus threatening the status quo.

It’s a dangerous game they’re playing, working against the ‘right to data’ policy of the government, but I guess they think their jobs are more important than that, and no doubt there will be further plotting at the LGA conference this week.

There was also a response from Spikes Cavell itself on the Information Age website. Adrian Short deals with most of the issues in the comments (his deconstruction is well worth reading), but here I wanted to widen the topic a little, and explain why this agreement and others like it are so contrary to the principles of open data.

Photo: chrisspurgeon on Flickr

The problem with the Spikes Cavell deal comes not from any sort of fundamentalist approach to open data (e.g. you must use this method to publish, and only this method), but that such agreements go against the central point of open data – the openness and the benefits that can bring.

Because if open public data is about anything, it’s about equal access to the data for everyone, to allow them to draw their own conclusions from it and to use it to make new websites and mashups. And, yes, to build businesses too.

However, the important distinction here is that such businesses are based solely on what value they can add to the data, and not whether you have privileged access. Sadly it seems as if Spikes Cavell’s business model is based on restricting access, not building on open data.

Lest we forget, Spikes Cavell is not an agent for change here, not part of those pushing to open public data, but in fact has a business model which seems to be predicated on data being closed, and the maintenance of the existing procurement model which has served us so badly.

Not convinced? How about this quote from the website of a similar company, BiP, who I mentioned in a post about public data held in private databases:

Public procurement is highly process orientated, and is subject to a wide range of legal requirements and best practice guidelines.

Our team of skilled and experienced consultants can help ensure public sector buyers become more effective when procuring and can also assist suppliers by providing in-depth advice on how to win public sector tenders.

So they are making a margin on both sides, and in the process preventing proper scrutiny. Why on earth would companies like this want open data?

And why on earth would the quangos that depend on those processes want open data? That transparency threatens the cosy system that pays their wages, and that has eaten up so many of the resources thrown at public services by the previous government.

So here we have an opportunity for the new administration to follow through on its promises to reform the way the public sector does business, starting by putting a stop to deals like this one.

Written by countculture

July 4, 2010 at 2:53 pm

The open spending data that isn’t… this is not good


Since the coalition announced that councils would have to publish all spending over £500 by January next year, there’s been palpable excitement in the open data and transparency community at the thought of what could be done with it (not least understanding and improving the balance of councils’ relationships with suppliers).

Secretary of State for Communities & Local Government Eric Pickles followed this up with a letter to councils saying, “I don’t expect everyone to do it right first time, but I do expect everyone to do it.” Great. Raw Data Now, in the words of Tim Berners-Lee.

Now, however, with the ink barely dry, the reality is looking not just a bit messy, a bit of a first attempt (which would be fine and understandable given the timescale), but Not Open At All.

As a member of the Local Public Data Panel, I’ve worked with other members and councils to draw up some clear and pragmatic draft guidelines for publishing the local spending data. We’ve had a great response in the comments and in conversations, and together with some lessons I learned importing the existing data, I think these will allow us to produce a second draft soon.

One thing we weren’t explicit about in that first draft, because we took it for granted, was that the data had to be open, and free for reuse by all. Equality of access for all is essential.

So I’ve been watching the activities of Spikes Cavell’s SpotlightOnSpend with some wariness, and now those fears seem to have been borne out, as the company seems to have set out not to consume the open data that councils are publishing, but to control it.

The idea seems to be that councils should give Spikes Cavell privileged access to their detailed invoice information, which the company then adds to its proprietary and definitely non-open database, before publishing an extract of this information on the SpotlightOnSpend website. Exactly what information they get, and under what terms, isn’t disclosed anywhere.

The website’s got most of the buzzwords: transparency, accessible, efficiency. It’s even got a friendly .org.uk domain. If that’s not enough to convince councils, liberally sprinkled around the site is an apparent endorsement from the Secretary of State himself:

I’m really excited about the opportunities of transparency and it’s something this government is utterly committed to. spotlightonspend demonstrates that, when innovative businesses work with far-sighted public bodies, we can inform the public, reduce costs and improve democracy both locally and nationally.

Eric Pickles
Secretary of State
Communities and Local Government

However, when you go to the data and click on the download link this is what you get:

Note the “This data is for your personal use only” (not to mention that the use of a captcha to screen out machines means, er, you can’t use machines to automatically download the data, which is sort of the point of publishing the data in a machine-readable way).

Never mind, surely you can just head over to the council’s website and download the data from there? No chance. This is what you get on the Guildford website:

You can search and view this financial data using a new Spotlight on Spend national website. Just follow the link found in the offsite links section of this page.

What about Mole Valley Council?

This data is now available on the spotlight on spend website. You can look at categories and individual suppliers to see how much has been spent in each area or you can download all the data to see individual transactions.

But what about Windsor & Maidenhead, who are closely affiliated with the project, and who are publishing data on their website? Well, download the data from SpotlightOnSpend and it’s rather different from the published data: it is missing core data that is in W&M’s published data (e.g. categories), and it includes data that isn’t in the published data (e.g. data from 2008).

So the upshot seems to be this: councils hand over all their valuable financial data to a company which aggregates it for its own purposes and, er, doesn’t open up the data, shooting down all those goals of mashing up the data and using the community to analyse it, and undermining much of the good work that’s been done.

It’s worth linking here to the Open Knowledge Foundation’s draft guidelines on reporting of Government Finances (disclosure: I helped draw them up), of which the first point is ‘Make data openly available using an explicit license’. And let me be absolutely clear here: this is not open data, not a desirable approach, will not achieve the results of transparency or of equality of access, and is not good for the public sector.

I’m hoping this is a matter of councils and the Secretary of State not understanding the process and implications of giving this data to Spikes Cavell on a privileged basis. If not, perhaps it could be the first test case for the newly set up Public Sector Transparency Board to rule on.

Update: With lightning speed, the Transparency Board has issued a statement about this issue reiterating the open data principles, and saying that measures are being taken to rectify the problem.

There are many questions remaining, not least the nature of the relationship with Spikes Cavell and the undesirability of their privileged access to the information, but the Board should be congratulated for its quick reaction to the situation, which bodes well for the issues that will undoubtedly come up in the future.

In the meantime, I’ll keep on the case, and update via blog/tweet as I get more information.

Written by countculture

July 2, 2010 at 9:57 am

Opening up government finances


[Cross-posted from https://countculture.wordpress.com]

It’s an exciting time for open data in the UK, and especially in the area of local data. When I first played around with the idea of opening up the basics of local government data (which turned into OpenlyLocal), I never imagined that little more than a year later it would become such an exciting field, combining two of the hottest online trends, open government data and local data.

But still, there’s a hell of a long way to go, and one of the areas where there’s furthest to travel, and most to do is finance, specifically where the money’s being spent, who it’s being spent with, and also where it comes from. As the old journo saw goes: follow the money.

I had my first taste of the problems when I took a pretty much unused (and locked) spreadsheet, the 2006-07 Local Spending Report, and over the course of a weekend unlocked it, cleaned it up, imported it into a database and allowed people to do what the spreadsheet didn’t: make comparisons on local spending across councils and areas.

However, the information was fairly heavily aggregated, was for just one period, and didn’t allow comparison with other financial reports.

So at the last OKCON a month or so ago, I sat down with some of the good people from Where Does My Money Go to discuss some general principles for presenting government finances as data, to allow it to be properly analysed, combined with other data, and to follow the flow of money to and from all branches of government, central and local. The first draft has now been published.

The hope was that we could establish some general principles that would apply not just to government finances in the UK, but to other countries too. Some of the key points are:

  • Machine-readable — we need the information as data that we can do things with.
  • Fine enough granularity that we can understand what’s going on, in terms of categories, time periods, and transactions of any sort of size.
  • Using standard IDs to allow definitive identification and matching of bodies, areas and categories.

Obviously we’d welcome comments from both the UK and other countries. It’s also worth noting that there are two overlapping but slightly different areas: the accounts, and the transactions. Ultimately if you have access to the transactions you can work out the accounts, but it may be worth teasing out the distinctions, particularly in light of the UK moves (see below).
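To illustrate the point that the accounts can be worked out from the transactions, here’s a minimal Python sketch that rolls transactions up into per-category totals. The category names and amounts are invented, and it assumes the categories have already been mapped to a common standard so that bodies can be compared.

```python
from collections import defaultdict
from decimal import Decimal


def accounts_from_transactions(transactions):
    """Roll individual (category, amount) transactions up into category totals."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for category, amount in transactions:
        totals[category] += amount
    return dict(totals)


txns = [
    ("Highways", Decimal("1200.00")),
    ("Libraries", Decimal("350.00")),
    ("Highways", Decimal("800.00")),
]
print(accounts_from_transactions(txns))
```

Note this only reproduces the accounts if you have all the transactions: a dataset limited to spending over £500 would understate the totals, and there’s no going the other way, from accounts back to transactions.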

At the same time as doing this, in the UK things have been moving on apace, with the new coalition government announcing that by January 2011 all spending by local government over £500 must be published, which in government terms is a blink of an eye.

In addition, Will Perrin and I, who both sit on the Local Public Data Panel, were also asked for advice by Camden Council, who in the best traditions of open data wanted to get on and release their data a lot earlier than this.

Within literally a few days, and with much helpful advice from many of the other Local Public Data Panel members, a first draft was done, and has today been published on data.gov.uk. This clearly is very much a UK document, is concerned with local spending, and is framed by the goal of publishing spending over £500.

However, like the Open Knowledge Foundation document, it’s meant as a first draft and a focal point for discussion, and I’d encourage everyone, whether open data and transparency advocates or those working in local government (including police, health authorities etc), to add their comments to this document too.

Written by countculture

June 3, 2010 at 10:25 pm

New feature: search for information by postcode


Why was it important that the UK government open up the geographic infrastructure? Because it makes so many location-based tasks that were tortuous almost trivial.

Previously, getting open data about your local councillors, given just a postcode, was a laborious business, requiring multiple calls to different sites. Now it is easy. Just go to http://openlylocal.com/areas/postcodes/[yourpostcodehere] and, bingo, you’re done.

You can also just put your postcode in the search box on any OpenlyLocal page to do the same thing. And, obviously, you can also download the data as XML or JSON, and with an open data licence that allows reuse by anybody, even commercial reuse.

There’s still a little bit of tweaking to be done. I need to match up postcodes to county electoral divisions, and I’m planning on adding RDF to the data types returned. Finally, it’d be great to show the ward boundaries on a map, but I think that may take a little more work.

Written by countculture

April 6, 2010 at 11:45 am

Tweeting councillors, and why open, connected data matters


Cllr Tweeps twitter directory of UK councillors closes

A couple of days ago I heard that the rather excellent CllrTweeps website was closing down. At its heart, CllrTweeps was a directory of councillors on Twitter, matching them up to council and party. My first thought was, wow, that’s a shame, to let all that accumulated data go to waste.

The second was: wouldn’t it be great to put it on OpenlyLocal as open data? Then not only would it be available to everyone via the API, but it would also link the twitter accounts not just to the council, but also to the ward, committees and so on.

So I dropped CllrTweeps a quick note, and Dafydd and James, the guys behind CllrTweeps, were well up for it. Within 48 hours, they’d sent me the data, agreed to make it open data, and I’d matched the first batch against the councillor records already on OpenlyLocal. What’s more, as a bonus, they’d also been collating info on councillor blogs, so we could add that too.

Why is all this important? After all, there are other pretty good sites listing councillors on twitter (although I’m not sure they’re as extensive as the CllrTweeps list). It matters for the same reason it was worth doing the open data Hyperlocal Directory (which is going gangbusters).

The point is not who is maintaining the list, whether it’s of twitter accounts or hyperlocal sites. What matters is whether the information is open for reuse by hyperlocal sites, bloggers, mashups, or anybody else, and whether that information can be connected to other bits of information, or is, like the government data we often criticise, locked up in its own silo, unable to be matched to or combined with other information.

There are a few tweaks we’re going to add over the next couple of weeks, but for now if you’re a tweeting councillor (county/district/borough for the moment; parish and town councillors soon), let us know by tweeting to @OpenlyLocal with the hashtag #ukcouncillors and either the URL of your OpenlyLocal page or your council.

Even better, you’ll automatically be added to the twitter list of UK local councillors we’ve started (see below). Finally, if you have a blog and include its URL in the tweet, we can add that to the info on your OpenlyLocal page.

List of UK local councillors who tweet

p.s. Because the twitter accounts on OpenlyLocal are open data, there’s obviously no reason why they can’t be combined with other such listings. Hopefully we can get this arrangement to be reciprocal 😉

Written by countculture

February 13, 2010 at 12:37 pm