countculture

Open data and all that

Posts Tagged ‘ONS

Local spending data in OpenlyLocal, and some thoughts on standards

with 3 comments

A couple of weeks ago Will Perrin and I, along with some feedback from the Local Public Data Panel on which we sit, came up with some guidelines for publishing local spending data. They were a first draft, based on a request by Camden council for some guidance, in light of the announcement that councils will have to start publishing details of spending over £500.

Now I’ve got strong opinions about standards: they should be developed from real world problems, by the people using them and should make life easier, not more difficult. It slightly concerned me that in this case I wasn’t actually using any of the spending data – mainly because I hadn’t got around to adding it in to OpenlyLocal yet.

This week, I remedied this, and pulled in the data from those authorities that had published their local spending data – Windsor & Maidenhead, the GLA and the London Borough of Richmond upon Thames. Now there’s a couple of sites (including Adrian Short’s Armchair Auditor, which focuses on spending categories) already pulling the Windsor & Maidenhead data but as far as I’m aware they don’t include the other two authorities, and this adds a different dimension to things, as you want to be able to compare the suppliers across authorities.

First, a few pages from OpenlyLocal showing how I’ve approached it (bear in mind they’re a very rough first draft, and I’m concentrating on the data rather than the presentation). You can see the biggest suppliers to a council right there on the council’s main page (e.g. Windsor & Maidenhead, GLA, Richmond):

Click through to more info gets you a pagination view of all suppliers (in Windsor & Maidenhead’s case there are over 2800 so far):

Clicking any of these will give you the details for that supplier, including all the payments to them:

And clicking on the amount will give you a page just with the transaction details, so it can be emailed to others

But we’re getting ahead of ourselves. The first job is to import the data from the CSV files into a database and this was where the first problems occurred. Not in the CSV format – which is not a problem, but in the consistency of data.

Take Windsor & Maidenhead (you should just be able to open these files an any spreadsheet program). Looking at each data set in turn and you find that there’s very little consistency – the earliest sets don’t have any dates and aggregate across a whole quarter (but do helpfully have the internal Supplier ID as well as the supplier name). Later sets have the transaction date (although in one the US date format is used, which could catch out those not looking at them manually), but omit supplier ID and cost centre.

On the GLA figures, there’ a similar story, with the type of data and the names used to describe changing seemingly randomly between data sets. Some of the 2009 ones do have transaction dates, but the 2010 one generally don’t, and the supplier field has different names, from Supplier to Supplier Name to Vendor.

This is not to criticise those bodies – it’s difficult to produce consistent data if you’re making the rules up as you go along (and given there weren’t any established precedents that’s what they were doing), and doing much of it by hand. Also, they are doing it first and helping us understand where the problems lie (and where they don’t). In short they are failing forward –getting on with it so they can make mistakes from which they (and crucially others) can learn.

But who are these suppliers?

The bigger problem, as I’ve said before, is being able to identify the suppliers, and this becomes particularly acute when you want to compare across bodies (who may name the same company or body slightly differently). Ideally (as we put in the first draft of the proposals), we would have the company number (when we’re talking about a company, at any rate), but we recognised that many accounts systems simply won’t have this information, and so we do need some information that helps use identify them.

Why do we want to know this information? For the same reason we want any ID (you might as well ask why Companies House issues Company Numbers and requires all companies to put that number on their correspondence) – to positively identify something without messing around with how someone has decided to write the name.

With the help of the excellent Companies Open House I’ve had a go at matching the names to company numbers, but it’s only been partially successful. When it is, you can do things like this (showing spend with other councils on a suppliers’ page):

It’s also going to allow me to pull in other information about the company, from Companies House and elsewhere. For other bodies (i.e. those without a company number), we’re going to have to find another way of identifying them, and that’s next on the list to tackle.

Thoughts on those spending data guidelines

In general I still think they’re fairly good, and most of the shortcomings have been identified in the comments, or emailed to us (we didn’t explicitly state that the data should be available under an open licence such as the one at data.gov.uk, and be definitely should have done). However, adding this data to OpenlyLocal (as well as providing a useful database for the community) has crystalised some thoughts:

  • Identification of the bodies is essential, and it think we were right to make this a key point, but it’s likely we will need to have the government provide a lookup table between VAT numbers and Company Numbers.
  • Speaking of Government datasets, there’s no way of finding out the ancestry of a company – what its parent company is, what its subsidiaries are, and that’s essential if we’re to properly make use of this information, and similar information released by the government. Companies House bizarrely doesn’t hold this information, but the Office For National Statistics does, and it’s called the Inter Departmental Business Register. Although this contains a lot of information provided in confidence for statistical reasons, the relationships between companies isn’t confidential (it just isn’t stored in one place), so it would be perfectly feasible to release this information.
  • We should probably be explicit whether the figures should include VAT (I think the Windsor & Maidenhead ones don’t include it, but the GLA imply that theirs might).
  • Categorisation is going to be a tricky one to solve, as can be seen from the raw data for Windsor & Maidenhead – for example the Children’s Services Directorate is written as both Childrens Services & Children’s Services, and it’s not clear how this, or the subcateogries, ties into standard classifications for government spending, making comparison across authorities tricky.
  • I wonder what would be the downside to publishing the description details, even, potentially, the invoice itself. It’s probably FOI-able, after all.

As ever, comments welcome, and of course all the data is available through the API under an open licence.

C

Written by countculture

June 17, 2010 at 9:35 pm

Ward maps on OpenlyLocal (& how I did it)

with 6 comments

Yesterday we added a feature to OpenlyLocal that I’ve been wanting to do since the beginning: ward maps. Why is this important? Simple, because hardly anyone knows what council ward they’re in, and nobody knows where the boundary lies, and the ward is the most basic unit of democratic accountability.

True, some councils have outlines of the wards (but without placing them on a map) and it is visible on the ONS’s Neighbourhood Statistics site, though it’s a pretty challenging user experience:

ONS Neighbourhood Statistics Ward Page

But to me, this should just be something that is there on the OpenlyLocal ward page, where the links to relevant councillors, stats and other data sit… and now it is. Here’s the page for Wembley Central ward in north London, for example, complete with zoomable maps with all the usual Google extras (satellite view etc):

Ward details and map for Wembley Central

Since tweeting about this, I’ve had a few people ask me how I did it, and so here are the details.

Much of the credit goes to the guys at #maptitude (writeup here), who put me in a room with the excellent Stuart Harrison from Lichfield District Council in a room and asked us to do something over the course of an afternoon, and to Stuart himself, who did most of the work.

Over that afternoon (on the suggestion of @danslee and with input from everyone else at the event) we did a ward comparison proof-of-concept using some boundaries Stuart had already imported into a database. The key thing, I think for both of us, was getting a few hours of focus just on the ward mapping problem, checking up on the various bits of code and interface with Google maps, and working out how to draw the outlines. Stuart used a Windows command line program and some PHP to get the boundaries; when I did it for the wards I used Ruby, the language OpenlyLocal is programmed in.

How I did it

The data comes from the newly opened-up OS BoundaryLine dataset (easiest to download from MySociety here), specifically the May 2010 data. The first problem is that this comes in the form of ESRI Shapefiles, which are standard for geo geeks, but not for data mashups, or mixing with online maps. The first stage was to import these into the database, and for this I used the GeoRuby library. Specifically I did this:

shpfile = File.join(RAILS_ROOT, "db/boundary_line/district_borough_unitary_ward_region")
GeoRuby::Shp4r::ShpFile.open(shpfile) do |shp|
    geom = shape.geometry
    attribute_data = shape.data
    #do something with the data
end

This just reads the shapefile (and associated data files), goes through the shapes, extracts the geometries and associated attributes (e.g. IDs, area size etc), and lets you do something with them.

If you’re using the Ruby Spatial Adapter library you should then be able to store them in the database easily. Except…

Except, the geometries are polygons made up not of latitude/longitude points but of OS northings/eastings. More than that, they are using a different model for the shape of the Earth (OSGB36) rather than the more normal (although arguably US-centric WGS84). Now doing these two conversions is not trivial, and while there are quite a few libraries for other languages I didn’t find one for Ruby, and so I wrote one, basically converting a Javascript version line by line into Ruby (gist here).

Like all direct code conversions, it’s not pretty, it isn’t fast, but I have tested it and it seems to work well. After that it was a simple matter to add this to the loop, put it in a ruby command line script called a rake task and let it run (took about an hour or so). Specifically this is what I put in in place of the ‘do something with the data’ line:

wgs84_lat_long_groupings = geom.geometries.first.rings.collect do |ring|
   ring.points.collect{|pt| OsCoordsNewUtilities.convert_os_to_wgs84(pt.x,pt.y).reverse }
   # when creating from collections of coords supplied as x,y (where x is long, y lat)
end
wsg84_polygon = Polygon.from_coordinates(wgs84_lat_long_groupings)
# if you're using the Spatial Adapter you should be able to save this polygon in a 'polygon'
# type field in the database.

So far I’ve done all the district wards; I’ll do the Unitary and County Councils next, but showing them as polygons on a map you quickly start to add a lot of bulk to the page, which for people using screenreaders or mobile devices is not good. So I’m investigating other options, including Google Fusion tables (though I’d like to steer clear of Google-only solutions), and running my own tile server. Suggestions, comments welcome.

In the meantime, hope this helps, and if people will find it helpful, I’ll make the boundaries available via the API as a KML file.

Written by countculture

June 2, 2010 at 7:25 pm

About your local area: ward-level statistics come to OpenlyLocal

leave a comment »

Those who follow me on twitter will know that for the past couple of months I’ve been on-and-off looking at the Official for National Statistics Neighbourhood Statistics, and whether it would be possible and useful to show some of that information on OpenlyLocal.

Usually, when I’ve mentioned it on twitter it has usually been in the context of moaning about the less-than-friendly SOAP interface to the data (even by SOAP standards it’s unwieldy). There’s also the not insignificant issue of getting to grips with the huge amount of data, and how it’s stored on the ONS’s servers (at one stage I looked at downloading the raw data, but we’re talking about tens of thousands of files).

Still, like a person with a loose tooth, I’ve worried the problem on and off in quiet times with occasionally painful results (although the people at the ONS have been very helpful), and have now got to a level where (I think) it’s pretty useful.

Specifically, you can now see general demographic info for pretty much all the councils in England & Wales (unfortunately the ONS database doesn’t include Scotland or Northern Ireland, so if there’s anyone who can help me with those areas, I’d be pleased to hear from them).

Area Statistics for Preston Council on OpenlyLocal

More significantly, however, we’ve added a whole load of ward-level statistics:

Example of ward-level ONS statistics

Inevitably, much of the data comes from the 2001 Census (the next is due in 2011), and so it’s not bang up to date. However, it’s still useful and informative, particularly as you can compare the figures with the other wards in the council, or compare councils of similar type. Want to know which ward has the greatest proportion of people over the age of 90 years old. No prob, just click on the description (‘People aged 90 and over in this case) and you have it:

Doing the same on councils will bring up  a comparison with similar councils (e.g. District councils are compared with other district councils, London Authorities with other London Authorities):

As you can see from the list of ONS datasets, there’s huge amounts of data to be shown, and we’ve only imported a small section, in part while we’re working out the best way of making it manageable. As you can see from the religion graph, where it makes more sense for it to be graphed we’ve done it that way, and you can expect to see more of that in the futrue.

It’s also worth mentioning that there are some gaps in the ONS’s database — principally where ward boundaries have changed, or where new local authorities have been formed, and if there’s only a small amount of info for a ward or council, that’s why.

In the meantime, have a play, and if there’s a dataset you want us to expose sooner rather than later, let me know in the comments or via twitter (or email, of course).

C

p.s. In case you’re wondering the graphs and data are fully accessible so should be fine for screenreaders. The comparison tables are just plain ordinary HTML tables with a bit of CSS styling to make them look like graphs, and the pie charts have the underlying data accompanying them as tables on the page (and can be seen by anyone else just by clicking on the chart).

Written by countculture

January 5, 2010 at 11:36 am