countculture

Open data and all that

Archive for the ‘api’ Category

Planning Alerts: first fruits

with 13 comments

PlanningAlerts is coming soon

Well, that took a little longer than planned…

[I won’t go into the details, but suffice to say our internal deadline got squeezed between the combination of a fast-growing website, the usual issues of large datasets, and that tricky business of finding and managing coders who can program in Ruby, get data, and be really good at scraping tricky websites.]

But I’m pleased to say we’re now well on our way to not just resurrecting PlanningAlerts in a sustainable, scalable way, but doing a whole lot more too.

Where we’re heading: an open database of UK planning applications

First, let’s talk about the end goal. From the beginning, while we wanted to get PlanningAlerts working again – the simplicity of being able to put in your postcode and email address and get alerts about nearby planning applications is both useful and compelling – we also knew that if the service was going to be sustainable, and serve the needs of the wider community, we’d need to do a whole lot more.

Particularly with the significant changes in the planning laws and regulations that are being brought in over the next few years, it’s important that everybody – individuals, community groups, NGOs, other websites, even councils – have good and open access to not just the planning applications in their area, but in the surrounding areas too.

In short, we wanted to create the UK’s first open database of planning applications, free for reuse by all.

That meant not just finding out when there was a planning application, and where (though that’s really useful), but also capturing all the other data too, and keeping that information updated as the planning application went through its various stages (the original PlanningAlerts just scraped the information once, when it was found on the website, and even then pretty much only got the address and the description).

Of course, were local authorities to publish the information as open data, for example through an API, this would be easy. As it is, with a couple of exceptions, it means an awful lot of scraping, and some pretty clever scraping too, not to mention upgrading the servers and making OpenlyLocal more scalable.

Where we’ve got to

Still, we’ve pretty much overcome these issues and now have hundreds of scrapers working, pulling information into OpenlyLocal from well over a hundred councils, and we now have well over half a million planning applications in there.

There are still some things to be sorted out – some of the council websites seem to shut down for a few hours overnight, meaning they appear to be broken when we visit them, others change URLs without redirecting to the new ones, and still others are just, well, flaky. But we’ve now got to a stage where we can start opening up the data we have, for people to play around with, find issues with, and start to use.

For a start, each planning application has its own permanent URL, and the information is also available as JSON or XML:
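Grabbing one of those programmatically takes only a couple of lines of Ruby. Here’s a minimal sketch (the application ID is made up, and I’m assuming the permanent URL simply takes a .json extension):

```ruby
require 'open-uri'
require 'json'

# Hypothetical application ID; each application's permanent URL is
# assumed to work with a .json (or .xml) extension
url = 'http://openlylocal.com/planning_applications/12345.json'

application = JSON.parse(URI.open(url).read)
puts application.inspect # address, description, status and so on
```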

There’s also a page for each council, showing the latest planning applications, and the information here is available via the API too:

There’s also a GeoRSS feed for each council, allowing you to keep up to date with the latest planning applications for your council. It also means you can easily create maps or widgets showing the council’s latest applications.
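Consuming the feed is just as straightforward. A sketch using Ruby’s built-in RSS parser (the feed URL here is a made-up example; the real one is linked from each council’s page):

```ruby
require 'rss'
require 'open-uri'

# Hypothetical feed URL for one council's planning applications
feed_url = 'http://openlylocal.com/councils/99/planning_applications.rss'

feed = RSS::Parser.parse(URI.open(feed_url).read, false)
feed.items.each do |item|
  # each item also carries a georss:point, which is what makes maps easy
  puts "#{item.title} - #{item.link}"
end
```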

Finally, Andrew Speakman, who’d coincidentally been doing some great stuff in this area, has joined the team as Planning editor, to help coordinate efforts and liaise with the community (more on this below).

What’s next

The next main task is to reinstate the original PlanningAlert functionality. That’s our focus now, and we’re about halfway there (and aiming to get the first alerts going out in the next 2-3 weeks).

We’ve also got several more councils and planning application systems to add, and this should bring the number of councils we’ve got on the system to between 150 and 200. This will be an ongoing process, over the next couple of months. There’ll also be some much-overdue design work on OpenlyLocal so that the increased amount of information on there is presented to the user in a more intuitive way – please feel free to contact us if you’re a UX person/designer and want to help out.

We also need to improve the database backend. We’ve been using MySQL exclusively since the start, but MySQL isn’t great at spatial (i.e. geographic) searches, restricting the sort of functionality we can offer. We expect to sort this in a month or so, probably moving to PostGIS, and after that we can start to add more features, finer grained searches, and start to look at making the whole thing sustainable by offering premium services.
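To give an idea of why that matters: with PostGIS, the radius search that underpins postcode-based alerts becomes a single query. A sketch using the pg gem, with an entirely hypothetical table and column layout:

```ruby
require 'pg' # the standard Ruby PostgreSQL driver

conn = PG.connect(dbname: 'openlylocal')

# Hypothetical schema: planning_applications(id, address, geom), where
# geom is a PostGIS point. Find applications within 800 metres of a
# given longitude/latitude.
results = conn.exec_params(<<-SQL, [-0.1276, 51.5072])
  SELECT id, address
  FROM planning_applications
  WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
    800
  )
SQL

results.each { |row| puts "#{row['id']}: #{row['address']}" }
```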

We’ll also be working on liaising with councils who want to offer their applications via an API – as the ever-pioneering Lichfield council already does – or a nightly data dump. This not only does the right thing in opening up data for all to use, but also means we don’t have to scrape their websites. Lichfield, for example, uses the Idox system, and the web interface for this (which is what you see when you look at a planning application on Lichfield’s website) spreads the application details over 8 different web pages, but the API makes them available at a single URL, reducing the work the server has to do.

Finally, we’re going to be announcing a bounty scheme for the scraper/developer community to write scrapers for those areas that don’t use one of the standard systems. Andrew will be coordinating this, and will be blogging about it sometime in the next week or so (and you can contact him at planning at openlylocal dot com). We’ll also be tweeting progress at @planningalert.

Thanks for your patience.

Opening up council accounts… and open procurement

with 8 comments

Since OpenlyLocal started pulling in council spending data, it’s niggled at me that it’s only half the story. Yes, as more and more data is published you’re beginning to get a much clearer idea of who’s paid what. And if councils publish it at a sufficient level of detail, and consistently categorise it, we’ll have a pretty good idea of what it’s spent on too.

However, useful though that is, it’s like taking a peek at a company’s bank statement and thinking it tells the whole story. Many of the payments relate to goods or services delivered some time in the past, some are for things that have not yet been delivered, and there are all sorts of things (depreciation, movements between accounts, accruals for invoices not yet received) that won’t appear on there.

That’s what the council’s accounts are for — you know, those impenetrable things locked up in PDFs in some dusty corner of the council’s website, all sufficiently different from each other to make comparison difficult:

For some time, the holy grail for projects like OpenlyLocal and Where Does My Money Go has been to get the accounts in a standardised form, to make comparison easy not just for accountants but for regular people too.

The thing is, such a form does exist, and it’s sent by councils to central Government (the Department for Communities and Local Government, to be precise) for use in their own figures. It’s a hellishly complex spreadsheet called the Revenue Outturn form that must be filled in by each council (to get an idea, have a look at the template here).

They’re not published anywhere by the DCLG, but they contain no state secrets or sensitive information; it’s just that the procedure being followed is the same one they’ve always followed, and so they are not published, even after the statistics have been calculated from the data (the Statistics Act apparently prohibits publication until the stats have been published).

So I had an idea: wouldn’t it be great if we could pull the data that’s sitting in all these spreadsheets into a database, allowing comparison between councils’ accounts, and freeing it from those forgotten corners of government computers?

This would seem to be a project that would be just about simple enough to be doable (though it’s trickier than it seems) and could allow ordinary people to understand their council’s spending in all sorts of ways (particularly if we add some of those sexy Where Does My Money Go visualisations). It could also be useful in ways that we can barely imagine – some of the participatory budget experiments going on in Redbridge and other councils would be even more useful if the context of similar councils’ spending was added to the mix.
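To give a flavour of the extraction side, here’s a rough sketch using the roo spreadsheet library (the file name is made up, and in reality each worksheet’s cells would need mapping to the classification systems by hand):

```ruby
require 'roo'

# Hypothetical local copy of one council's Revenue Outturn form
book = Roo::Spreadsheet.open('revenue_outturn.xls')

book.sheets.each do |sheet_name|
  sheet = book.sheet(sheet_name)
  next unless sheet.last_row # skip empty worksheets
  (sheet.first_row..sheet.last_row).each do |i|
    # In the real project each row would be matched against the
    # classification systems before being saved to the database
    puts "#{sheet_name}: #{sheet.row(i).first(4).join(' | ')}"
  end
end
```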

So how would this be funded? Well, the usual route would be for DCLG, or perhaps one of the Local Government Association bodies such as IDeA, to scope out a proposal, involving many hours of meetings, reams of paper, and running up thousands of pounds in costs, even before it’s started.

They’d then put the process out to tender, involving many more thousands in admin, and designed to attract those companies who specialise in tendering for public sector work. Each of those would want to ensure they make a profit, and so would work out how they’re going to do it before quoting, running up their own costs, and inflating the final price.

So here’s part two of my plan: instead of going down that route, I’d come up with a proposal that would:

  • be a fraction of that cost
  • be specified on a single sheet of paper
  • be paid for only if I delivered

Obviously there’s a clear potential conflict of interest here – I sit on the government’s Local Public Data Panel and am pushing strongly for open data, and also stand to benefit (depending on how good I am at getting the information out of those hundreds of spreadsheets, each with multiple worksheets, and matching the classification systems). The solution to that – I think – is to do the whole thing transparently, hence this blog post.

In a sense, what I’m proposing is that I scope out the project, solving those difficult problems of how to do it, with the bonus that instead of delivering a report, I deliver the project.

Is it a good thing to have all this data imported into a database, and shown not just on a website in a way non-accountants can understand, but also available to be combined with other data in mashups and visualisations? Definitely.

Is it a good deal for the taxpayer, and is this open procurement a useful way of doing things? Well you can read the proposal for yourself here, and I’d be really interested in comments both on the proposal and the novel procurement model.

Making OpenCharities even better… more features, more data, more charities

with 8 comments

I had a fantastic response to the launch of OpenCharities — my little side project to open up the Charity Commission’s Register of Charities — from individuals, from organisations representing the third sector, and from charities themselves.

There were also a few questions:

  • Could we pull out and expose via the API more info about the charities, especially the financial history?
  • How often would OpenCharities be updated and what about new charities added after we’d scraped the register?
  • Was there any possibility that we could add additional information from sources other than the Charity Register?

So, over the past week or so, we’ve been busy trying to answer those questions the best we could, mainly by just trying to get on and solve them.

First, additional info. After a terrifically illuminating meeting with Karl and David from NCVO, I had a much better idea of how the charity sector is structured, and what sort of information would be useful to people.

So the first thing I did was to rewrite the scraper and parser to pull in a lot more information, particularly the past five years’ income and spending and, for bigger charities, the breakdown of that income and spending. (I also pulled in the remaining charities that had been missed the first time around, including removed charities.) Here’s what the NSPCC’s entry, for example, looks like now:

Example of financial info for charity

We are also now getting the list of trustees, and links to the accounts and Summary Information Returns, as there’s all sorts of goodness locked up in those PDFs.

However, while we were running through all these charities, we wondered if any of them had social networking info easily available (i.e. on their front page). It turns out some of the bigger ones did, and so we visited their sites and pulled out that info (it’s fairly easy to look for links to twitter/facebook/youtube etc on a home page). Here’s an example of the social networking info, again for the NSPCC.
Social networking info for charities
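The detection really is as simple as it sounds; roughly this, using Nokogiri (the list of hosts is illustrative):

```ruby
require 'nokogiri'
require 'open-uri'

SOCIAL_HOSTS = %w[twitter.com facebook.com youtube.com].freeze

# Return any social networking links found on a charity's home page
def social_links(homepage_url)
  doc = Nokogiri::HTML(URI.open(homepage_url).read)
  doc.css('a[href]')
     .map { |a| a['href'].to_s }
     .select { |href| SOCIAL_HOSTS.any? { |host| href.include?(host) } }
     .uniq
end
```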

[Incidentally, doing this threw up some errors in the Charity Register, most commonly websites that are listed as http://http://some.charity.org.uk, which in itself shows the benefit of opening up the data. All we need now is a way of communicating that to the Charity Commission.]

We also (after way too many hours wasted messing around with cookies and hidden form fields) figured out how to get the list of charities recently added, with the result that we can check every night for new charities added in the past 24 hours, and add those to the database.

Latest charities added to register

This means not only that we can keep OpenCharities up to date, but also that we can offer an RSS feed of the latest charities. And if that’s updated a bit too frequently for you (some days there are over 20 charities added), you can always restrict it to a given search term, e.g. http://OpenCharities/charities.rss?term=children for those charities with children in the title.

Finally, we’ve been looking at what other datasets we could link with the register, and I thought a good one might be the list of grants given out by the various National Lottery funding bodies (which fortunately had already been scraped by the very talented Julian Todd using ScraperWiki).

Then it was a fairly simple matter of tying together the recipients with the register, and voila, you have something like this:

Example of National Lottery grant info for a charity

Note, at the time of writing, the import and match of the data is still going on, but should be finished by the end of today.

We’ll also add some simple functionality to show payments from local councils that’s being published in the local council spending data. The information’s already in the database (and is actually shown on the OpenlyLocal page for the charity); I just haven’t got around to displaying it on OpenCharities yet. Expect that to appear in the next day or so.

C

p.s. Big thanks to @ldodds and @pigsonthewing for helping with the RDF and microformats respectively

Written by countculture

September 23, 2010 at 2:40 pm

Introducing OpenCharities: Opening up the Charities Register

with 75 comments

A couple of weeks ago I needed a list of all the charities in the UK and their registration numbers so that I could try to match them up to the local council spending data OpenlyLocal is aggregating and trying to make sense of. A fairly simple request, you’d think, especially in this new world of transparency and open data, and for a dataset that’s uncontentious.

Well, you’d be wrong. There’s nothing at data.gov.uk, nothing at CKAN and nothing on the Charity Commission website; in fact you can’t even see the whole register on the website, just the first 500 results of any search/category. Here’s what the Charity Commission says on their website (NB: extract below is truncated):

The Commission can provide an electronic copy in discharge of its duty to provide a legible copy of publicly available information if the person requesting the copy is happy to receive it in that form. There is no obligation on the Commission to provide a copy in this form…

The Commission will not provide an electronic copy of any material subject to Crown copyright or to Crown database right unless it is satisfied… that the Requestor intends to re-use the information in an appropriate manner.

Hmmm. Time for Twitter to come to the rescue to check that some other independently minded person hasn’t already solved the problem. Nothing, but I did get pointed to this request for the data to be unlocked, with the very recent response by the Charity Commission, essentially saying, “Nope, we ain’t going to release it”:

For resource reasons we are not able to display the entire Register of Charities. Searches are therefore limited to 500 results… We cannot allow full access to all the data, held on the register, as there are limitations on the use of data extracted from the Register… However, we are happy to consider granting access to our records on receipt of a written request to the Departmental Record Officer

OK, so it seems as though they have no intention of making this data available anytime soon (I don’t buy that there are Intellectual Property or Data Privacy issues with making basic information about charities available, and if there really are, this needs to be changed, pronto), so it was time for some screen-scraping. It turns out it’s a pretty difficult website to scrape, because it requires both cookies and javascript to work properly.
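Mechanize at least takes care of the cookie half, as it keeps a cookie jar across requests. A sketch (the URL and form field are hypothetical; the javascript-dependent routes had to be avoided altogether, as described below):

```ruby
require 'mechanize'

agent = Mechanize.new

# Mechanize stores and resends cookies automatically, so the session
# set up by the first request survives the subsequent ones
page = agent.get('http://www.charitycommission.gov.uk/search') # hypothetical URL
form = page.forms.first
form['keyword'] = 'children' # hypothetical field name
results = agent.submit(form)
puts results.title
```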

Try turning off both in your browser and see how far you get; then you’ll also get an idea of how difficult it is to use if you have accessibility issues – and check out their poor excuse for an accessibility statement, i.e. tough luck.

Still, there’s usually a way, even if it does mean some pretty tortuous routes, and like the similarly inaccessible Birmingham City Council website, this is just the sort of challenge that stubborn so-and-sos like me won’t give up on.

The way to get the info turned out to be the geographical search (other routes relied upon Javascript), and although it was still problematic, it was doable. So now we have an open data register of charities, incorporated into OpenlyLocal, and tied in to the spending data being published by councils.

Charity supplier to Local authority

And because this sort of thing is so easy, once you’ve got it in a database (Charity Commission take note), there are a couple of bonuses.

First, it was relatively easy to knock up a quick and very simple Sinatra application, OpenCharities:

Open Charities :: Opening up the UK Charities Register

If there’s any interest, I’ll add more features to it, but for now it’s just the simplest of things: a web application with a unique URL for every charity based on its charity number, with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.
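For the curious, the heart of an app like this really is tiny. A stripped-down sketch of the pattern, with an in-memory hash standing in for the real database (the charity data is illustrative):

```ruby
require 'sinatra'
require 'json'

# Stand-in for the real database: charity number => attributes
CHARITIES = {
  '123456' => { 'title' => 'An Example Charity', 'income' => 1_000_000 }
}.freeze

# Data version first, so the .json suffix isn't swallowed by the
# plain route below
get '/charities/:number.json' do
  charity = CHARITIES[params[:number]]
  halt 404 unless charity
  content_type :json
  charity.to_json
end

# A unique URL for every charity, based on its charity number
get '/charities/:number' do
  charity = CHARITIES[params[:number]]
  halt 404 unless charity
  "<h1>#{charity['title']}</h1>"
end
```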

This is very much an alpha application: the design is very basic and it’s possible that there are a few charities missing, for two reasons. One: the Charity Commission kept timing out (I think I managed to pick up all of those, and any stragglers should get picked up when I periodically run the scraper); and two: there appears to be a bug in the Charity Commission website, so that when there are between 10 and 13 entries, only 10 are shown, with no way of seeing the additional ones. As a benchmark, there are currently 150,422 charities in the OpenCharities database.

It’s also worth mentioning that due to inconsistencies with the page structure, the income/spending data for some of the biggest charities is not yet in the system. I’ve worked out a fix, and the entries will be gradually updated, but only as they are re-scraped.

The second bonus is that the entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.
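If you do want to have a go, stream it row by row rather than loading the whole thing into memory, e.g. (file and column names assumed; decompress it first):

```ruby
require 'csv'

# Count matching charities without ever holding the full file in memory
count = 0
CSV.foreach('opencharities.csv', headers: true) do |row|
  count += 1 if row['title'].to_s.downcase.include?('children')
end
puts "#{count} charities with 'children' in the title"
```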

Now, back to that spending data.

Written by countculture

September 6, 2010 at 1:15 pm

A Local Spending Data wish… granted

with 25 comments

The very wonderful Stuart Harrison (aka pezholio), webmaster at Lichfield District Council, blogged yesterday with some thoughts about the publication of spending data following a local spending data workshop in Birmingham. Sadly I wasn’t able to attend this, but Stuart gives a very comprehensive account, and like all his posts it’s well worth reading.

In it he made an important observation about those at the workshop who were pushing for linked data from the beginning, and wished for a solution. First, the observation:

There did seem to be a bit of resistance to the linked data approach, mainly because agreeing standards seems to be a long, drawn out process, which is counter to the JFDI approach of publishing local data… I also recognise that there are difficulties in both publishing the data and also working with it… As we learned from the local elections project, often local authorities don’t even have people who are competent in HTML, let alone RDF, SPARQL etc.

He’s not wrong there. As someone who’s been publishing linked data for some time, who conceived and ran the Open Election Data project Stuart refers to, and who has worked with numerous councils to help them publish linked data, I’m probably as aware of the issues as anyone (ironically, and I think significantly, none of the councils involved in the local government e-standards body, and now pushing so hard for linked data, has actually published any linked data themselves).

That’s not to knock linked data – just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown). To expect all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents), and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.

So what’s to be done? I think the sensible thing, particularly in these straitened times, is to concentrate on getting the raw data out, and as much of it as possible, and come down hard on those councils who publish it badly (e.g. by locking it up in PDFs or giving it a closed licence), or who wilfully ignore the guidance (it’s worrying how many of the councils publishing data at the moment don’t even include the transaction ID or date of the transaction, never mind supplier details).

Beyond that we should take the approach the web has always taken, and which is the reason for its success: a decentralised, messy variety of implementations and solutions that allows a rich eco-system to develop, with government helping solve bottlenecks and structural problems rather than trying to impose highly centralised solutions to problems that are already being solved elsewhere.

Yes, I’d love it if the councils were able to publish the data fully marked up, in a variety of forms (not just linked data, but also XML and JSON), but the ugly truth is that not a single council has so far even published their list of categories, never mind matched it up to a recognised standard (CIPFA BVACOP, COFOG or that used in their submissions to the CLG), still less done anything like linked data. So there’s a long way to go, and in the meantime we’re going to need some tools and cheap commodity services to bridge the gap.

[In a perfect world, maybe councils would develop some open-source tools to help them publish the data, perhaps using something like Adrian Short’s Armchair Auditor code as the basis (this is a project that took a single council, Windsor & Maidenhead, and added a web interface to the figures). However, when many councils don’t even have competent HTML skills (having outsourced much of it), this is only going to happen at a handful of councils at best, unless considerable investment is made.]

Stuart had been thinking along similar lines, and made a suggestion, almost a wish in fact:

I think the way forward is a centralised approach, with authorities publishing CSVs in a standard format on their website and some kind of system picking up these CSVs (say, on a monthly basis) and converting this data to a linked data format (as well as publishing in vanilla XML, JSON and CSV format).

He then expanded on the idea, talking about a single URL for each transaction, standard identifiers, “a human-readable summary of the data, together with links to the actual data in RDF, XML, CSV and JSON”. I’m a bit iffy about that ‘centralised approach’ phrase (the web is all about decentralisation), but I do think there’s an opportunity to help both the community and councils by solving some of these problems.

And that’s exactly what we’ve done at OpenlyLocal: adding the data from all the councils who’ve published their spending data, acting as a central repository, generating the URLs, and connecting the data to other datasets and identifiers (councils with SNAC IDs, companies with Companies House numbers). We’ve even extracted data from those councils who unhelpfully try to lock up their data as PDFs.

There are, at the time of writing, 52,443 financial transactions from 9 councils in the OpenlyLocal database. And that’s not all; there are also the following features:

  • Each transaction is tied to a supplier record for the council, and increasingly these are linked to company info (including their company number), or other councils (there’s a lot of money being transferred between councils), and users can add information about the supplier if we haven’t matched it up.
  • Every transaction, supplier and company has a permanent unique URL and is available as XML and JSON
  • We’ve sorted out some of the date issues (adding a date fuzziness field for those councils who don’t specify when in the month or quarter a transaction relates to).
  • Transactions are linked to the URL from which the file was downloaded (and usually the line number too, though obviously this is not possible if we’ve had to extract it from a PDF), meaning anyone else can recreate the dataset should they want to.
  • There’s an increasing amount of analysis, showing ordinary users the spending by month, and the biggest suppliers and transactions, for example.
  • The whole spending dataset is available as a single, zipped CSV file to download for anyone else to use.
  • It’s all open data.

There are a couple of features Stuart mentions that we haven’t yet implemented, for good reason.

First, we’re not yet publishing it as linked data, for the simple reason that the vocabulary hasn’t yet been defined, nor even the standards on which it will be based. When this is done, we’ll add this as a representation.

And although we use standard identifiers such as SNAC IDs for councils (and wards) on OpenlyLocal, the URL structure Stuart mentions is not yet practical, in part because SNAC IDs don’t cover all authorities (they don’t include the GLA, or other public bodies, for example), and only a tiny fraction of councils are publishing their internal transaction IDs.

Also, we haven’t yet implemented comments on the transactions, for the simple reason that distributed comment systems such as Disqus are javascript-based and thus problematic for those with accessibility issues, while site-specific ones don’t allow the conversation to be carried on elsewhere (we think we might have a solution to this, but it’s at an early stage, and we’d be interested to hear other ideas).

But all in all, we reckon we’re pretty much there with Stuart’s wish list, and would hope that councils can get on with extracting the raw data, publishing it in an open, machine-readable format (such as CSV), and then move to linked data as their resources allow.

Written by countculture

August 3, 2010 at 7:45 am

Local spending data in OpenlyLocal, and some thoughts on standards

with 3 comments

A couple of weeks ago Will Perrin and I, along with some feedback from the Local Public Data Panel on which we sit, came up with some guidelines for publishing local spending data. They were a first draft, based on a request by Camden council for some guidance, in light of the announcement that councils will have to start publishing details of spending over £500.

Now I’ve got strong opinions about standards: they should be developed from real world problems, by the people using them and should make life easier, not more difficult. It slightly concerned me that in this case I wasn’t actually using any of the spending data – mainly because I hadn’t got around to adding it in to OpenlyLocal yet.

This week, I remedied this, and pulled in the data from those authorities that had published their local spending data – Windsor & Maidenhead, the GLA and the London Borough of Richmond upon Thames. Now there’s a couple of sites (including Adrian Short’s Armchair Auditor, which focuses on spending categories) already pulling the Windsor & Maidenhead data but as far as I’m aware they don’t include the other two authorities, and this adds a different dimension to things, as you want to be able to compare the suppliers across authorities.

First, a few pages from OpenlyLocal showing how I’ve approached it (bear in mind they’re a very rough first draft, and I’m concentrating on the data rather than the presentation). You can see the biggest suppliers to a council right there on the council’s main page (e.g. Windsor & Maidenhead, GLA, Richmond):

Clicking through to more info gets you a paginated view of all suppliers (in Windsor & Maidenhead’s case there are over 2800 so far):

Clicking any of these will give you the details for that supplier, including all the payments to them:

And clicking on the amount will give you a page with just the transaction details, so it can be emailed to others.

But we’re getting ahead of ourselves. The first job is to import the data from the CSV files into a database, and this was where the first problems occurred – not in the CSV format, which is not a problem, but in the consistency of the data.

Take Windsor & Maidenhead (you should just be able to open these files in any spreadsheet program). Look at each data set in turn and you find there’s very little consistency – the earliest sets don’t have any dates and aggregate across a whole quarter (but do helpfully have the internal Supplier ID as well as the supplier name). Later sets have the transaction date (although in one the US date format is used, which could catch out those not looking at them manually), but omit supplier ID and cost centre.
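That US date format is a nastier trap than it looks: for any day of 12 or under, a month-first date parses happily as day-first, just wrongly. Something like this defensive parsing is needed (a sketch; in practice the format has to be pinned down per file):

```ruby
require 'date'

# Try the UK day-first format, then fall back to US month-first.
# NB: '06/04/2010' is valid under both readings, so a genuinely
# ambiguous file has to be resolved by finding a date in it that
# only parses one way (e.g. '23/04/2010').
def parse_transaction_date(str)
  Date.strptime(str, '%d/%m/%Y')
rescue ArgumentError
  Date.strptime(str, '%m/%d/%Y')
end

parse_transaction_date('23/04/2010') # => 2010-04-23
```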

The GLA figures tell a similar story, with the type of data, and the names used to describe it, changing seemingly randomly between data sets. Some of the 2009 ones do have transaction dates, but the 2010 ones generally don’t, and the supplier field goes by different names, from Supplier to Supplier Name to Vendor.

This is not to criticise those bodies – it’s difficult to produce consistent data if you’re making the rules up as you go along (and given there weren’t any established precedents, that’s what they were doing), and doing much of it by hand. Also, they are doing it first and helping us understand where the problems lie (and where they don’t). In short, they are failing forward – getting on with it so they can make mistakes from which they (and crucially others) can learn.

But who are these suppliers?

The bigger problem, as I’ve said before, is being able to identify the suppliers, and this becomes particularly acute when you want to compare across bodies (who may name the same company or body slightly differently). Ideally (as we put in the first draft of the proposals), we would have the company number (when we’re talking about a company, at any rate), but we recognised that many accounts systems simply won’t have this information, and so we do need some information that helps us identify them.

Why do we want to know this information? For the same reason we want any ID (you might as well ask why Companies House issues Company Numbers and requires all companies to put that number on their correspondence) – to positively identify something without messing around with how someone has decided to write the name.

With the help of the excellent Companies Open House I’ve had a go at matching the names to company numbers, but it’s only been partially successful. Where it works, you can do things like showing spend with other councils on a supplier’s page.
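Much of that matching comes down to normalising names into a canonical key before comparing them, along these lines (the rules are illustrative, and real matching needs manual checking on top):

```ruby
# Reduce a supplier name to a canonical key, so that e.g.
# 'B.T. PLC' and 'BT plc' compare equal. Illustrative rules only.
def supplier_key(name)
  name.upcase
      .gsub(/\b(LTD|LIMITED|PLC|LLP)\b/, '') # drop common legal suffixes
      .gsub(/[^A-Z0-9]/, '')                 # strip punctuation and spaces
end

supplier_key('B.T. PLC') == supplier_key('BT plc') # => true
```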

It’s also going to allow me to pull in other information about the company, from Companies House and elsewhere. For other bodies (i.e. those without a company number), we’re going to have to find another way of identifying them, and that’s next on the list to tackle.

Thoughts on those spending data guidelines

In general I still think they’re fairly good, and most of the shortcomings have been identified in the comments, or emailed to us (we didn’t explicitly state that the data should be available under an open licence such as the one at data.gov.uk, and we definitely should have done). However, adding this data to OpenlyLocal (as well as providing a useful database for the community) has crystallised some thoughts:

  • Identification of the bodies is essential, and I think we were right to make this a key point, but it’s likely we will need to have the government provide a lookup table between VAT numbers and Company Numbers.
  • Speaking of Government datasets, there’s no way of finding out the ancestry of a company – what its parent company is, what its subsidiaries are – and that’s essential if we’re to properly make use of this information, and similar information released by the government. Companies House bizarrely doesn’t hold this information, but the Office For National Statistics does, and it’s called the Inter Departmental Business Register. Although this contains a lot of information provided in confidence for statistical reasons, the relationships between companies aren’t confidential (they just aren’t stored in one place), so it would be perfectly feasible to release this information.
  • We should probably be explicit about whether the figures should include VAT (I think the Windsor & Maidenhead ones don’t include it, but the GLA imply that theirs might).
  • Categorisation is going to be a tricky one to solve, as can be seen from the raw data for Windsor & Maidenhead – for example the Children’s Services Directorate is written as both Childrens Services & Children’s Services, and it’s not clear how this, or the subcategories, tie into standard classifications for government spending, making comparison across authorities tricky.
  • I wonder what would be the downside to publishing the description details, even, potentially, the invoice itself. It’s probably FOI-able, after all.

As ever, comments welcome, and of course all the data is available through the API under an open licence.

C

Written by countculture

June 17, 2010 at 9:35 pm

New feature: search for information by postcode

leave a comment »

Why was it important that the UK government open up the geographic infrastructure? Because it makes almost trivial so many location-based things that used to be tortuous.

Previously, getting open data about your local councillors, given just a postcode, was a tortuous business, requiring multiple calls to different sites. Now it is easy. Just go to http://openlylocal.com/areas/postcodes/[yourpostcodehere] and, bingo, you’re done.

You can also just put your postcode in the search box on any OpenlyLocal page to do the same thing. And, obviously, you can also download the data as XML or JSON, under an open data licence that allows reuse by anybody, even commercial reuse.
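So pulling the data into a script is trivial too, e.g. (the postcode is illustrative, and I’m assuming the usual OpenlyLocal .json extension):

```ruby
require 'open-uri'
require 'json'

postcode = 'SW1A1AA' # illustrative, with the space removed
url = "http://openlylocal.com/areas/postcodes/#{postcode}.json"

data = JSON.parse(URI.open(url).read)
puts data.inspect # the ward, council and councillors for that postcode
```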

There’s still a little bit of tweaking to be done. I need to match up postcodes with county electoral divisions, and I’m planning on adding RDF to the data types returned. Finally, it’d be great to show the ward boundaries on a map, but I think that may take a little more work.

Written by countculture

April 6, 2010 at 11:45 am

Tweeting councillors, and why open, connected data matters

leave a comment »

Cllr Tweeps twitter directory of UK councillors closes

A couple of days ago I heard that the rather excellent CllrTweeps website was closing down. At its heart, CllrTweeps was a directory of councillors on Twitter, matching them up to council and party. My first thought was: wow, it’s a shame to let all that accumulated data go to waste.

The second was: wouldn’t it be great to put it on OpenlyLocal as open data? Then not only would it be available to everyone via the API, but it would also link the twitter accounts not just to the council, but also to the ward, committees and so on.

So I dropped CllrTweeps a quick note, and Dafydd and James, the guys behind CllrTweeps, were well up for it. Within less than 48 hours, they’d sent me the data, agreed to make it open data, and I’d matched the first batch against the councillor records already on OpenlyLocal. What’s more, as a bonus, they’d also been collating info on councillor blogs, and so we could add that too.

Why is all this important — after all, there are other pretty good sites listing councillors on twitter (although I’m not sure they’re as extensive as the CllrTweeps list)? It matters for the same reason it was worth doing the open data Hyperlocal Directory (which is going gangbusters).

The point is not who is maintaining the list — whether it’s twitter accounts or hyperlocal sites. What matters is whether the information is open for reuse by hyperlocal sites, bloggers, mashups, or anybody else, and whether that information can be connected to other bits of information, or is — like the government data we often criticise — locked up in its own silo, unable to be matched to or combined with other information.

There are a few tweaks we’re going to be adding over the next couple of weeks, but for now, if you’re a tweeting councillor (county/district/borough for the moment; parish and town councillors soon), let us know by tweeting to @OpenlyLocal with the hashtag #ukcouncillors (e.g. like this) and either the URL of your OpenlyLocal page or your council.

Even better, you’ll automatically be added to the twitter list of UK local councillors we’ve started (see below). Finally, if you have a blog and you include its URL in the tweet, we can add that to the info on your OpenlyLocal page too.

List of UK local councillors who tweet

p.s. Because the twitter accounts on OpenlyLocal are open data, there’s obviously no reason why they can’t be combined with other such listings. Hopefully we can get this arrangement to be reciprocal 😉

Written by countculture

February 13, 2010 at 12:37 pm

Yet another UK Hyperlocal Directory… but this time it’s open data

with 13 comments

At OpenlyLocal we’ve long been fans of hyperlocal sites, seeing them as a crucial part of the media future as the traditional local media dies or is cut back to a shadow of its former self.

And for a while I’ve been looking for a good directory of such sites, whether pure community ones such as HaringayOnline, ones with serious journalistic depth such as Pits’N’Pots, The Lichfield Blog, or all-rounders such as VentnorBlog (who do so many things well). Mainly I wanted it for selfish reasons, so I could make OpenlyLocal a better site, by linking to relevant hyperlocal sites on council pages.

It seems to me the community could do with such a thing too, as a way for new sites to alert the community to their existence and, of course, to help with their google juice. Sure, there are a few — recently the one over at hyperlocal.co.uk has been getting stronger, and is now pretty good — but there are problems, at least from my perspective.

So what are those problems, and why have I spent the past couple of days building a UK hyperlocal directory as part of OpenlyLocal? Three reasons:

  1. Most importantly, I thought the directory should be open data which could be reused by anyone, not just by the person or company running the directory. The one at hyperlocal.co.uk isn’t (as far as I can tell), and so if you wanted to put the information on your website, say to allow people to see the closest hyperlocal sites to them, you couldn’t.
  2. I thought such a directory should be run by someone who wasn’t publishing a hyperlocal site or several hyperlocal sites. Perception is important in these matters, and conflicts of interests have a way of raising their head despite the best intentions.
  3. There are lots of useful things we can do when we know the location of a hyperlocal site, not just put it on a map. We can use the info in mashups, we can use it in tweets, and we can find the nearest sites to a given address — if the info is made available as open data.

So after a couple of days of coding we have the first draft of the OpenlyLocal UK Hyperlocal Directory.

Here’s how it’s different:

  1. The information on the OpenlyLocal UK Hyperlocal Directory is licensed under the CC SA licence, and can be reused by anyone.
  2. You can enter your own data. Just go to http://openlylocal.com/hyperlocal_sites, click on “Add your hyperlocal site” and fill in the form. Even specifying the area covered should be a breeze — you just drag the pointer on the map to the area the site covers, and you can also choose the radius of the circle covered by the site. We aim to approve all sites within 24 hours, and you’ll be tweeted automatically on approval from the OpenlyLocal twitter account.
  3. We allow non-commercial and commercial sites. The only sites we won’t allow are those behind a paywall or those that are pure listings sites (and don’t have a significant news or community aspect). So even local newspaper sites can be included as long as there’s free access to them.
  4. People can search for the sites closest to them — just put an address or postcode in the search form and it’ll give you the nearest ones, with distances.
  5. The list can be output as XML or JSON data for mashups or anything else, as can the results of searches for closest sites.
  6. All approved sites also appear on the correct council’s page (just choose a council when you fill in your entry).

There’s more we could do with this, but really it’s about generating a community resource, and one that’s open data. So if you want to help build the first open directory of UK hyperlocal sites, get over to http://OpenlyLocal.com/hyperlocal_sites and click on “Add your hyperlocal site“.

And if you’ve got any suggestions, leave them in the comments or contact me on twitter.

Written by countculture

January 13, 2010 at 12:04 pm

Posted in api, hyperlocal, open data

Online services provided by your council: rewiring LocalDirectGov

with 8 comments

One of the things I’ve had on my ToDo list for OpenlyLocal for a while was providing a list of links to the online services provided by each Local Authority.

Seemed like something that should be on the site, and available as structured data; it also looked like it should be fairly easy to do, as it’s a service that’s sort of provided by central government (LocalDirectGov), though with some shortcomings.

The problem is that from a usability point of view the Local DirectGov interface is a bit clunky. First you choose the service you want the link for, which means using an A-Z (always a bit of a problem). This is the landing page, and as you can see you’re on the A’s.

LocalDirectGov landing page

So let’s say you want Hazardous Waste. Is that under H or W? Actually it’s under W, so click on W, and then on “Waste – Hazardous”, and a new window opens (why?). You then need to enter your postcode, town or council in a form, and you’ll then (usually) be given a link to click through to get to the council page.

However, depending on what you put in there and what category you want, you may be asked to choose a particular council or be told that your council does not provide the service online:


LocalDirectGov no service

Frustrating.

Now there is a limited way for external websites to interact with this service, using the ‘white-label’ Local DirectGov application. There’s even a case study. Basically, you download a list of services provided by each type of council, and then build a LocalDirectGov URL, which redirects to the council service.

Terrific. Not hard to do, even for a coder as slow as me. The only problem is that it doesn’t work. For the end user that is.

The thing is, there’s no way of knowing whether the local authority actually provides a given service online, and there’s a fair chance that the URL you’ve just built will resolve to a bog-standard contact page or, even worse, a non-existent page resulting in a 404 error. Not great for users, and there appears to be no way of programmatically finding out whether the link will work, even though the answer is there in Local DirectGov’s database (which is how it knows the service isn’t provided).
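In other words, you end up having to check each built URL yourself, roughly like this sketch (the generic-page heuristic in particular is a crude assumption, and we back it up with manual review):

```ruby
require 'net/http'
require 'uri'

# Follow a built LocalDirectGov URL and decide whether it lands
# somewhere useful. Assumes absolute redirect URLs; a sketch only.
def useful_service_page?(url, redirects_left = 5)
  return false if redirects_left.zero?
  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPRedirection
    useful_service_page?(response['location'], redirects_left - 1)
  when Net::HTTPSuccess
    # Crude heuristic: a generic "contact us" page doesn't count
    response.body.to_s !~ /contact us/i
  else
    false # 404s and server errors
  end
end
```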

So we’ve tried to fix this on OpenlyLocal and provide a better version. First we’ve collected up the useful data for each authority (i.e. where there’s a specific page for that subject, and not a 404 or generic “contact us” page). Then we’ve put it all on one page, and made it searchable too. It’s clean, simple, and works:

Council Services list

You can also search it from the main council page if you want to in an Ajaxy live-search way (obviously the search also works without javascript, for screenreaders and other text browsers):

Council page with services search


Finally, you can access the data through the API as XML or JSON. So far, we’ve done a little over half the local authorities, and should have all the rest done by sometime next week (it’s just a matter of tying the remaining local authorities to their LocalDirectGov IDs, which has to be done manually).

As ever, comments, bug reports and feature requests welcome.

Written by countculture

October 27, 2009 at 4:49 pm