countculture

Open data and all that

Posts Tagged ‘Councils

How to help build the UK’s open planning database: writing scrapers

with 21 comments

This post is by Andrew Speakman, who’s coordinating OpenlyLocal’s planning application work.

As Chris wrote in his last post announcing OpenlyLocal’s progress in building an open database of planning applications, while we can do the importing from the main planning systems, if we’re really going to cover the whole country, we’re going to need the community’s help. I’m going to be coordinating this effort and so I thought it would be useful to explain how we’re going to do this (you can contact me at planning@openlylocal.com).

First, we’re going to use the excellent ScraperWiki as the main platform for writing external scrapers. It supports Python, Ruby and PHP, and has worked well for similar schemes. It also means the scraper is openly available and we can see it in action. We will then use the Scraperwiki API to upload the data regularly into OpenlyLocal.

Second, we’re going to break the job into manageable chunks by focus on target groups of councils, and just to sweeten things – as if building a national open database of planning applications wasn’t enough ;-) – we’re going to offer small bounties (£75) for successful scrapers for these councils.

We have some particular requirements designed to make the system maintainable, and do things the right way, but not many are fixed in stone, so feel free to respond with suggestions if you want to do it in a different way.

For example, the scraper should keep itself current (running on a daily basis), but also behave nicely (not putting an excessive load on Scraperwiki or the target website by trying to get too much data in one go). In addition we propose that the scrapers should operate by updating current applications on a daily basis and also make inroads into the backlog by gathering a batch of previous applications.

We have set up three example scrapers that operate in the way we expect: Brent (Ruby), Nuneaton and Bedworth (Python) and East Sussex (Python). These scrapers perform 4 operations, as follows:

  1. Create new database records for any new applications that have appeared on the site since the last run and store the identifiers (uid and url).
  2. Create new database records of a batch of missing older applications and store the identifiers (uid and url). Currently the scrapers are set up to work backwards from the earliest stored application towards a target date in the past
  3. Update the most current applications by collecting and saving the full application details. At the moment the scrapers update the details of all applications from the past 60 days.
  4. Update the full application details of a batch of older applications where the uid and url has been collected (as above) but the application details are missing. At the moment the scrapers work backwards from the earliest “empty” application towards a target date in the past

The data fields to be gathered for each planning application are defined in this shared Google spreadsheet. Not all the fields will be available on every site, but we want all those that are there.

Note the following:

  • The minimal valid set of fields for an application is: ‘uid’, ‘description’, ‘address’, ‘start_date’ and ‘date_scraped’
  • The ‘uid’ is the database primary key field
  • All dates (except date_scraped) should be stored in ISO8601 format
  • The ‘start_date’ field is set to the earliest of the ‘date_received’ or ‘date_validated’ fields, depending on which is available
  • The ‘date_scraped’ field is a date/time (RFC3339) set to the current time when the full application details are updated. It should be indexed.

So how do you get started? Here’s a list of 10 non-standard authorities that you can choose from. Aberdeen, Aberdeenshire, Ashfield, Bath, Calderdale, Carmarthenshire, Consett, Crawley, Elmbridge, Flintshire. Have a look at the sites and then let me know if you want to reserve one and how long you think it will take to write your scraper.

Happy scraping.

Planning Alerts: first fruits

with 13 comments

PlanningAlerts is coming soon

Well, that took a little longer than planned…

[I won't go into the details, but suffice to say our internal deadline got squeezed between the combination of a fast-growing website, the usual issues of large datasets, and that tricky business of finding and managing coders who can program in Ruby, get data, and be really good at scraping tricky websites.]

But I’m pleased to say we’ve now well on our way to not just resurrecting PlanningAlerts in a sustainable, scalable way but a whole lot more too.

Where we’re heading: a open database of UK planning applications

First, let’s talk about the end goal. From the beginning, while we wanted to get PlanningAlerts working again – the simplicity of being able to put in your postcode and email address and get alerts about nearby planning applications is both useful and compelling – we also knew that if the service was going to be sustainable, and serve the needs of the wider community we’d need to do a whole lot more.

Particularly with the significant changes in the planning laws and regulations that are being brought in over the next few years, it’s important that everybody – individuals, community groups, NGOs, other websites, even councils – have good and open access to not just the planning applications in their area, but in the surrounding areas too.

In short, we wanted to create the UK’s first open database of planning applications, free for reuse by all.

That meant not just finding when there was a planning application, and where (though that’s really useful), but also capturing all the other data too, and also keep that information updated as the planning application went through the various stages (the original PlanningAlerts just scraped the information once, when it was found on the website, and even then pretty much just got the address and the description).

Of course, were local authorities to publish the information as open data, for example through an API, this would be easy. As it is, with a couple of exceptions, it means an awful lot of scraping, and some pretty clever scraping too, not to mention upgrading the servers and making OpenlyLocal more scalable.

Where we’ve got to

Still, we’ve pretty much overcome these issues and now have hundreds of scrapers working, pulling the information into OpenlyLocal from well over a hundred councils, and now have well over half a million planning applications in there.

There are still some things to be sorted out – some of the council websites seem to shut down for a few hours overnight, meaning they appear to be broken when we visit them, others change URLs without redirecting to the new ones, and still others are just, well, flaky. But we’ve now got to a stage where we can start opening up the data we have, for people to play around with, find issues with, and start to use.

For a start, each planning application has its own permanent URL, and the information is also available as JSON or XML:

There’s also a page for each council, showing the latest planning applications, and the information here is available via the API too:

There’s also a GeoRSS feed for each council too allowing you to keep up to date with the latest planning applications for your council. It also means you can easily create maps or widgets for the council, showing the latest applications of the council.

Finally, Andrew Speakman, who’d coincidentally been doing some great stuff in this area, has joined the team as Planning editor, to help coordinate efforts and liaise with the community (more on this below).

What’s next

The next main task is to reinstate the original PlanningAlert functionality. That’s our focus now, and we’re about halfway there (and aiming to get the first alerts going out in the next 2-3 weeks).

We’ve also got several more councils and planning application systems to add, and this should bring the number of councils we’ve got on the system to between 150 and 200. This will be an ongoing process, over the next couple of months. There’ll also be some much-overdue design work on OpenlyLocal so that the increased amount of information on there is presented to the user in a more intuitive way – please feel free to contact us if you’re a UX person/designer and want to help out.

We also need to improve the database backend. We’ve been using MySQL exclusively since the start, but MySQL isn’t great at spatial (i.e. geographic) searches, restricting the sort of functionality we can offer. We expect to sort this in a month or so, probably moving to PostGIS, and after that we can start to add more features, finer grained searches, and start to look at making the whole thing sustainable by offering premium services.

We’ll be working too on liaising with councils who want to offer their applications via an API – as the ever pioneering Lichfield council already does – or a nightly data dump. This not only does the right thing in opening up data for all to use, but also means we don’t have to scrape their websites. Lichfield, for example, uses the Idox system, and the web interface for this (which is what you see when you look at a planning application on Lichfield’s website) spreads the application details over 8 different web pages, but the API makes this available on a single URL, reducing the work the server has to do.

Finally, we’re going to be announcing a bounty scheme for the scraper/developer community to write scrapers for those areas that don’t use one of the standard systems. Andrew will be coordinating this, and will be blogging about this sometime in the next week or so (and you can contact him at planning at openlylocal dot com). We’ll also be tweeting progress at @planningalert.

Thanks for your patience.

Videoing council meetings redux: progress on two fronts

with 18 comments

Tonight, hyperlocal bloggers (and in fact any ordinary members of the public) got two great boosts in their access to council meetings, and their ability to report on them.

Windsor & Maidenhead this evening passed a motion to allow members of the public to video the council meetings. This follows on from my abortive attempt late last year to video one of W&M’s council meeting – see the full story here, video embedded below – following on from the simple suggestion I’d made a couple of months ago to let citizens video council meetings. I should stress that that attempt had been pre-arranged with a cabinet member, in part to see how it would be received – not well as it turned out. But having pushed those boundaries, and with I dare say a bit of lobbying from the transparency minded members, Windsor & Maidenhead have made the decision to fully open up their council meetings.

Separately, though perhaps not entirely coincidentally, the Department for Communities & Local Government tonight issued a press release which called on councils across the country to fully open up their meetings to the public in general and hyperlocal bloggers in particular.

Councils should open up their public meetings to local news ‘bloggers’ and routinely allow online filming of public discussions as part of increasing their transparency, Local Government Secretary Eric Pickles said today.

To ensure all parts of the modern-day media are able to scrutinise Local Government, Mr Pickles believes councils should also open up public meetings to the ‘citizen journalist’ as well as the mainstream media, especially as important budget decisions are being made.

Local Government Minister Bob Neill has written to all councils urging greater openness and calling on them to adopt a modern day approach so that credible community or ‘hyper-local’ bloggers and online broadcasters get the same routine access to council meetings as the traditional accredited media have.

The letter sent today reminds councils that local authority meetings are already open to the general public, which raises concerns about why in some cases bloggers and press have been barred.

Importantly, the letter also tells councils that giving greater access will not contradict data protection law requirements, which was the reason I was given for W&M prohibiting me filming.

So, hyperlocal bloggers, tweet, photograph and video away. Do it quietly, do it well, and raise merry hell in your blogs and local press if you’re prohibited, and maybe we can start another scoreboard to measure the progress. To those councils who videocast, make sure that the videos are downloadable under the Open Government Licence, and we’ll avoid the ridiculousness of councillors being disciplined for increasing access to the democratic process.

And finally if we can collectively think of a way of tagging the videos on Youtube or Vimeo with the council and meeting details, we could even automatically show them on the relevant meeting page on OpenlyLocal.

Written by countculture

February 22, 2011 at 11:32 pm

Videoing council meetings revisited: the limits of openness in a transparent council

with 10 comments

A couple of months ago, I blogged about the ridiculous situation of a local councillor being hauled up in front of the council’s standards committee for posting a council webcast onto YouTube, and worse, being found against (note: this has since been overturned by the First Tier Tribunal for Local Government Standards, but not without considerable cost for the people of Brighton).

At the time I said we should make the following demand:

Give the public the right to record any council meeting using any device using Flip cams, tape recorders, frankly any darned thing they like as long as it doesn’t disrupt the meeting.

Step forward councillor Liam Maxwell from the Royal Borough of Windsor & Maidenhead, who as the cabinet member for transparency has a personal mission to make RBWM the most transparent council in the country. I don’t see why you couldn’t do that our council, he said.

So, last night, I headed over to Maidenhead for the scheduled council meeting to test this out, and either provide a shining example for other councils, or show that even the most ‘transparent’ council can’t shed the pomposity and self-importance that characterises many council meetings, and allow proper open access.

The video below, less than two minutes long, is the result, and as you can see, they chose the latter course:

Interestingly, when asked why videoing was not allowed, they claimed ‘Data Protection’, the catch-all excuse for any public body that doesn’t want to publish, or open up, something. Of course, this is nonsense in the context of a public meeting,  and where all those being filmed were public figures who were carrying out a civic responsibility.

There’s also an interesting bit to the end when a councillor answered that they were ‘transparent’ in response to the observation that they were supposed to be open. This is the same old you-can-look-but-don’t touch attitude that has characterised much of government’s interactions with the public (and works so well at excluding people from the process). Perhaps naively, I was a little shocked to hear this from this particular council.

So there you have it. That, I guess, is where the boundaries of transparency lies at Windsor & Maidenhead. Why not test them out at your council, and perhaps we can start a new scoreboard at OpenlyLocal to go with the open data scoreboard,  and the 10:10 council scoreboard

Written by countculture

December 8, 2010 at 4:33 pm

Open data, fraud… and some worrying advice

with 6 comments

One of the most commonly quoted concerns about publishing public data on the web is the potential for fraud – and certainly the internet has opened up all sorts of new routes to fraud, from Nigerian email scams, to phishing for bank accounts logins, to key-loggers to indentity theft.

Many of these work using two factors – the acceptance of things at face value (if it looks like an email from your bank, it is an email from them), and flawed processes designed to stop fraud but which inconvenience real users while making life easy from criminals.

I mention this because of some pending advice from the Local Government Association to councils regarding the publication of spending data, which strikes me as not just flawed, but highly dangerous and an invitation to fraudsters.

The issue surrounds something that may seem almost trivial, but bear with me – it’s important, and it’s off such trivialities that fraudsters profit.

In the original guidance for councils on publishing spending data we said that councils should publish both their internal supplier IDs and the supplier VAT numbers, as it would greatly aid the matching of supplier names to real-world companies, charities and other organisations, which is crucial in understanding where a local council’s money goes.

When the Local Government Association published its Guidance For Practitioners it removed those recommendations in order to prevent fraud. It has also suggested using the internal supplier ID as a unique key to confirm supplier identity. This betrays a startling lack of understanding, and worse opens up a serious vector to allow criminals to defraud councils of large sums of money.

Let’s take the VAT numbers first. The main issue here appears to be so-called missing trader fraud, whereby VAT is fraudulently claimed back from governments. Now it’s not clear to me that by publishing VAT numbers for supplier names that this fraud is made easier, and you would think the Treasury who recommend publishing the VAT numbers for suppliers in their guidance (PDF) would be alert to this (I’m told they did check with HMRC before issuing their guidance).

However, that’s not the point. If it’s about matching VAT numbers to supplier names there’s already several routes for doing this, with the ability to retrieve tens of thousands of them in the space of an hour or so, including this one:

http://www.google.co.uk/#sclient=psy&hl=en&q=%27vat+number+gb%27+site:com

Click on that link and you’ll get something like this:

Whether you’re a programmer or not, you should be able to see that it’s a trivial matter to go through those thousands of results and extract the company name and VAT number, and bingo, you’ve got that which the LGA is so keen for you not to have. So those who are wanting to match council suppliers don’t get the help a VAT number would give, and fraudsters aren’t disadvantaged at all.

Now, let’s turn to the rather more serious issue of internal Supplier IDs. Let me make it clear here, when matching council or central government suppliers, internal Supplier IDs are useful, make the job easier, and the matching more accurate, and also help with understanding how much in total redacted payees are receiving (you’d be concerned if a redacted person/company received £100,000 over the course of a year, and without some form of supplier ID you won’t know that). However, it’s not some life-or-death battle over principle for me.

The reason the LGA, however, is advising councils not to publish them is much more serious, and dangerous. In short, they are proposing to use the internal Supplier ID as a key to confirm the suppliers identity, and so allow the supplier to change details, including the supplier bank account (the case brought up here to justify this was the recent one of South Lanarkshire, which didn’t involve any information published as open data, just plain old fraudster ingenuity).

Just think about that for a moment, and then imagine that it’s the internal ID number they use for you in connection with paying your housing benefits. If you want to change your details, say you wanted to pay the money into a different bank account, you’d have to quote it – and just how many of us would have somewhere both safe to keep it and easy to find (and what about when you separated from your partner).

Similarly, where and how do we really think suppliers are going to keep this ID (stuck on a post-it note to the accounts receivable’s computer screen?), and what happens when they lose it? How do they identify themselves to find out what it is, and how will a council go about issuing a new one should the old one be compromised – is there any way of doing this except by setting up a new supplier record, with all the problems that brings.

And how easy would it be to do a day or two’s temping in a council’s accounts department and do a dump/printout of all the Supplier IDs, and then pass them onto fraudsters. The possibilities – for criminals – are almost limitless, and the Information Commissioner’s Office should put a stop to this at once if it is not to lose a serious amount of credibility.

But there’s an bigger underlying issue here, and it’s not that organisations such as the LGA don’t get data (although that is a problem), it’s that such bodies think that by introducing processes they can engineer out all risk, and that leads to bad decisions. Tell someone that suppliers changing bank accounts is very rare and should always be treated with suspicion and fraud becomes more difficult; tell someone that they should accept internal supplier IDs as proof of identity and it becomes easy.

Government/big-company bureaucrats not only think like government/big-company bureaucrats, they build processes that assumes everyone else does. The problem is that that both makes more difficult for ordinary citizens (as most encounters with bureaucracy make clear), and also makes it easy for criminals (who by definition don’t follow the rules).

Written by countculture

October 26, 2010 at 11:38 am

New feature: one-click FoI requests for spending payments

with 4 comments

Thanks to the incredible work of Francis Irving at WhatDoTheyKnow, we’ve now added a feature I’ve wanted on OpenlyLocal since we started imported the local spending data: one-click Freedom of Information requests on individual spending items, especially those large ones.

This further lowers the barriers to armchair auditors wanting to understand where the money goes, and the request even includes all the usual ‘boilerplate’ to help avoid specious refusals. I’ve started it off with one to Wandsworth, whose poor quality of spending data I discussed last week.

And this is the result, the whole process having taken less than a minute:

The requests are also being tagged. This means that in the near future you’ll be able to see on a transaction page if any requests have already been made about it, and the status of those requests (we’re just waiting for WDTK to implement search by tags), which will be the beginning of a highly interconnected transparency ecosystem.

In the meantime it’s worth checking the transaction hasn’t been requested before confirming your request on the WDTK page (there’s a link to recent requests for the council on the WDTK form you get to after pressing the button).

I’m also trusting the community will use this responsibly, digging out information on the big stuff, rather than firing off multiple requests to the same council for hundreds of individual items (which would in any case probably be deemed vexatious under the terms of the FoI Act). At the moment the feature’s only enabled on transactions over £10,000.

Good places to start would be those multi-million-pound monthly payments which indicate big outsourcing deals, or large redacted payments (Birmingham’s got a few). Have a look at the spending dashboard for your council and see if there are any such payments.

A simple demand: let us record council meetings

with 16 comments

A couple of months ago we had the ridiculous situation of a local council hauling up one of their councillors in front of a displinary hearing for posting videos of the council meeting on YouTube.

The video originated from the council’s own webcasts, and the complaint by Councillor Kemble was that in posting these videos on YouTube, another councillor, Jason Kitcat

(i) had failed to treat his fellow councillors with respect, by posting the clips without the prior knowledge or express permission of Councillor Theobald or Councillor Mears; and
(ii) had abused council facilities by infringing the copyright in the webcast images

and in doing so had breached the Members Code of Conduct.

Astonishingly, the standards committee found against Kitcat and ruled he should be suspended for up to six months if he does not write an apology to Cllr Theobald and submit to re-training on the roles and responsibilities of being a councillor, and it is only the fact that he is appealing to the First-Tier Tribunal (which apparently the council has decided to fight using hire outside counsel) that has allowed him to continue.

It’s worth reading the investigator’s report (PDF, of course) in full for a fairly good example of just how petty and ridiculous these issues become, particularly when the investigator writes things such as:

I consider that Cllr Kitcat did use the council’s IT facilities improperly for political purposes. Most of the clips are about communal bins, a politically contentious issue at the time. The clips are about Cllr Kitcat holding the administration politically to account for the way the bins were introduced, and were intended to highlight what the he believed were the administration’s deficiencies in that regard, based on feedback from certain residents.
Most tellingly, clip no. 5 shows the Cabinet Member responsible for communal bins in an unflattering and politically unfavourable light, and it is hard to avoid the conclusion that this highly abridged clip was selected and posted for political gain.

The using IT facilities, refers, by the way, not to using the council’s own computers to upload or edit the videos (it seems agreed by all that he used his own computer for this), but the fact that the webcasts were made and published on the web using the council’s equipment (or at least those of its supplier, Public-i). Presumably it he’d taken an extract from the minutes of a meeting published on the council’s website that would also have been using the council’s IT resources.

However, let’s step back a bit. This, ultimately, is not about councillors not understanding the web, failing to get get new technology and the ways it can open up debate. This is not even about the somewhat restrictive webcasting system which apparently only has the past six month’s meetings and is somewhat unpleasant to use (particularly if you use a Mac, or Linux — see a debate of the issues here).

This is about councillors failing to understand democracy, about the ability to taking the same material and making up your own mind, and critically trying to persuade others of that view.

In fact the investigator’s statement above, taking “a politically contentious issue at the time… holding the administration politically to account for the way the bins were introduced… to highlight what the he believed were the administration’s deficiencies in that regard” is surely a pretty good benchmark for a democracy.

So here’s simple suggestion for those drawing up the local government legislation at the moment, no let’s make that a demand, since that’s what it should be in a democracy (not a subservient request to your ‘betters’):

Give the public the right to record any council meeting using any device using Flip cams, tape recorders, frankly any darned thing they like as long as it doesn’t disrupt the meeting.

Not only would this open up council meetings and their obscure committees to wider scrutiny, it would also be a boost to hyperlocal sites that are beginning to take the place of the local media.

And if councils want to go to the expense of webcasting their meetings, then require them to make the webcasts available to download under an open licence. That way people can share them, convert them into open formats that don’t require proprietary software, subtititle them, and yes, even post them on YouTube.

I can already hear local politicians saying it will reduce the quality of political discourse, that people may use it in ways they don’t like and can’t control.

Does this seem familiar? It should. It’s the same arguments being given against publishing raw data. The public won’t understand. There may be different interpretations. How will people use it?

Well, folks that’s the point of a democracy. And that’s the point of a data democracy. We can use it in any way we damn well please. The public record is not there to make incumbent councillors or senior staff memebers look good. It’s there to allow the to be held to account. And to allow people to make up their own minds. Stop that, and you’re stopping democracy.

Links: For more posts relating to this case, see also Jason Kitcat’s own blog postsBrighton Argus post, and posts form Mark Pack at Liberal Democrat voice, Jim Killock,  Conservative Home, and even a tweet from Local Government minister Grant Shapps.

Written by countculture

September 27, 2010 at 12:46 pm

Follow

Get every new post delivered to your Inbox.

Join 79 other followers