countculture

Open data and all that

Archive for the ‘openlylocal’ Category

Not the way to build a Big Society: part1 NESTA

with 9 comments

I took a very frustrating phone call earlier today from NESTA, an organisation I’ve not had any dealings with before, and don’t actually have a view about, or at least didn’t.

It followed from an email I’d received a couple of days earlier, which read:

I am contacting you about a project NESTA are currently working on in partnership with the Big Society Network called Your Local Budget.

Working with 10 pioneer local authorities, we are looking at how you can use participatory budgeting to develop new ways to give people a say in how mainstream local budgets are spent. Alongside this we will also be developing an online platform that enables members of the public to understand and scrutinise their local authority’s spending, and connect with each other to generate ideas for delivering better value for money in public spending.

We would like to share our thinking and get your thoughts on the online tool to get a sense of what is needed and where we can add value. You are invited to a round table discussion on Friday 19 November, 11am – 12.30pm at NESTA that will be chaired by Philip Colligan, Executive Director of the Public Services Lab. Following the meeting we intend to issue an invitation to tender for the online tool.

Apart from the short notice & terrible timing (it clashes with the Open Government Data Camp, to which you’d hope most of the people involved would be going), the main question I had was this:

Why?

I got the phone call because I couldn’t make the round table, and for some feedback, and this was the feedback I gave: I don’t understand why this is being done. At all.

Putting aside the participatory budgeting part (although this problem seems to be getting dealt with by Redbridge council and YouGov, whose solution is apparently being offered to all councils), there’s the question of the “online platform that enables members of the public to understand and scrutinise their local authority’s spending, and connect with each other to generate ideas for delivering better value for money in public spending”.

Excuse me? Most of the data hasn’t been published yet, and there are several known organisations and groups (including OpenlyLocal) that have publicly stated they are going to be importing this data and doing things with it – visualising it, and allowing different views and analysis. Additionally, OpenlyLocal is already talking with several newspaper groups to help them re-use the data, and we are constantly evolving how we match and present the data.

Despite this, Nesta seems to have decided that it’s going to spend public money on coming up with a tendered solution to solve a problem that may be solved for zero cost by the private sector. Now I’m no roll-back-the-government red-in-tooth-and-claw free marketeer, but this is crazy, and I said as much to the person from Nesta.

Is the roundtable to decide whether the project should be done, or what should be done? I asked. The latter, I was told. So, they’ve got some money and have decided they’re going to spend it, even though the need may not be there. At a time when welfare payments are being cut and essential services are being slashed, for this sort of thing to happen is frankly outrageous.

There are other concerns here too – I personally think websites such as this are not suitable for a tender process, as that doesn’t encourage or often even allow the sort of agile, feedback-led process that produces the best websites. They also favour those who make their living by tendering.

So, Nesta, here’s a suggestion. Park this idea for 12 months, and in the meantime give the money back to the government. If you want to act as an angel funder then act as such (and the ones I’ve come across don’t do tendering). A reminder, your slogan is ‘making innovation flourish’, but sometimes that means stepping back and seeing what happens. This is not the way to build a Big Society.

Written by countculture

November 17, 2010 at 2:27 pm

Open data, fraud… and some worrying advice

with 6 comments

One of the most commonly quoted concerns about publishing public data on the web is the potential for fraud – and certainly the internet has opened up all sorts of new routes to fraud, from Nigerian email scams, to phishing for bank account logins, to key-loggers, to identity theft.

Many of these work using two factors – the acceptance of things at face value (if it looks like an email from your bank, it is an email from them), and flawed processes designed to stop fraud but which inconvenience real users while making life easy for criminals.

I mention this because of some pending advice from the Local Government Association to councils regarding the publication of spending data, which strikes me as not just flawed, but highly dangerous and an invitation to fraudsters.

The issue surrounds something that may seem almost trivial, but bear with me – it’s important, and it’s off such trivialities that fraudsters profit.

In the original guidance for councils on publishing spending data we said that councils should publish both their internal supplier IDs and the supplier VAT numbers, as it would greatly aid the matching of supplier names to real-world companies, charities and other organisations, which is crucial in understanding where a local council’s money goes.

When the Local Government Association published its Guidance For Practitioners it removed those recommendations in order to prevent fraud. It has also suggested using the internal supplier ID as a unique key to confirm supplier identity. This betrays a startling lack of understanding, and worse opens up a serious vector to allow criminals to defraud councils of large sums of money.

Let’s take the VAT numbers first. The main issue here appears to be so-called missing trader fraud, whereby VAT is fraudulently claimed back from governments. Now it’s not clear to me that publishing VAT numbers alongside supplier names makes this fraud any easier, and you would think the Treasury, who recommend publishing the VAT numbers for suppliers in their guidance (PDF), would be alert to this (I’m told they did check with HMRC before issuing their guidance).

However, that’s not the point. If it’s about matching VAT numbers to supplier names there are already several routes for doing this, with the ability to retrieve tens of thousands of them in the space of an hour or so, including this one:

http://www.google.co.uk/#sclient=psy&hl=en&q=%27vat+number+gb%27+site:com

Click on that link and you’ll get something like this:

Whether you’re a programmer or not, you should be able to see that it’s a trivial matter to go through those thousands of results and extract the company name and VAT number, and bingo, you’ve got that which the LGA is so keen for you not to have. So those who are wanting to match council suppliers don’t get the help a VAT number would give, and fraudsters aren’t disadvantaged at all.

Now, let’s turn to the rather more serious issue of internal Supplier IDs. Let me make it clear here, when matching council or central government suppliers, internal Supplier IDs are useful, make the job easier, and the matching more accurate, and also help with understanding how much in total redacted payees are receiving (you’d be concerned if a redacted person/company received £100,000 over the course of a year, and without some form of supplier ID you won’t know that). However, it’s not some life-or-death battle over principle for me.
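To make the redacted-payee point concrete, here is a minimal sketch of how a supplier ID lets you total payments even when the name is withheld. The rows, the column names (supplier_id, supplier_name, amount) and the "REDACTED" marker are all made up for illustration; real council files will differ.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical spending rows; names, IDs and amounts are invented.
payments = [
    {"supplier_id": "S1001", "supplier_name": "REDACTED", "amount": "40000.00"},
    {"supplier_id": "S1001", "supplier_name": "REDACTED", "amount": "35000.00"},
    {"supplier_id": "S2002", "supplier_name": "REDACTED", "amount": "1200.00"},
    {"supplier_id": "S1001", "supplier_name": "REDACTED", "amount": "30000.00"},
    {"supplier_id": "S3003", "supplier_name": "Acme Cleaning Ltd", "amount": "90000.00"},
]


def totals_by_supplier_id(rows):
    """Sum the payment amounts for each internal supplier ID."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["supplier_id"]] += Decimal(row["amount"])
    return dict(totals)


def redacted_over_threshold(rows, threshold):
    """Return redacted suppliers whose yearly total reaches the threshold."""
    redacted = [r for r in rows if r["supplier_name"] == "REDACTED"]
    totals = totals_by_supplier_id(redacted)
    return {sid: total for sid, total in totals.items() if total >= threshold}


flagged = redacted_over_threshold(payments, Decimal("100000"))
print(flagged)
```

Without the supplier_id column the three payments to S1001 are just three unrelated redacted rows, and nobody can tell they add up to more than £100,000.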

The reason the LGA, however, is advising councils not to publish them is much more serious, and dangerous. In short, they are proposing to use the internal Supplier ID as a key to confirm the supplier’s identity, and so allow the supplier to change details, including the supplier bank account (the case brought up here to justify this was the recent one of South Lanarkshire, which didn’t involve any information published as open data, just plain old fraudster ingenuity).

Just think about that for a moment, and then imagine that it’s the internal ID number they use for you in connection with paying your housing benefits. If you want to change your details, say you wanted to pay the money into a different bank account, you’d have to quote it – and just how many of us would have somewhere both safe to keep it and easy to find (and what about when you separated from your partner).

Similarly, where and how do we really think suppliers are going to keep this ID (stuck on a post-it note to the accounts receivable’s computer screen?), and what happens when they lose it? How do they identify themselves to find out what it is, and how will a council go about issuing a new one should the old one be compromised – is there any way of doing this except by setting up a new supplier record, with all the problems that brings.

And how easy would it be to do a day or two’s temping in a council’s accounts department and do a dump/printout of all the Supplier IDs, and then pass them onto fraudsters. The possibilities – for criminals – are almost limitless, and the Information Commissioner’s Office should put a stop to this at once if it is not to lose a serious amount of credibility.

But there’s a bigger underlying issue here, and it’s not that organisations such as the LGA don’t get data (although that is a problem), it’s that such bodies think that by introducing processes they can engineer out all risk, and that leads to bad decisions. Tell someone that suppliers changing bank accounts is very rare and should always be treated with suspicion and fraud becomes more difficult; tell someone that they should accept internal supplier IDs as proof of identity and it becomes easy.

Government/big-company bureaucrats not only think like government/big-company bureaucrats, they build processes that assume everyone else does. The problem is that this both makes life more difficult for ordinary citizens (as most encounters with bureaucracy make clear), and also makes it easy for criminals (who by definition don’t follow the rules).

Written by countculture

October 26, 2010 at 11:38 am

Opening up council accounts… and open procurement

with 8 comments

Since OpenlyLocal started pulling in council spending data, it’s niggled at me that it’s only half the story. Yes, as more and more data is published you’re beginning to get a much clearer idea of who’s paid what. And if councils publish it at a sufficient level of detail and consistently categorised, we’ll have a pretty good idea of what it’s spent on too.

However, useful though that is, that’s like taking a peek at a company’s bank statement and thinking it tells the whole story. Many of the payments relate to goods or services delivered some time in the past, some for things that have not yet been delivered, and there are all sorts of things (depreciation, movements between accounts, accruals for invoices not yet received) that won’t appear on there.

That’s what the council’s accounts are for – you know, those impenetrable things locked up in PDFs in some dusty corner of the council’s website, all sufficiently different from each other to make comparison difficult.

For some time, the holy grail for projects like OpenlyLocal and Where Does My Money Go has been to get the accounts in a standardized form to make comparison easy not just for accountants but for regular people too.

The thing is, such a thing does exist, and it’s sent by councils to central Government (the Department for Communities and Local Government to be precise) for them to use in their own figures. It’s a fairly hellishly complex spreadsheet called the Revenue Outturn form that must be filled in by the council (to get an idea have a look at the template here).

They’re not published anywhere by the DCLG, but they contain no state secrets or sensitive information; it’s just that the procedure being followed is the same one as they’ve always followed, and so they are not published, even after the statistics have been calculated from the data (the Statistics Act apparently prohibits publication until the stats have been published).

So I had an idea: wouldn’t it be great if we could pull the data that’s sitting in all these spreadsheets into a database and so allow comparison between councils’ accounts, thus freeing it from those forgotten corners of government computers.
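As a rough sketch of that idea, the snippet below loads a couple of councils’ figures into a single database table and then compares them on one service line. Everything specific here is an assumption for illustration: the real Revenue Outturn forms are multi-sheet spreadsheets, not tidy two-column CSVs, and the column names (service, net_spend), council names and figures are invented.

```python
import csv
import io
import sqlite3


def load_outturn(conn, council, csv_text):
    """Insert one council's (hypothetical) outturn CSV into the shared table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(council, row["service"], int(row["net_spend"])) for row in reader]
    conn.executemany(
        "INSERT INTO outturn (council, service, net_spend) VALUES (?, ?, ?)", rows
    )


def compare(conn, service):
    """List every council's spend on one service, highest first."""
    cursor = conn.execute(
        "SELECT council, net_spend FROM outturn "
        "WHERE service = ? ORDER BY net_spend DESC",
        (service,),
    )
    return cursor.fetchall()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outturn (council TEXT, service TEXT, net_spend INTEGER)")

load_outturn(conn, "Council A", "service,net_spend\nLibraries,1200\nWaste collection,5400\n")
load_outturn(conn, "Council B", "service,net_spend\nLibraries,900\nWaste collection,6100\n")

print(compare(conn, "Libraries"))
```

The hard part in practice is not the database but getting hundreds of differently filled-in spreadsheets into one consistent set of rows in the first place.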

This would seem to be a project that would be just about simple enough to be doable (though it’s trickier than it seems) and could allow ordinary people to understand their council’s spending in all sorts of ways (particularly if we add some of those sexy Where Does My Money Go visualisations). It could also be useful in ways that we can barely imagine – some of the participatory budget experiments going on in Redbridge and other councils would be even more useful if the context of similar councils’ spending was added to the mix.

So how would this be funded? Well, the usual route would be for DCLG or perhaps one of the Local Government Association bodies such as IDeA to scope out a proposal, involving many hours of meetings, reams of paper, and running up thousands of pounds in costs, even before it’s started.

They’d then put the process out to tender, involving many more thousands in admin, and designed to attract those companies who specialise in tendering for public sector work. Each of those would want to ensure they make a profit, and so would work out how they’re going to do it before quoting, running up their own costs, and inflating the final price.

So here’s part two of my plan: instead of going down that route, I’d come up with a proposal that would:

  • be a fraction of that cost
  • be specified on a single sheet of paper
  • be paid for only if I delivered

Obviously there’s a clear potential conflict of interest here – I sit on the government’s Local Public Data Panel and am pushing strongly for open data, and also stand to benefit (depending on how good I am at getting the information out of those hundreds of spreadsheets, each with multiple worksheets, and matching the classification systems). The solution to that – I think – is to do the whole thing transparently, hence this blog post.

In a sense, what I’m proposing is that I scope out the project, solving those difficult problems of how to do it, with the bonus that instead of delivering a report, I deliver the project.

Is it a good thing to have all this data imported into a database, and shown not just on a website in a way non-accountants can understand, but also available to be combined with other data in mashups and visualisations? Definitely.

Is it a good deal for the taxpayer, and is this open procurement a useful way of doing things? Well you can read the proposal for yourself here, and I’d be really interested in comments both on the proposal and the novel procurement model.

A simple demand: let us record council meetings

with 16 comments

A couple of months ago we had the ridiculous situation of a local council hauling up one of their councillors in front of a disciplinary hearing for posting videos of the council meeting on YouTube.

The video originated from the council’s own webcasts, and the complaint by Councillor Kemble was that in posting these videos on YouTube, another councillor, Jason Kitcat:

(i) had failed to treat his fellow councillors with respect, by posting the clips without the prior knowledge or express permission of Councillor Theobald or Councillor Mears; and
(ii) had abused council facilities by infringing the copyright in the webcast images

and in doing so had breached the Members Code of Conduct.

Astonishingly, the standards committee found against Kitcat and ruled he should be suspended for up to six months if he does not write an apology to Cllr Theobald and submit to re-training on the roles and responsibilities of being a councillor, and it is only the fact that he is appealing to the First-Tier Tribunal (which apparently the council has decided to fight using hired outside counsel) that has allowed him to continue.

It’s worth reading the investigator’s report (PDF, of course) in full for a fairly good example of just how petty and ridiculous these issues become, particularly when the investigator writes things such as:

I consider that Cllr Kitcat did use the council’s IT facilities improperly for political purposes. Most of the clips are about communal bins, a politically contentious issue at the time. The clips are about Cllr Kitcat holding the administration politically to account for the way the bins were introduced, and were intended to highlight what the he believed were the administration’s deficiencies in that regard, based on feedback from certain residents.
Most tellingly, clip no. 5 shows the Cabinet Member responsible for communal bins in an unflattering and politically unfavourable light, and it is hard to avoid the conclusion that this highly abridged clip was selected and posted for political gain.

The “using IT facilities” refers, by the way, not to using the council’s own computers to upload or edit the videos (it seems agreed by all that he used his own computer for this), but to the fact that the webcasts were made and published on the web using the council’s equipment (or at least that of its supplier, Public-i). Presumably if he’d taken an extract from the minutes of a meeting published on the council’s website that would also have been using the council’s IT resources.

However, let’s step back a bit. This, ultimately, is not about councillors not understanding the web, failing to get new technology and the ways it can open up debate. This is not even about the somewhat restrictive webcasting system which apparently only has the past six months’ meetings and is somewhat unpleasant to use (particularly if you use a Mac, or Linux – see a debate of the issues here).

This is about councillors failing to understand democracy, about the ability to take the same material and make up your own mind, and, critically, to try to persuade others of that view.

In fact the investigator’s statement above, taking “a politically contentious issue at the time… holding the administration politically to account for the way the bins were introduced… to highlight what the he believed were the administration’s deficiencies in that regard” is surely a pretty good benchmark for a democracy.

So here’s a simple suggestion for those drawing up the local government legislation at the moment, no, let’s make that a demand, since that’s what it should be in a democracy (not a subservient request to your ‘betters’):

Give the public the right to record any council meeting using any device: Flip cams, tape recorders, frankly any darned thing they like as long as it doesn’t disrupt the meeting.

Not only would this open up council meetings and their obscure committees to wider scrutiny, it would also be a boost to hyperlocal sites that are beginning to take the place of the local media.

And if councils want to go to the expense of webcasting their meetings, then require them to make the webcasts available to download under an open licence. That way people can share them, convert them into open formats that don’t require proprietary software, subtitle them, and yes, even post them on YouTube.

I can already hear local politicians saying it will reduce the quality of political discourse, that people may use it in ways they don’t like and can’t control.

Does this seem familiar? It should. It’s the same arguments being given against publishing raw data. The public won’t understand. There may be different interpretations. How will people use it?

Well, folks, that’s the point of a democracy. And that’s the point of a data democracy. We can use it in any way we damn well please. The public record is not there to make incumbent councillors or senior staff members look good. It’s there to allow them to be held to account. And to allow people to make up their own minds. Stop that, and you’re stopping democracy.

Links: For more posts relating to this case, see also Jason Kitcat’s own blog posts, the Brighton Argus post, and posts from Mark Pack at Liberal Democrat Voice, Jim Killock, Conservative Home, and even a tweet from Local Government minister Grant Shapps.

Written by countculture

September 27, 2010 at 12:46 pm

Drawing up the Local Spending Data guidelines… and how Google Docs saved the day

with 2 comments

Last Thursday, the Local Public Data Panel on which I sit approved the final draft of the guidelines for publishing by councils of their spending over £500 (version 1.0 if you like). These started back in June, with a document Will Perrin and I drew up in response to a request from Camden council, and attracted a huge number of really helpful comments.

Since then, things have moved on a bit. The loose guidelines were fine as a starting point, especially as at that time we were talking theoretically, and hadn’t really had any concrete situations or data to deal with, but from speaking to councils, and actually using the data, it became clear that something much firmer was needed.

What followed then was the usual public sector drafting nightmare, with various Word documents being emailed around, people getting very territorial, offline conversations, and frankly something that wasn’t getting very far.

However, a week beforehand I’d successfully used a shared Google Spreadsheet to free up a similar problem. In that case there were a bunch of organisations (including OpenlyLocal, the Local Government Association and Department for Communities and Local Government) that needed an up-to-date list of councils publishing spending data, together with the licence, URL and whether it was machine-readable (basically what Adrian Short was doing here at one time – I’d asked him if he wanted to do it, but he didn’t have the time to keep his up to date). In addition, it was clear that we each knew about councils the others didn’t.

The answer could have been a dedicated web app, or a Word document that was added to and emailed around (actually that’s what started to happen). In the end, it was something much simpler – a Google spreadsheet with edit access given to multiple people. I used the OpenlyLocal API to populate the basic structure (including OpenlyLocal URLs, which mean that anyone getting the data via the API, or as a CSV, would have a place they could query for more data), and bingo, it was sorted.

So given this success, Jonathan Evans from the LGA and I agreed to use the Google Docs approach with the spending guidelines. There are multiple advantages to this, but some are particularly relevant for tackling such a problem:

  • We can all work on the document at the same time, messaging each other as we go, avoiding the delays, arguments and territoriality of the document emailing approach.
  • The version tracking means that all your changes, not just those of the saved version, are visible to all participants (and to people who subsequently become participants). This seems to lead to a spirit of collaboration rather than position-taking, and at least on this occasion avoided edit-wars.
  • The world can see the product of your work, without having to separately publish it (though see note below)

You can also automatically get the information as data, either through the Google Docs API or more likely in the case of a spreadsheet particularly, as a CSV file. Construct it with this in mind (i.e. 1 header row), and you’ve got something that can be instantly used in mashups and visualisations.
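Here is a small sketch of why the single header row matters: with it, a CSV export drops straight into a standard CSV reader and every row becomes a dictionary keyed by column name. The sample data, council names, example.org URLs and column names below are invented for illustration, not the real scoreboard.

```python
import csv
import io

# Hypothetical export of a shared spreadsheet: exactly one header row, then data.
SAMPLE_CSV = """council,url,licence,machine_readable
Council A,http://example.org/a/spending.csv,open,yes
Council B,http://example.org/b/spending.pdf,closed,no
Council C,http://example.org/c/spending.csv,open,yes
"""


def machine_readable_councils(csv_text):
    """Return the names of councils whose data is flagged machine-readable."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row supplies the keys
    return [
        row["council"]
        for row in reader
        if row["machine_readable"].strip().lower() == "yes"
    ]


print(machine_readable_councils(SAMPLE_CSV))
```

Add a title row, merged cells or a second header line to the spreadsheet and this stops working without manual clean-up, which is the point of constructing it with re-use in mind.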

    Important note 1: The biggest problem with this approach in central government is Internet Explorer 6, which the Department for Communities and Local Government are stuck on and have no plans to upgrade. This means the approach only works when people are prepared to make the additions at home, or some other place that has a browser less than 9 years old.

    Important note 2: Despite having put together the spending scoreboard spreadsheet, we were hopeless at telling the wider world about it, meaning that Simon Rogers at the Guardian ended up duplicating much of the work. Interestingly he was missing some that we knew about, and vice versa, and I’ve offered him edit access to the main spreadsheet so we can all work together on the same one.

    Important note 3: A smaller but nevertheless irritating problem with Google Documents (and this seems to be true of Word and OpenOffice too) is that when they contain tables you get a mess of inaccessible HTML, with the result that when the spending guidance was put on the Local Public Data Panel website, the HTML had to be largely rewritten from scratch (by one of the data.gov.uk stars late at night). So Google, if you’re listening, please allow an option to export as accessible HTML.

    Written by countculture

    September 13, 2010 at 8:24 am

    Introducing OpenCharities: Opening up the Charities Register

    with 75 comments

    A couple of weeks ago I needed a list of all the charities in the UK and their registration numbers so that I could try to match them up to the local council spending data OpenlyLocal is aggregating and trying to make sense of. A fairly simple request, you’d think, especially in this new world of transparency and open data, and for a dataset that’s uncontentious.

    Well, you’d be wrong. There’s nothing at data.gov.uk, nothing at CKAN and nothing on the Charity Commission website, and in fact you can’t even see the whole register on the website, just the first 500 results of any search/category. Here’s what the Charity Commission says on their website (NB: extract below is truncated):

    The Commission can provide an electronic copy in discharge of its duty to provide a legible copy of publicly available information if the person requesting the copy is happy to receive it in that form. There is no obligation on the Commission to provide a copy in this form…

    The Commission will not provide an electronic copy of any material subject to Crown copyright or to Crown database right unless it is satisfied… that the Requestor intends to re-use the information in an appropriate manner.

    Hmmm. Time for Twitter to come to the rescue to check that some other independently minded person hasn’t already solved the problem. Nothing, but I did get pointed to this request for the data to be unlocked, with the very recent response by the Charity Commission, essentially saying, “Nope, we ain’t going to release it”:

    For resource reasons we are not able to display the entire Register of Charities. Searches are therefore limited to 500 results… We cannot allow full access to all the data, held on the register, as there are limitations on the use of data extracted from the Register… However, we are happy to consider granting access to our records on receipt of a written request to the Departmental Record Officer

    OK, so it seems as though they have no intention of making this data available anytime soon (I actually don’t buy that there are Intellectual Property or Data Privacy issues with making basic information about charities available, and if there really are this needs to be changed, pronto), so time for some screen-scraping. Turns out it’s a pretty difficult website to scrape, because it requires both cookies and javascript to work properly.

    Try turning off both in your browser, and see how far you get, and then you’ll also get an idea of how difficult it is to use if you have accessibility issues – and check out their poor excuse for an accessibility statement, i.e. tough luck.

    Still, there’s usually a way, even if it does mean some pretty tortuous routes, and like the similarly inaccessible Birmingham City Council website, this is just the sort of challenge that stubborn so-and-so’s like me won’t give up on.

    And the way to get the info seems to be through the geographical search (other routes relied upon Javascript), and although it was still problematic, it was doable. So, now we have an open data register of charities, incorporated into OpenlyLocal, and tied in to the spending data being published by councils.

    [Image: Charity supplier to Local authority]

    And because this sort of thing is so easy, once you’ve got it in a database (Charity Commission take note), there are a couple of bonuses.

    First, it was relatively easy to knock up a quick and very simple Sinatra application, OpenCharities:

    [Image: Open Charities :: Opening up the UK Charities Register]

    If there’s any interest, I’ll add more features to it, but for now, it’s just the simplest of things, a web application with a unique URL for every charity based on its charity number, and with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.

    This is very much an alpha application: the design is very basic and it’s possible that there are a few charities missing – for two reasons. One: the Charity Commission kept timing out (I think I managed to pick up all of those, and they should get picked up when I periodically run the scraper); and two: there appears to be a bug in the Charity Commission website, so that when there are between 10 and 13 entries, only 10 are shown, but there is no way of seeing the additional ones. As a benchmark, there are currently 150,422 charities in the OpenCharities database.

    It’s also worth mentioning that due to inconsistencies with the page structure, the income/spending data for some of the biggest charities is not yet in the system. I’ve worked out a fix, and the entries will be gradually updated, but only as they are re-scraped.

    The second bonus is that the entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it up in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.
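For anyone wondering what “manipulating large datasets” means in practice, the usual trick is to stream the file a row at a time rather than loading it all at once. The sketch below assumes gzip compression and made-up column names (charity_number, title, income); the actual download may be packaged and laid out differently, so treat it as an illustration only.

```python
import csv
import gzip
import io


def count_and_top_income(fileobj, income_column="income"):
    """Stream a gzipped CSV, returning (row count, (title, income) of the largest)."""
    count = 0
    top = None
    # Text mode decompresses lazily, so only one row is held in memory at a time.
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            count += 1
            income = int(row[income_column] or 0)  # blank income is treated as 0
            if top is None or income > top[1]:
                top = (row["title"], income)
    return count, top


# A tiny in-memory stand-in for the real download, with invented charities.
sample = gzip.compress(
    "charity_number,title,income\n"
    "1,Example Trust,5000\n"
    "2,Sample Fund,\n"
    "3,Demo Society,12000\n".encode("utf-8")
)

print(count_and_top_income(io.BytesIO(sample)))
```

With a real file you would pass an open file (or a filename) instead of the BytesIO object; memory use stays flat however many rows there are.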

    Now, back to that spending data.

    Written by countculture

    September 6, 2010 at 1:15 pm

    A Local Spending Data wish… granted

    with 25 comments

    The very wonderful Stuart Harrison (aka pezholio), webmaster at Lichfield District Council, blogged yesterday with some thoughts about the publication of spending data following a local spending data workshop in Birmingham. Sadly I wasn’t able to attend this, but Stuart gives a very comprehensive account, and like all his posts it’s well worth reading.

    In it he made an important observation about those at the workshop who were pushing for linked data from the beginning, and wished there was a solution. First the observation:

    There did seem to be a bit of resistance to the linked data approach, mainly because agreeing standards seems to be a long, drawn out process, which is counter to the JFDI approach of publishing local data… I also recognise that there are difficulties in both publishing the data and also working with it… As we learned from the local elections project, often local authorities don’t even have people who are competent in HTML, let alone RDF, SPARQL etc.

    He’s not wrong there. As someone who’s been publishing linked data for some time, and who conceived and ran the Open Election Data project Stuart refers to, working with numerous councils to help them publish linked data, I’m probably as aware of the issues as anyone (ironically, and I think significantly, none of the councils involved in the local government e-standards body, and now pushing so hard for the linked data, has actually published any linked data themselves).

    That’s not to knock linked data – just to be realistic about the issues and hurdles that need to be overcome (see the report for a full breakdown). To expect all the councils to solve all these problems at the same time as extracting the data from their systems, removing data relating to non-suppliers (e.g. foster parents), and including information from other systems (e.g. supplier data, which may be on procurement systems), and all by January, is unrealistic at best, and could undermine the whole process.

    So what’s to be done? I think the sensible thing, particularly in these straitened times, is to concentrate on getting the raw data out, and as much of it as possible, and come down hard on those councils who publish it badly (e.g. by locking it up in PDFs or giving it a closed licence), or who wilfully ignore the guidance (it’s worrying how many councils publishing data at the moment don’t even include the transaction ID or date of the transaction, never mind supplier details).

    Beyond that we should take the approach the web has always done, and which is the reason for its success: a decentralised, messy variety of implementations and solutions that allows a rich eco-system to develop, with government helping solve bottlenecks and structural problems rather than trying to impose highly centralised solutions that are already being solved elsewhere.

    Yes, I’d love it if the councils were able to publish the data fully marked up, in a variety of forms (not just linked data, but also XML and JSON), but the ugly truth is that not a single council has so far even published their list of categories, never mind matched it up to a recognised standard (CIPFA BVACOP, COFOG or that used in their submissions to the CLG), still less done anything like linked data. So there’s a long way to go, and in the meantime we’re going to need some tools and cheap commodity services to bridge the gap.

    [In a perfect world, maybe councils would develop some open-source tools to help them publish the data, perhaps using something like Adrian Short’s Armchair Auditor code as the basis (this is a project that took a single council, Windsor & Maidenhead, and added a web interface to the figures). However, when many councils don’t even have competent HTML skills (having outsourced much of it), this is only going to happen at a handful of councils at best, unless considerable investment is made.]

    Stuart had been thinking along similar lines, and made a suggestion, almost a wish in fact:

    I think the way forward is a centralised approach, with authorities publishing CSVs in a standard format on their website and some kind of system picking up these CSVs (say, on a monthly basis) and converting this data to a linked data format (as well as publishing in vanilla XML, JSON and CSV format).

    He then expanded on the idea, talking about a single URL for each transaction, standard identifiers, “a human-readable summary of the data, together with links to the actual data in RDF, XML, CSV and JSON”. I’m a bit iffy about that ‘centralised approach’ phrase (the web is all about decentralisation), but I do think there’s an opportunity to help both the community and councils by solving some of these problems.
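
    To give a rough feel for the conversion step Stuart describes, here is a small Python sketch that takes CSV text and emits the same rows as JSON and as plain XML. It is only an illustration: the function names and the element names (transactions, transaction) are made up for the example, and it assumes the CSV column headers are valid XML element names. The linked data output is left out, since that depends on a vocabulary being agreed.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_json(records):
    """Serialise the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def records_to_xml(records, root_tag="transactions", row_tag="transaction"):
    """Serialise the records as XML, one child element per column.

    Assumes every column header is a valid XML element name.
    """
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")
```

    A monthly job could call csv_to_records on each council’s published file and write out both representations alongside the original CSV.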

    And that’s exactly what we’ve done at OpenlyLocal, adding the data from all the councils who’ve published their spending data, acting as a central repository, generating the URLs, and connecting the data to other datasets and identifiers (councils with SNAC ids, companies with Companies House numbers). We’ve even extracted data from those councils who unhelpfully try to lock up their data as PDFs.

    There are at time of writing 52,443 financial transactions from 9 councils in the OpenlyLocal database. And that’s not all: there are also the following features:

    • Each transaction is tied to a supplier record for the council, and increasingly these are linked to company info (including their company number), or other councils (there’s a lot of money being transferred between councils), and users can add information about the supplier if we haven’t matched it up.
    • Every transaction, supplier and company has a permanent unique URL and is available as XML and JSON.
    • We’ve sorted out some of the date issues (adding a date fuzziness field for those councils who don’t specify when in the month or quarter a transaction relates to).
    • Transactions are linked to the URL from which the file was downloaded (and usually the line number too, though obviously this is not possible if we’ve had to extract it from a PDF), meaning anyone else can recreate the dataset should they want to.
    • There’s an increasing amount of analysis, showing ordinary users spending by month, biggest suppliers and transactions, for example.
    • The whole spending dataset is available as a single, zipped CSV file to download for anyone else to use.
    • It’s all open data.
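
    The date fuzziness and source-tracking ideas in the list above are easier to see with an example. The Python sketch below is one possible way to model them, not a description of how OpenlyLocal actually stores things: the field names, the choice of the month’s midpoint as the nominal date, and the example.org URL are all assumptions made for illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


def month_to_fuzzy_date(year, month):
    """Return (midpoint date, fuzziness in days) for a month-only date.

    The fuzziness is the number of days either side of the midpoint
    needed to cover the whole month (an assumed convention).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    first = date(year, month, 1)
    last = date(year, month, days_in_month)
    midpoint = first + timedelta(days=(days_in_month - 1) // 2)
    return midpoint, (last - midpoint).days


@dataclass
class Transaction:
    supplier: str
    value: str
    date: date
    date_fuzziness: int         # 0 when the exact day is known
    source_url: str             # file the row was taken from
    source_line: Optional[int]  # None when extracted from a PDF


nominal, fuzziness = month_to_fuzzy_date(2010, 6)
example = Transaction(
    supplier="Acme Ltd",                          # made-up supplier
    value="120.00",
    date=nominal,
    date_fuzziness=fuzziness,
    source_url="http://example.org/spend.csv",    # placeholder URL
    source_line=42,
)
```

    Keeping the source URL and line number on every record is what lets someone else trace a figure back to the file the council published.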

    There are a couple of features Stuart mentions that we haven’t yet implemented, for good reason.

    First, we’re not yet publishing it as linked data, for the simple reason that the vocabulary hasn’t yet been defined, nor even the standards on which it will be based. When this is done, we’ll add this as a representation.

    And although we use standard identifiers such as SNAC ids for councils (and wards) on OpenlyLocal, the URL structure Stuart mentions is not yet practical, in part because SNAC ids don’t cover all authorities (they don’t include the GLA, or other public bodies, for example), and only a tiny fraction of councils are publishing their internal transaction ids.

    Also, we haven’t yet implemented comments on the transactions, for the simple reason that distributed comment systems such as Disqus are javascript-based and thus problematic for those with accessibility issues, and site-specific ones don’t allow the conversation to be carried on elsewhere (we think we might have a solution to this, but it’s at an early stage, and we’d be interested to hear other ideas).

    But all in all, we reckon we’re pretty much there with Stuart’s wish list, and would hope that councils can get on with extracting the raw data, publishing it in an open, machine-readable format (such as CSV), and then move to linked data as their resources allow.

    Written by countculture

    August 3, 2010 at 7:45 am