countculture

Open data and all that

Making OpenCharities even better… more features, more data, more charities


I had a fantastic response to the launch of OpenCharities — my little side project to open up the Charity Commission’s Register of Charities — from individuals, from organisations representing the third sector, and from charities themselves.

There were also a few questions:

  • Could we pull out and expose via the API more info about the charities, especially the financial history?
  • How often would OpenCharities be updated and what about new charities added after we’d scraped the register?
  • Was there any possibility that we could add additional information from sources other than the Charity Register?

So, over the past week or so, we’ve been busy trying to answer those questions as best we could, mainly by just getting on and solving them.

First, additional info. After a terrifically illuminating meeting with Karl and David from NCVO, I had a much better idea of how the charity sector is structured, and what sort of information would be useful to people.

So the first thing I did was to rewrite the scraper and parser to pull in a lot more information, particularly the past five years’ income and spending and, for bigger charities, the breakdown of that income and spending. (I also pulled in the remaining charities that had been missed the first time around, including removed charities.) Here’s what the NSPCC’s entry, for example, looks like now:

Example of financial info for charity

We are also now getting the list of trustees, and links to the accounts and Summary Information Returns, as there’s all sorts of goodness locked up in those PDFs.

However, while we were running through all these charities, we wondered if any of them had social networking info easily available (i.e. on their front page). It turns out some of the bigger ones did, and so we visited their sites and pulled out that info (it’s fairly easy to look for links to Twitter, Facebook, YouTube etc. on a home page). Here’s an example of the social networking info, again for the NSPCC.
Social networking info for charities
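
As a rough sketch of that kind of home-page check (not the actual OpenCharities code, which isn't shown here), something like the following would pick out the first Twitter, Facebook and YouTube link from a page's HTML. The `SOCIAL_HOSTS` table and the function and class names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping of hostnames to network names (an assumption,
# not taken from the OpenCharities code).
SOCIAL_HOSTS = {
    "twitter.com": "twitter",
    "facebook.com": "facebook",
    "youtube.com": "youtube",
}


class SocialLinkParser(HTMLParser):
    """Collects the first link found for each known social network."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        network = SOCIAL_HOSTS.get(host)
        # Keep only the first link seen for each network.
        if network and network not in self.found:
            self.found[network] = href


def find_social_links(html):
    """Return a dict of network name -> link for one page of HTML."""
    parser = SocialLinkParser()
    parser.feed(html)
    return parser.found
```

Feeding it the home page HTML of a charity returns a small dict keyed by network, which is easy to store alongside the register entry. Relative links and links to other sites are simply ignored.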

[Incidentally, doing this threw up some errors in the Charity Register, most commonly websites that are listed as http://http://some.charity.org.uk, which in itself shows the benefit of opening up the data. All we need now is a way of communicating that to the Charity Commission.]
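
For what it's worth, the doubled-scheme error is simple to repair mechanically. This is a minimal sketch, assuming the only problem is a repeated `http://` or `https://` at the start of the URL; the function name is hypothetical.

```python
import re

# Matches a leading scheme followed by one or more repeated schemes.
_DOUBLED_SCHEME = re.compile(r"^(https?://)(?:https?://)+", re.IGNORECASE)


def fix_doubled_scheme(url):
    """Collapse a repeated leading scheme, keeping the first one."""
    return _DOUBLED_SCHEME.sub(r"\1", url.strip())
```

So `fix_doubled_scheme("http://http://some.charity.org.uk")` gives back `http://some.charity.org.uk`, and a URL that is already well formed passes through unchanged.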

We also (after way too many hours wasted messing around with cookies and hidden form fields) figured out how to get the list of charities recently added, with the result that we can check every night for new charities added in the past 24 hours, and add those to the database.

Latest charities added to register
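
The nightly check itself boils down to a filter like the one below. This is only a sketch under assumptions: the dict keys `number` and `added_at` are invented for the example, and fetching the recently-added list (the cookies and hidden form fields part) is left out entirely.

```python
from datetime import datetime, timedelta


def new_charities(listed, known_numbers, now, window=timedelta(hours=24)):
    """Return listed charities added within the window and not yet stored.

    listed: iterable of dicts with 'number' and 'added_at' keys
            (a hypothetical shape for the scraped list).
    known_numbers: set of charity numbers already in the database.
    """
    cutoff = now - window
    return [
        charity
        for charity in listed
        if charity["added_at"] >= cutoff
        and charity["number"] not in known_numbers
    ]
```

Checking against the numbers already in the database means a charity that shows up in two consecutive nightly runs is only added once.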

This means not only can we keep OpenCharities up to date, but we can also offer an RSS feed of the latest charities. And if that’s updated a bit too frequently for you (some days there are over 20 charities added), you can always restrict it to a given search term, e.g. http://OpenCharities/charities.rss?term=children for those charities with children in the title.
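
If you want to consume the feed from a script, a sketch along these lines should do. It assumes the feed is ordinary RSS 2.0 (items with a `title` element), which isn't confirmed here, and the function names are made up; the base URL is passed in rather than hard-coded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_feed_url(base_url, term=None):
    """Append the optional 'term' query parameter to the feed URL."""
    if not term:
        return base_url
    return base_url + "?" + urlencode({"term": term})


def item_titles(rss_text):
    """Return the title of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]
```

Calling `build_feed_url` with the feed address and `"children"` produces the same `?term=children` form as the example above; fetching the result (say with `urllib.request.urlopen`) and passing the text to `item_titles` gives the list of newly added charity names.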

Finally, we’ve been looking at what other datasets we could link with the register, and I thought a good one might be the list of grants given out by the various National Lottery funding bodies (which fortunately had already been scraped by the very talented Julian Todd using ScraperWiki).

Then it was a fairly simple matter of tying together the recipients with the register, and voila, you have something like this:

Example of National Lottery grant info for a charity
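
One plausible way to do that kind of tie-up is to match on normalised names. To be clear, this is an assumption for illustration, not a description of how the OpenCharities matching actually works, and the dict keys (`recipient`, `title`, `number`) are hypothetical.

```python
import re


def normalise_name(name):
    """Lower-case, drop punctuation and a leading 'the', collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    words = cleaned.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def match_grants(grants, charities):
    """Group grants under the charity whose normalised title matches.

    grants: list of dicts with a 'recipient' key (hypothetical shape).
    charities: list of dicts with 'number' and 'title' keys (hypothetical).
    Returns a dict of charity number -> list of matched grants.
    """
    by_name = {normalise_name(c["title"]): c["number"] for c in charities}
    matched = {}
    for grant in grants:
        number = by_name.get(normalise_name(grant["recipient"]))
        if number is not None:
            matched.setdefault(number, []).append(grant)
    return matched
```

Exact matching on a normalised name is deliberately conservative: it will miss recipients recorded under a trading name or abbreviation, but it avoids attaching a grant to the wrong charity. Where two charities normalise to the same name, this sketch keeps whichever comes last in the list, so a real import would want to flag those for review.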

Note, at the time of writing, the import and match of the data is still going on, but should be finished by the end of today.

We’ll also add some simple functionality to show payments from local councils that are being published in the local council spending data. The information’s already in the database (and is actually shown on the OpenlyLocal page for the charity); I just haven’t got around to displaying it on OpenCharities yet. Expect that to appear in the next day or so.

C

p.s. Big thanks to @ldodds and @pigsonthewing for helping with the RDF and microformats respectively

Written by countculture

September 23, 2010 at 2:40 pm

8 Responses


  1. […] This post was mentioned on Twitter by Paul Bradshaw, Glyn Moody, SteveALee, SteveALee, Philip John and others. Philip John said: RT @paulbradshaw: RT @CountCulture: Making OpenCharities even better… more features, more data, more charities http://bit.ly/9ZdgdF […]

  2. Have you had any response from the Charities Commission, yet?

    Andy Mabbett (@pigsonthewing)

    September 29, 2010 at 9:50 am

  3. Excellent work on this Chris.

    I hope you will be making the financial information and/or classification available for download soon.

    Map of UK charities from OpenCharities.org data

    David Pidsley

    October 3, 2010 at 10:59 am

    • David
      I’m aiming to. Just a matter of working out the best way to represent it in a CSV file and finding some time to do it. Open to suggestions. Re the classification, that’s on the list too but unlike the financial info we’re not currently scraping it. It’s not a problem to do so (though it takes a week or so to scrape the entire register), but I’d hoped the Charity Commission was going to respond to the request for a dialogue before then, to make this easier.
      Chris

      countculture

      October 3, 2010 at 12:06 pm

  4. Chris, do you have any plans to scrape the full accounts out of the pdfs, or are they way too inconsistent?

    Michael Grimes

    December 14, 2010 at 1:04 pm

    • Have been wondering about it, but it’s a tricky and messy task and think would need some form of crowdsourcing. However if you have a look at the xml or json view of major charities you see that we’ve got annual report data on them for past five years to quite a good degree. I’m just trying to work out how best to display that.

      countculture

      December 15, 2010 at 9:43 am

      • It occurred to me that they probably can’t be scraped anyway: ours at least appears to be a scanned image of the printed document, not text at all. (Or can scrapers cope with that? Showing my lack of knowledge here..!)

        Michael Grimes

        December 20, 2010 at 5:50 pm

  5. Love what you’ve put together Chris, and how you continually improve and add to an already very impressive resource.

    Would you mind if we have a quick chat about the information you’re providing? We are very interested in making good use of the data, but before we do I have a few questions for you.

    Please email me at your earliest convenience.

    Thanks in advance.

    Cheers!

    – Matthew
    Chief Technology Officer
    KULA Causes, Inc.

    Matthew

    November 14, 2011 at 7:32 am

