Saturday 30 June 2018

Mapping the House of Lords in R


I received an email from the Electoral Reform Society recently. They’d crunched some numbers and discovered that a disproportionate number of the members of our upper house live in London. Certain regions of the UK are even less well represented in this mostly old, mostly rich, mostly male and totally unelected group of lawmakers than a casual glance would suggest.

While the ERS published the aggregated data as tables, I thought it would be cool to have a map of the distribution. It was a good excuse to crack out my new and decidedly weedy R coding skills.

To skip to the punchline, here’s the map I made:

As you can see, lords are concentrated in the south-east of England, and especially in London. There are also quite a few in Yorkshire, though that is partly an artefact of my limited R skills: I lumped the whole of Yorkshire together as a single county.

This map is coloured so that each bucket contains an equal number of counties, which makes the distribution look more even than it actually is.

Here’s the map with an equal interval between buckets:
Bit more of a difference. That’s because, in my heavily trimmed-down dataset, 141 lords (or baronesses) live in London, while the nearest contender, Yorkshire, is home to just 29.
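For anyone who’d rather build the two colour schemes in R than in QGIS, here’s a minimal sketch. It assumes five buckets (the number behind the maps above isn’t recorded here) and takes the counts from the table in the methodology section. `quantile()` gives equal-count breaks, `seq()` gives equal-interval breaks, and `cut()` drops each county into a bucket.

```r
# Lords per county, transcribed from the table in the methodology section
freq <- c(141, 29, 25, 23, 16, 15, 14, 13, 13, 13, 11, 11, 10, 10, 8, 8,
          7, 7, 7, 7, 6, 6, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 2, 2, 2, 1)

n_classes <- 5  # assumption: five buckets

# Equal count (quantile) breaks; unique() guards against tied break values
quantile_breaks <- unique(quantile(freq, probs = seq(0, 1, length.out = n_classes + 1)))

# Equal interval breaks: every bucket spans the same range of values
interval_breaks <- seq(min(freq), max(freq), length.out = n_classes + 1)

# Assign each county to a bucket under both schemes
quantile_class <- cut(freq, quantile_breaks, include.lowest = TRUE)
interval_class <- cut(freq, interval_breaks, include.lowest = TRUE)

# Number of counties per bucket
print(table(quantile_class))
print(table(interval_class))
```

With equal intervals, 37 of the 38 counties land in the bottom bucket and London sits alone in the top one, which is why that version looks so lopsided. The equal-count version spreads the counties 11/5/8/7/7; tied values stop the buckets from being exactly equal.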

The data is messy, so take these figures with a pinch of salt. More details on the methodology below, if you’re interested.

Methodology


The figures need to be interpreted with some caution. First, not all lords declare their home county. Second, the approach I’ve used captures only the main ceremonial counties of England, so if a lord declared something unconventional (if they lived in France, say) they’d be excluded. We’re also only looking at England, although naturally several lords live in the other countries of the UK.

Once the data is cleaned and counted up, we have 459 observations, out of about 800 lords. Not a bad sample, but assuming it’s unbiased would be pretty heroic.
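Those numbers hang together: the county counts in the table further down sum to exactly 459. A quick check in R, treating 800 as the approximate size of the House (it isn’t an exact figure):

```r
# Lords per county, transcribed from the table of counts in this post
freq <- c(141, 29, 25, 23, 16, 15, 14, 13, 13, 13, 11, 11, 10, 10, 8, 8,
          7, 7, 7, 7, 6, 6, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 2, 2, 2, 1)

total_lords <- 800  # "about 800": an approximation, not an exact figure

n_obs    <- sum(freq)             # lords with a usable English county
coverage <- n_obs / total_lords   # share of the House in the sample
london   <- freq[1] / n_obs       # London's share of the sample

cat(sprintf("%d observations, %.0f%% of ~%d lords; London holds %.0f%% of them\n",
            n_obs, 100 * coverage, total_lords, 100 * london))
```

So the sample covers a bit over half the House, and London alone accounts for roughly three in ten of the lords in it.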

The data comes from the House of Lords declaration of financial support, correct as of December 2017. Some bright spark saved the data as a PDF, instantly making analysis awkward. It’s almost as if they don’t want the data to be analysed.

But luckily, R is pretty good at cleaning stuff up. I pasted the contents of the PDF into a text document and saved it as a .csv, which can be read into an R script.
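A minimal sketch of the reading step, under a few assumptions: the pasted text sits in a file (a temporary file stands in here, filled with invented placeholder lines rather than real entries), and it is read as raw lines with `readLines()` instead of `read.csv()`, because rows of pasted PDF text rarely have a consistent number of columns. Splitting on commas and whitespace then gives one long vector of fields.

```r
# Hypothetical stand-in for the pasted-and-saved PDF text; these lines
# are invented placeholders, not real entries
path <- tempfile(fileext = ".csv")
writeLines(c("Lord Alpha of Somewhere, London",
             "Baroness Beta of Elsewhere-on-Sea, Sussex",
             "Lord Gamma, Yorkshire"), path)

# Read the file as raw text rather than as a table, since the rows
# don't share a consistent number of columns
raw_lines <- readLines(path)

# Split every line on runs of commas and/or whitespace into one long vector
fields <- unlist(strsplit(raw_lines, "[,[:space:]]+"))
fields <- fields[fields != ""]  # drop any empty strings left by the split
print(fields)
```

One side effect of splitting on whitespace is that multi-word counties such as Tyne and Wear break into separate words, so they would have to be matched on a single word.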

Lords aren’t listed in firstname-surname format but in lord-blah-of-blah-blah-blah format, where ‘blah-blah-blah’ can run from one word to several, so recreating a neat table like the one in the PDF was out of the question. An R wizard could no doubt do it, but a wizard I am not. Handily, we don’t need the names anyway.

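Lords declare their home location simply as a county, and as far as I can tell no lord has an entire county as their fiefdom, so any county name that turns up in the data should be a home location rather than part of a title. Counting up the counties therefore gives us the data we need.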

I created a vector of counties from a list I found on the internet and used it to narrow the several thousand fields in the .csv down to the handful that are counties. R is good at counting stuff, so I did the counting with the table() function. Finally, ordering the counts by value gives us something that looks fairly tidy.

            lords Freq
1           London  141
2        Yorkshire   29
3           Sussex   25
4      Oxfordshire   23
5        Hampshire   16
6   Cambridgeshire   15
7    Hertfordshire   14
8  Gloucestershire   13
9          Norfolk   13
10        Somerset   13
11            Kent   11
12          Surrey   11
13       Berkshire   10
14       Wiltshire   10
15         Cumbria    8
16  Northumberland    8
17 Buckinghamshire    7
18          Durham    7
19           Essex    7
20    Lincolnshire    7
21          Dorset    6
22  Worcestershire    6
23        Cheshire    5
24        Cornwall    5
25           Devon    5
26      Manchester    5
27         Suffolk    5
28      Lancashire    4
29  Leicestershire    4
30      Merseyside    4
31        Midlands    4
32            Tyne    4
33    Warwickshire    4
34      Derbyshire    3
35   Herefordshire    2
36      Shropshire    2
37   Staffordshire    2
38 Nottinghamshire    1
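The filtering and counting steps described before the table can be sketched in a few lines of base R. The `fields` and `counties` vectors below are toy stand-ins (invented placeholder words and a five-county list), not the real data, and this may not be exactly how the original script did it; still, `as.data.frame(table(lords))` produces the same `lords Freq` column names seen in the table.

```r
# Toy stand-ins: invented placeholder words and a short county list,
# not the real .csv fields or the full list of counties
fields <- c("Lord", "Alpha", "of", "Somewhere", "London",
            "Baroness", "Beta", "of", "Elsewhere", "Sussex",
            "Lord", "Gamma", "Yorkshire",
            "Lord", "Delta", "London")
counties <- c("London", "Yorkshire", "Sussex", "Kent", "Devon")

# Keep only the fields that exactly match a county name
lords <- fields[fields %in% counties]

# Count each county; as.data.frame() names the columns "lords" and "Freq"
counts <- as.data.frame(table(lords), stringsAsFactors = FALSE)

# Order from most to fewest lords and renumber the rows 1, 2, 3, ...
counts <- counts[order(-counts$Freq), ]
rownames(counts) <- NULL
print(counts)
```

Counties that never match, like Kent and Devon in this toy example, simply don’t appear in the output, consistent with the real table listing only counties that have at least one lord.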

Next problem: how to put these on a map.

A bit of Googling (OK, a lot of Googling) led me to the GeoJSON format (.geojson files), which is a handy way of encoding boundaries. Some generous soul had already drawn the boundaries of the UK’s counties, neatly labelled each one and put the lot on GitHub as a .geojson file, so I downloaded that.

I only wanted England, so I used a handy GeoJSON writing/editing tool, http://geojson.io, to trim the set down to just England. I also merged London into a single district.

Another pain in the backside is that UK counties as codified by the statistical authorities don’t match up with the ceremonial boundaries. Many cities are separate districts in my map file, so I simply treated each one as part of the county it sits in.
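Treating city districts as part of their county amounts to a lookup table. A minimal sketch: the feature names below are hypothetical (the labels in the real GitHub file may differ), and anything not in the lookup is assumed to already carry its county’s name. It also folds East and West Sussex into ‘Sussex’ and North Yorkshire into ‘Yorkshire’, to match the names in the counts.

```r
# Hypothetical feature names as they might appear in the map file;
# the real file's labels may differ
features <- c("Southampton", "Hampshire", "Brighton and Hove", "East Sussex",
              "West Sussex", "York", "North Yorkshire", "Reading")

# Lookup from map feature to the county name used in the lords counts
to_county <- c("Southampton"       = "Hampshire",
               "Brighton and Hove" = "Sussex",
               "East Sussex"       = "Sussex",
               "West Sussex"       = "Sussex",
               "York"              = "Yorkshire",
               "North Yorkshire"   = "Yorkshire",
               "Reading"           = "Berkshire")

# Use the lookup where a feature is listed, otherwise keep its own name
county <- unname(ifelse(features %in% names(to_county),
                        to_county[features], features))
print(county)
```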

Great, so we have a map and some data to represent on it. Now what?

There are some great mapping tools in R, but I’ve only ever used them to plot data as points. There’s no doubt a way to shade whole regions, but it was all getting a bit of a faff, so I turned to GIS software instead. QGIS happens to be open-source, powerful and easy to use.

It turns out QGIS can read .geojson files. With a bit of poking around I pulled the data in and then discovered I could set colours according to the values attached to each element.
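The idea of colouring each element by its value can also be sketched in base R, roughly in the spirit of a graduated colour scheme. This assumes five equal-interval buckets and an arbitrary white-to-dark-red ramp, and uses just a handful of counties from the table; it isn’t a reproduction of the QGIS settings behind the maps.

```r
# A handful of counties and their counts, taken from the table above
counts <- c(London = 141, Yorkshire = 29, Sussex = 25, Kent = 11,
            Nottinghamshire = 1)

n_classes <- 5  # assumption: five colour buckets

# Equal interval breaks, then the bucket number (1 = lowest) for each county
breaks <- seq(min(counts), max(counts), length.out = n_classes + 1)
bucket <- cut(counts, breaks, include.lowest = TRUE, labels = FALSE)

# Arbitrary colour ramp: white for the lowest bucket, dark red for the highest
palette <- colorRampPalette(c("white", "darkred"))(n_classes)

# One fill colour per county, named so it can be matched to map features
fill <- setNames(palette[bucket], names(counts))
print(fill)
```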

There’s probably a way to edit the .geojson file automatically to add in my lords data, but in the end I just went back to geojson.io and entered the 40-odd fields manually.
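For the record, the automatic route could look something like this in base R: a left join of the counts onto the map’s attribute table with `merge()`. The data frames here are hypothetical stand-ins. With the sf package, `st_read()` would load the real .geojson as a data frame with geometry, `merge()` should work on it the same way, and `st_write()` could save the result for QGIS, though that part is untested here.

```r
# Hypothetical attribute table for the map: one row per feature, with
# cities already folded into the county they sit in
features <- data.frame(
  name   = c("Southampton", "Hampshire", "East Sussex", "West Sussex", "Rutland"),
  county = c("Hampshire", "Hampshire", "Sussex", "Sussex", "Rutland"),
  stringsAsFactors = FALSE
)

# Lords per county, a small subset of the counts table
counts <- data.frame(lords = c("Hampshire", "Sussex"), Freq = c(16, 25),
                     stringsAsFactors = FALSE)

# Left join: keep every map feature and attach its county's count
joined <- merge(features, counts, by.x = "county", by.y = "lords", all.x = TRUE)

# Counties with no declared lords come back as NA; show them as zero
joined$Freq[is.na(joined$Freq)] <- 0
print(joined)
```

Every feature belonging to a county gets that county’s full count, so Southampton and Hampshire end up the same colour. Counties missing from the table, such as Rutland, get zero rather than a blank.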

And that was it. Everything necessary to map data on lords’ homes to the counties of England. It was a useful exercise in R, and an impressive illustration of just how many superb open-source tools are floating around on the internet. How the hell did we get anything done pre-internet?

For the curious, here’s the R code I wrote:


