Data Unlocked are working on a project with Inside Outcomes and Birmingham City University to estimate the rate of diagnosis of more than twenty health conditions in England using openly available datasets. In this blog post Simon Whitehouse explains how we made our estimates.
Earlier this year we spoke with Darren Wright of Inside Outcomes about the possibility of estimating the diagnosis of health conditions in England by area, something we were surprised wasn’t already available.
The National Health Service (NHS) does produce statistics, called Quality and Outcomes Framework (QOF) measures, that break down the prevalence of diagnosis of twenty-five different health conditions, from Atrial Fibrillation to Stroke, by GP practice.
They also produce statistics on the number of patients registered at GP practices, broken down by the Lower Layer Super Output Area (LSOA) in which those patients live.
Lower Layer Super Output Areas (LSOAs) are small areas of the country, each home to between 1,000 and 3,000 people. They are the smallest areas we know of that have a whole range of statistics produced about them; census statistics, for instance, are broken down by LSOA.
Darren suggested that there might be a way of estimating the prevalence of diagnosis of these health conditions at LSOA level by combining these two datasets.
Mike will describe the technical details of the method we used, including the SQL statements, in another post. Here I will give a non-technical overview of how we created the estimated dataset.
Firstly, we calculated what proportion of each GP practice's patients live in each LSOA. For this we used the data made available by NHS Digital on the number of Patients Registered at a GP Practice, January 2018.
As a made-up example, take a GP practice that has the following breakdown of patients:
gp_code | gp_name | lsoa | patients
---|---|---|---
AGP001 | THE AVERAGE SURGERY | LSOA001 | 100
AGP001 | THE AVERAGE SURGERY | LSOA002 | 400
AGP001 | THE AVERAGE SURGERY | LSOA003 | 500
In total it has 1,000 patients: 10% of them live in LSOA001, 40% in LSOA002 and 50% in LSOA003.
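The share calculation can be sketched in a few lines of Python. This is a minimal illustration using the made-up rows above, not the project's own code (which was written in SQL); the function name `lsoa_shares` and the tuple layout of the input are hypothetical choices for the example.

```python
from collections import defaultdict

# Made-up registration rows from the example table: (gp_code, lsoa, patients)
registrations = [
    ("AGP001", "LSOA001", 100),
    ("AGP001", "LSOA002", 400),
    ("AGP001", "LSOA003", 500),
]

def lsoa_shares(rows):
    """Hypothetical helper: map (gp_code, lsoa) to the share of that
    practice's registered patients who live in that LSOA."""
    # Total registered patients per practice
    totals = defaultdict(int)
    for gp_code, _lsoa, patients in rows:
        totals[gp_code] += patients
    # Each LSOA's patients as a fraction of the practice total
    return {
        (gp_code, lsoa): patients / totals[gp_code]
        for gp_code, lsoa, patients in rows
    }

shares = lsoa_shares(registrations)
for (gp_code, lsoa), share in shares.items():
    print(gp_code, lsoa, f"{share:.0%}")
```

Running it prints 10%, 40% and 50% for LSOA001, LSOA002 and LSOA003, matching the figures above. The shares for each practice always add up to 1, so no patients are lost or double counted at this step.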
Next, we looked at the prevalence of diagnosis of health conditions at each GP practice, using the QOF measures. Again, here is a made-up example, where AF is Atrial Fibrillation, AST is Asthma and CAN is Cancer:
gp_code | Indicator_group | register | patient_list_type | list_size
---|---|---|---|---
AGP001 | AF | 100 | TOTAL | 1000
AGP001 | AST | 300 | TOTAL | 1000
AGP001 | CAN | 125 | TOTAL | 1000
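The practice-level prevalence of each indicator group is the register count divided by the list size. A minimal sketch using the made-up rows above; `practice_prevalence` is a hypothetical name, and it assumes each row carries its own list_size (in this example every row uses the TOTAL patient list).

```python
# Made-up QOF rows from the example table:
# (gp_code, indicator_group, register, list_size)
qof = [
    ("AGP001", "AF", 100, 1000),
    ("AGP001", "AST", 300, 1000),
    ("AGP001", "CAN", 125, 1000),
]

def practice_prevalence(rows):
    """Hypothetical helper: map (gp_code, indicator_group) to the fraction
    of the practice's list that is on that condition's register."""
    return {
        (gp_code, indicator): register / list_size
        for gp_code, indicator, register, list_size in rows
    }

prevalence = practice_prevalence(qof)
for (gp_code, indicator), p in prevalence.items():
    print(gp_code, indicator, f"{p:.1%}")
```

For THE AVERAGE SURGERY this gives 10.0% for AF, 30.0% for AST and 12.5% for CAN.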
Combining the two tables, we estimated how many of THE AVERAGE SURGERY's patients have Atrial Fibrillation, based on where they live.
For example, the table above shows that 100 of THE AVERAGE SURGERY's patients have Atrial Fibrillation.
We know from the first table that 10% of THE AVERAGE SURGERY's patients live in LSOA001.
So we take 10% of 100 and estimate that 10 patients who are registered at THE AVERAGE SURGERY and live in LSOA001 have Atrial Fibrillation.
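Written as a formula, the estimate for practice $p$, LSOA $l$ and condition $c$ is the practice's register count scaled by the share of its patients who live in $l$. The symbols are our own shorthand for illustration: $R_{p,c}$ is the QOF register count and $P_{p,l}$ is the number of the practice's registered patients living in $l$.

$$
\hat{n}_{p,l,c} = R_{p,c} \times \frac{P_{p,l}}{\sum_{l'} P_{p,l'}}
\qquad\text{e.g.}\qquad
\hat{n}_{\text{AGP001},\,\text{LSOA001},\,\text{AF}} = 100 \times \frac{100}{1000} = 10
$$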
We then added up the estimated number of patients with Atrial Fibrillation in LSOA001 across all the surgeries at which LSOA001 residents are registered.
Repeating this for every LSOA and every condition gives an estimate of the number of diagnosed patients for each condition in each LSOA.
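The whole procedure, splitting each practice's register counts across LSOAs and then summing over practices, can be sketched in Python as below. It illustrates the approach under stated assumptions rather than reproducing the project's SQL: `estimate_lsoa_counts` is a hypothetical name, the inputs are plain tuples, and the second practice AGP002 and all of its figures are invented so that LSOA001 has residents registered at two surgeries.

```python
from collections import defaultdict

# Hypothetical inputs. AGP002 is invented to show one LSOA's residents
# being registered at more than one surgery.
registrations = [  # (gp_code, lsoa, patients)
    ("AGP001", "LSOA001", 100),
    ("AGP001", "LSOA002", 400),
    ("AGP001", "LSOA003", 500),
    ("AGP002", "LSOA001", 300),
    ("AGP002", "LSOA004", 200),
]
qof = [  # (gp_code, indicator_group, register)
    ("AGP001", "AF", 100),
    ("AGP001", "AST", 300),
    ("AGP001", "CAN", 125),
    ("AGP002", "AF", 20),
    ("AGP002", "AST", 50),
]

def estimate_lsoa_counts(registrations, qof):
    """Estimate patients on each condition register, per LSOA.

    Each practice's register count is split across LSOAs in proportion
    to where its patients live, then summed over all practices.
    """
    # Total registered patients per practice
    totals = defaultdict(int)
    for gp, _lsoa, patients in registrations:
        totals[gp] += patients

    # Share of each practice's patients living in each LSOA
    shares = defaultdict(list)  # gp_code -> [(lsoa, share), ...]
    for gp, lsoa, patients in registrations:
        shares[gp].append((lsoa, patients / totals[gp]))

    # Apportion each register count and accumulate per (lsoa, indicator)
    estimates = defaultdict(float)
    for gp, indicator, register in qof:
        # A practice with no registration rows has no shares, so it adds nothing
        for lsoa, share in shares[gp]:
            estimates[(lsoa, indicator)] += register * share
    return dict(estimates)

est = estimate_lsoa_counts(registrations, qof)
for key in sorted(est):
    print(key, round(est[key], 1))
```

With these inputs LSOA001 gets 22 estimated AF patients: 10 via THE AVERAGE SURGERY and 12 via AGP002 (60% of its 20). In this sketch, practices that appear in the QOF data but not in the registration data contribute nothing; the post does not say how the real method handles such gaps.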
It is worth emphasising that what we are producing here is a rough estimate. We can't, and don't, claim that it is precise, and we certainly wouldn't advise making decisions based on it. We do think it is reasonably accurate (not a statistical term), and over the next few months we will be producing some analysis describing how we think it can be used.
This post originally appeared on Data Unlocked.