14th July 2021

Fairness: The importance of diversity in health data for creating better, fairer treatment for all

Jacqui Gath shares her thoughts on the importance of data diversity in healthcare research and the steps that can be taken to address inequalities in healthcare.

In 2019 I read a prize-winning book, Invisible Women by the investigative journalist Caroline Criado-Perez, that changed my views on life somewhat. It is a book about how the needs of the 51% of the population who are female have been assumed to be the same as the needs of the 49% who are male. Until then, I had assumed that needs were differentiated, and it was astonishing to discover that crash test dummies are male in proportion, so that data from safety tests does not apply to women, who are generally smaller and lighter. Consequently, women suffer greater injury and mortality in car accidents. The same assumption means that my daughter cannot purchase safety boots for her job as a surveyor – they need to be specially made at huge expense, or small men’s boots need to be padded out to form a rough fit, which is not really safe when inspecting commercial buildings from a cherry picker.

The book goes into the need for data to cater for different groups of people in some detail and has several eye-opening chapters including ‘Going to the doctor’ and ‘The Drugs Don’t Work’. As a member of an Ethics Committee who has provided patient and public input for a wide range of studies since 2003, I had assumed that all this was taken care of in recruitment, data analysis and follow-up. Not so, it seems.

The publication of the book resulted in various articles in the press about how women have been short-changed when it comes to use – or non-use – of their data. For example, women are not always included in trials and, even when they are, researchers do not always analyse the data for sex differences. This was further illustrated in 2019 when it was revealed that DeepMind’s new AI predicts kidney injury two days before it happens, unless you are a woman. Why? Because only 6.32% of the data came from women. So, the AI was inevitably biased through data inequity.

If we think women and their data are often ignored in healthcare and research, what do we think happens to data from other underserved groups, such as ethnic minorities, people living in deprived areas, the elderly, Travellers, the homeless, the disabled, and these just for starters? In 2021 the BMJ Editorial ‘Racial and ethnic health disparities in healthcare settings’ pointed out that diverse groups needed to be approached to understand their unique experiences.

Action can no longer be delayed on health inequalities

Since 1998, the United States has been taking steps to improve data diversity in clinical trials, mandating that the demographics of participants be included in annual reports and giving the Food and Drug Administration the power to pause clinical trials if sponsors exclude women only because they may be able to have children. Data gathered in 2016 found that this had improved the balance of the sexes in clinical trials, but there was still a way to go to achieve data diversity across a wider range of demographics, including ethnicity.

The US is ahead of Britain in this. Meanwhile in the UK, in February 2020, a review of progress since the Marmot Review in 2010 found that ‘While there has been progress in some areas since 2010, there is growing evidence that health inequalities are widening and life expectancy is stalling.’ Since then, there has been the COVID-19 pandemic, which has exposed health inequalities in Europe and shown that action can no longer be delayed. Health inequalities include delayed cancer treatment, and a maternal death rate four (some say five) times higher in black women than in white women.

More recently, in May 2020, The Pharmaceutical Journal raised the issue in a paper, ‘Why we need to talk about sex and clinical trials’. NIHR has recently published a blog on improving inclusion, but should not all research be inclusive? The value of inclusion is demonstrated by the finding reported in 2019 by Wei Yang that male patients with glioblastoma multiforme (GBM) survived on average only 10 months, whereas female patients survived 2.5 times longer when undergoing the same treatment. This knowledge opens up whole new areas of research to benefit men, who suffer more from GBM.

The challenge of enabling population-wide studies

The good news is that the topic of data diversity in research and healthcare has recently come to the fore. Data about sex, ethnicity, age and area in which people live is needed as a minimum in order to analyse how differences in symptoms, diagnosis, response to drugs, dosage, effects of poor nutrition and adverse environmental influences impact people of different demographics.

Such data collection and analyses are fine for clinical trials where permission is given to use personal data, but what about the use of bulk data garnered from GPs, Hospital Episode Statistics, cancer registries, and similar? There are very strict legal controls on the use of this kind of personal data. So, if we need to do population-wide studies, how do we deal with the legal and ethical problems raised? Do we need to change the law governing collection of data? Do we need to enhance security and privacy by using techniques such as Trusted Research Environments? These are challenges we need to address if research is to be fully effective and beneficial to as many different people as possible. That is, fair to everybody.

These are questions to be considered by the public as well as legislators, to ensure acceptability. We need to approach representatives of different underserved communities to discover their concerns and to advise them of the benefits of using their health data for society.

Finding and addressing health inequalities with data

The COVID-19 pandemic has provided us with an opportunity, as well as a challenge. We can use our national data to find and address inequalities. DATA-CAN is taking this on board with a policy of Equity, Diversity and Inclusion for data accessed for cancer research, and is working with other hubs to formulate a clear, common written intention to be ‘fair’. We are developing an Equality Impact Assessment to remind researchers of what is needed when accessing data and have made contact with the BAME Research Hub Lead South Yorkshire to further these aims.

How can you make a difference?

While national bodies set policy and process in train, what can we do individually to make life fairer and more inclusive? If we work on clinical studies in any capacity, depending on the nature of the study, we can ask if people of different sexes and genders are fairly represented, along with minority groups. Are we taking into account the cultural sensitivities of people who follow a religion or personal philosophy? Are we collecting sufficient data to analyse for these groups? Should we do an Equality Impact Assessment before starting trial design? Can we raise it in ethics committees, on funding panels and during study design? All these activities are fair game for the questions: are we including women? Are we including ethnic minorities? Are we catering for the disabled, the elderly, children?

Now we have the Levelling Up Health Report published by the All Party Parliamentary Group for Longevity recommending – among other measures – ‘to maximise the great contributions that science, technology, and data can make to improve our health’.

I suggest that this is where the use of quality data, as exemplified by DATA-CAN, will come into its own.

This blog was originally published by UCLPartners.


  1. Invisible Women: Exposing Data Bias in a World Designed for Men. Caroline Criado-Perez. Pub Chatto and Windus 2019. Winner of the 2019 Royal Society Insight Investment Science Book Prize.
  2. DeepMind’s new AI predicts kidney injury two days before it happens. WIRED Science, 31.07.2019.
  3. Racial and ethnic health disparities in healthcare settings – Community organisations have a crucial role in involving under-represented population groups. Tom Gardiner, Sonya Abraham, Olivia Clymer, Mala Rao, Shamini Gnani. BMJ 2021;372:n605. Published: 08 March 2021
  4. FDA regulation “Presentation of safety and effectiveness data for certain subgroups of the population in investigational new drug application reports and new drug applications”
  5. FDA regulation “Investigational new drug applications: amendment to clinical hold regulations for products intended for life-threatening disease and conditions”
  6. Alice Chen et al. J Womens Health (Larchmt). 2018 Apr;27(4):418-429. doi: 10.1089/jwh.2016.6272. Epub 2017 Oct 19.
  7. Health Equity in England: The Marmot Review 10 Years On. Commissioned by the Health Foundation. Pub. Feb 2020.
  8. The Health Inequalities Portal – EU action against inequalities.
  9. Fernandez Turienzo, C., Newburn, M., Agyepong, A. et al. Addressing inequities in maternal health among women living in communities of social disadvantage and ethnic diversity. BMC Public Health 21, 176 (2021).
  10. Why we need to talk about sex and clinical trials. Rachel Brazil. 20207976.article May 2020
  11. NIHR INCLUDE: Improving inclusion of under-served groups in health research by Guest Author on 18 Nov 2020.
  12. Sex Differences in Glioblastoma Multiforme (GBM) revealed by analysis of patient imaging, transcriptome, and survival data. Wei Yang et al, Science Translational Medicine, 02 Jan 2019. Vol 11 issue 473 eaao5253
  13. Data Safe Havens and Trust: Toward a Common Understanding of Trusted Research Platforms for Governing Secure and Ethical Health Research. Nathan C Lee et al. JMIR Publications. Published on 21.6.2016 in Vol 4, No 2 (2016): Apr-Jun
  14. Levelling Up Health Report published by the All Party Parliamentary Group for Longevity. April 2021.