The search for the world's 'missing' genomes

In the summer of 2020, a 63-year-old African American woman with colon cancer was treated with a common chemotherapy known as fluoropyrimidines at the National Institutes of Health (NIH) Clinical Centre in Bethesda, Maryland. But over the coming weeks, she began to develop a severe side-effect known as pancytopenia – a rapid and sudden decrease of red and white blood cells and platelets – causing her to be rushed into intensive care.

This kind of reaction is surprisingly common. Around 38,000 cancer patients in England and approximately 154,000 patients in the US are initiated on fluoropyrimidine-based treatments every year. While fluoropyrimidines help save lives, between 20% and 30% of the people who receive these drugs require lower doses, because their bodies struggle to process them. If given the standard dose, they experience reactions which can vary from severe to fatal.

Like many adverse drug reactions, this is thought to be at least in part due to variations in the human genome, the strings of billions of letters or chemical bases which comprise our DNA. But while all humans share 99.9% of our genome, the remaining 0.1% varies markedly from one individual to another, or between ethnic groups. Differences in the underlying sequence behind a particular gene – which can be anything from a few hundred to several million bases – can have profound and far-reaching consequences for our health.

In recent years, genetic-sequencing studies have started to get to the bottom of why some people react so badly to fluoropyrimidines, pinpointing four different variations of a gene called DPYD, which is involved in metabolism, as the likely cause. Healthcare systems around the world have now begun sequencing the DNA of certain cancer patients and screening for each of these four variants before determining their chemotherapy dose.

The only problem is that these studies were done entirely on white people, or as geneticists say, "individuals of European ancestry". While different variants of DPYD may serve as warning signs for people of other ethnicities, we do not have enough data to be sure of which variants are most applicable to different ethnic groups. "Ethnic minority patients will usually be given conventional doses of the drugs," says Munir Pirmohamed, a pharmacologist at the University of Liverpool in the UK. "Some of these patients will carry other ethnic-specific variants which also affect their ability to metabolise these drugs, but we do not currently genotype for those, largely because we do not know."

This is just one example of what scientists like Pirmohamed call the "genomic gap", a phenomenon which, unless addressed, is likely to have wide-ranging impacts across healthcare, exacerbating existing inequalities. While the falling costs of genome sequencing have been predicted to usher in a medical revolution that will make treatments more precise and personalised, it is often overlooked that not everyone stands to benefit.

It is 70 years since the double-helix structure of DNA was first revealed in a grainy, black and white image. Since then our understanding of the genetic information it encodes has advanced enormously, but these insights are from a surprisingly narrow portion of our species. Information from entire ethnic groups is missing. So why is the picture of our genetic make-up still so fuzzy?

So far, 86% of existing genomic studies are based on data collected from white Europeans. The reasons for this are myriad and complex. Most large sequencing projects are primarily located in the western world, for example. Many clinical trials of potential new treatments also fail to engage with different ethnic groups, and there is a tendency to rely upon biobanks which are not very diverse. Lindsay Fernández-Rhodes, an epidemiologist at Pennsylvania State University who works with marginalised populations and their families, explains that this is because biobanks have inadvertently ended up predominantly recruiting people from affluent areas, who live close to the medical centre.

"Most biobanks reflect the local catchment area, and in the US, primarily include those with health insurance," says Fernández-Rhodes. "A systematic reinvestment is needed to narrow the current gap between who is being studied in genomics and who is most in need of such information. For example, racial minority groups like African Americans and Latinos shoulder a desperate burden of obesity and remain understudied."

As a result, while many healthcare systems now offer genetic tests as a way of diagnosing inherited cardiac conditions or cancer based on gene variants identified in large sequencing projects, the patient experience with these tests can vary wildly from one ethnic group to another. Some studies have found that the predictions they make are four-and-a-half times less accurate in individuals of African than European descent.

"People of non-European ancestry are more likely to have their gene variants be incorrectly called rare when they aren't, potentially leading to a misdiagnosis," says Neil Hanchard, clinical investigator in the US National Human Genome Research Institute (NHGRI).

In the UK, a new NHS pilot scheme called Heart is currently underway to assess whether a technique called polygenic risk scores can be used to predict whether a person is likely to be affected by cardiovascular disease within the next decade. This estimates disease risk using combinations of gene variants which have been linked to an illness through large genome-wide association studies (GWAS). But the ethics of using polygenic risk scores have been called into question by a number of researchers, as the data is not as relevant to other ethnic groups.

Sasha Henriques, a researcher at the Wellcome Sanger Institute in the UK and a counsellor who works with patients and families to help them interpret the information which comes out of genomic tests, says that many communities already harbour a certain distrust of genetic medicine which in turn makes them less likely to participate in studies.

"The genetic odyssey is a huge burden for families particularly for those from already minoritised or disadvantaged groups," she says. "Delays in diagnosis and less accurate predictions can negatively impact health outcomes, leading to slower referrals to specialist services, and therefore the most appropriate treatment or management of symptoms can be delayed or missed."

But it is not only minorities who are impacted by the genomic gap. Because the vast majority of genetic research has taken place in North America or Europe, we still know next to nothing about the genetic landscape of Africa, Central and South America, the Middle East and South East Asia. Studying these populations could not only make genomic medicine more relevant to them, but yield novel discoveries about the human genome which lead to new drugs that benefit all.

One pertinent example of this is the Dallas Heart Study. By ensuring that African Americans were included in the research, the investigators discovered that variants of a gene called PCSK9 could lower low-density lipoprotein (LDL) cholesterol. More than a decade later, the pharma company Amgen unveiled the first PCSK9-inhibitor drug, which is now given to patients of all ethnicities who have a family history of cardiovascular disease.

Despite this success story, pharmaceutical executives admit much more needs to be done to bridge the genomic gap if we are to find answers to other chronic illnesses such as hypertension, stroke, diabetes and chronic kidney disease, which disproportionally affect people of African or Latin American heritage.

"Biobanks and sequencing efforts are starting to be established in other parts of the world," says David Reese, executive vice-president of research and development at Amgen. "We must shift from the historic study of primarily populations of European origin to collect medical and genetic information from diverse populations at scale."

The genomic goldmine

Two years ago, Nigerian biotech start-up 54gene launched a groundbreaking initiative to sequence the DNA of 100,000 adults across Nigeria, by far the continent's biggest genome-sequencing effort to date.

Beforehand, genetic research in Africa had been almost negligible compared to the rest of the world. A mere 5,000-10,000 African genomes have been studied, compared to around one million across the rest of the world. As of 2022, just 0.14% of the data in GWAS studies comes from Africans.

Yet, if we are to truly understand the multi-layered relationship between the human genome and our health, then Africa represents a goldmine of untapped information. As Segun Fatumo, a Nigerian computational geneticist at the London School of Hygiene and Tropical Medicine, explains, there are more than 2,000 linguistic groups across the continent. Each of these corresponds to a tribe where intra-marriage and reproduction has taken place for 200,000 years. This all contributes to perhaps the most varied genetic profile on Earth.

It is this variation which presents opportunity. Studying populations with many different forms of common genes which have never been analysed before can give new insights into human biology, helping to yield novel drug targets. For example, genetic research on people of African heritage identified that subtle alterations in the code of the PCSK9 gene can have an impact on cholesterol, leading to a new generation of lipid-lowering drugs.

"If you look at Nigeria alone, there are more than 300 different ethnicities and 500 languages," says Fatumo, who has helped to co-ordinate the Nigerian 100,000 genomes project. "So the people are really different and that also shows in their genetics. If we're going to continue making exciting medical discoveries, African people must be studied."

The project hopes to build on the promise of existing initiatives such as the Wellcome Sanger Institute's African Genome Variation Project (AGVP) and the internationally funded Human Heredity & Health in Africa (H3Africa) Initiative, endeavours which have yielded a handful of intriguing findings.

By sequencing the genomes of thousands of individuals across sub-Saharan Africa, the AGVP has identified new gene variants which, if validated, could be used to pinpoint vulnerability to chemotherapy toxicity in people of African descent around the world. During the Covid-19 pandemic, a study led by H3Africa demonstrated that a substantial proportion of people across the continent had gene variants that would make them at risk of adverse reactions to hydroxychloroquine, which at the time was being pushed as a potential treatment for the virus.

Similar breakthroughs have been made through recent projects in other regions with untapped genetic diversity, particularly Central and South America. A 2021 study funded by the pharma company Regeneron sequenced the genomes of tens of thousands of individuals in Mexico and identified forms of a gene called GPR75 which appears to have a significant impact on BMI. People with certain variants of GPR75 weigh an average of 5.3kg (12lbs) less than those with conventional copies and were half as likely to be obese. Scientists are now looking at whether this could form a new drug target for obesity medications.

But where the Nigerian 100,000 genomes project is different is that it specifically aims to empower more African scientists to lead the way in both uncovering and commercialising these findings. Fatumo says that too much genetic research in low-income countries has been co-ordinated by international teams or global pharmaceutical companies who jet in to gather samples and rarely give much back to the people they have studied.

Instead he feels that there is a great need for more of these initiatives to be led by scientists who actually understand the local culture, enabling them to gain the trust and engagement of as-yet understudied communities who might be sceptical of genetic science. "We want more people in Africa to be in a position to write the continent's own genomics agenda, and help these scientists become leaders in the genomic world," he says.

Aggregating information

A decade ago, a group of scientists at the Broad Institute in Cambridge, Massachusetts, had grown slightly frustrated at the fragmented nature of genetics research. While it was easier than ever to study a group of individuals and identify gene variants which appeared to be connected to a particular disease, there was a growing issue with false alarms.

Geneticists Heidi Rehm and Daniel MacArthur had come across increasing numbers of cases of gene mutations which had been linked to rare childhood diseases, only for the association to later be disproved when a larger and more diverse sample of individuals was used. MacArthur came across one report which assessed 200 different gene variants which had been previously described as pathogenic. Just nine stood up to the test when more rigorous analysis was applied.

"We realised that if we had more diverse data we could rule out pathogenicity for more variants, as well as building evidence of pathogenicity for variants that actually are causal in diverse populations, increasing the rates of genetic diagnosis in those populations," says Rehm.

Both Rehm and MacArthur were alarmed at the real-life implications of these flaws in genomic science. They had read about patients who were aborting babies, all on the basis that variations in DNA suggested the fetus might be prone to developing a serious disease, associations which often turned out to be false alarms.

They decided to create a single source of genomic information, aggregating data collected in studies all around the world, to provide geneticists with a resource where they can study a particular variant and assess how common and pathogenic it really is in various populations. In its current form it is known as the Genome Aggregation Database (gnomAD) and it contains more than 70,000 genomes and over 750,000 exomes, the protein-coding portions of genomes.

Compared to most large-scale genomic studies, gnomAD has been relatively successful at capturing information from a more diverse range of individuals. 43% of its data comes from non-European Asians, 12.5% from Latinos, 8.8% from Africans or African Americans, and 3.7% from Ashkenazi Jews. Geneticists around the world have already been able to use this information to reclassify certain gene variants from pathogenic to benign. 

"Tackling the issues around diversity in datasets is a complex issue, but there have been some successes," says Henriques. "In particular initiatives like gnomAD have yielded large amounts of more diverse data allowing for the reclassification of cardiac genes."

But we are still only scratching the surface when it comes to understanding genetic diversity, with many unique populations around the world remaining virtually unstudied. While new initiatives are being launched all the time – MacArthur is currently leading a project to sequence Aboriginal communities and other diverse populations in Australia – we still have very little genomic data from Oceania, Southeast Asia, northern Africa and the Middle East.

The latter region is thought to be a vital trove of information due to what geneticists describe as high rates of consanguinity – when lots of people descend from the same ancestor. "This can lead to higher rates of genetic disease," says Rehm. "Studying individuals with genetic disease can help identify the causes of disease and the function of genes that are disrupted in disease."

But while scientists have become increasingly aware of this, efforts to collect this information have been hampered by the political tensions, economic crises and conflicts which have devastated many countries across the region. Stigmas attached to genetic diseases have often discouraged families from getting involved in research projects, while international aggregation initiatives like gnomAD have found Middle Eastern countries either unwilling or unable to share any genomic data that exists.

As a result, some of the biggest progress is being made through active efforts to improve the diversity of studies in Europe and the US, with the NIH's new All of Us program particularly focused on recruitment of underrepresented groups. Genomics England has also commissioned a range of research studies to understand and find ways to overcome barriers to participation in genetics research for UK nationals of African or Caribbean descent.

"Some of the reasons for the lack of diversity in genomics studies is due to how the engagement and recruitment process is carried out," says Henriques. "Meaningful dialogue with communities and populations who have been excluded from research is important, along with improved ethnic representation in the clinical and research workforce. This has been shown to improve recruitment to studies."

But further afield the future of genomics research appears more fragile, particularly on the African continent. While H3Africa has been supported by a $176m (£147m) grant from the NIH and the Wellcome Trust over the last 10 years, that funding stream ends in 2022 and its future remains unclear. There have been reports of African professors who received funding as part of H3Africa choosing to pivot away from human genetics to cheaper alternatives such as studying the genomes of drug-resistant bacteria. Others have even questioned the value of investing in genetic research, arguing that such money might be better spent on projects which offer more immediate public health benefits such as anti-smoking and healthy eating campaigns.

Perceptions may change if Nigeria's 100,000 genomes project delivers data which can be directly used to find and validate new drug targets, but even this flagship initiative may be vulnerable to external forces. While 54gene, which is bankrolling the project, has received significant funding from international investors, the company has begun to struggle amidst the global economic turmoil of 2022, being forced to lay off 200 employees and seeing its value slashed by $100m (£83m).

But experts say that it is vital that we keep finding money to put into this research, to improve the future of medicine around the world. Rehm predicts that one major outcome of improving the diversity of genetic research is that doctors will soon no longer use ethnicity to make clinical decisions.

"This has been used in the past in certain areas of medicine, and a lot of data is now showing that this is bad on many levels," she says. "Instead, we will discover the actual differences, from genetic factors to environmental ones, and use those concrete, objective measures to guide decisions, not inferences from ethnic background which is fraught with problems."

Fatumo feels that we are making steps in the right direction, pointing out that before the Nigerian 100,000 genomes project, the largest sequencing study in Africa included just 6,400 people. But he cautions that there is still a long way to go.

"It's a massive difference but to put things in perspective, there was a Nature paper which calculated that we need three million genomes in Africa to be sequenced to fully capture the genetic diversity of the continent," he says. "We've seen a lot of talk, talk, talk to address the genomic inequality, but we now need more steps to be actually taken."

David Cox