Bridging The Genetic Data Divide
A software engineer’s struggle against politics, privatization and a flood of genetic data. Written by Collin Blinder. Illustrations by Jacob Bettencourt and Natalie Chan.

Illustrated by Jacob Bettencourt
Melissa Cline was at work on the day her life changed. She picked up her ringing cell phone, unprepared for the news at the other end.
“You have a harmful variant in the ATM gene,” a genetic counselor told her.
Cline hadn’t been expecting this call. She had taken a genetic test as part of a research study run by a friend. But she had only signed up for the study to support her friend. With no personal or family history of genetic disease, she didn’t even qualify for a medical genetic screening.
The genetic counselor began explaining Cline’s results. “Now, everybody has an ATM gene…”
Cline cut him off. “Let me tell you a little about me,” she said, lightheartedly.
Cline is an associate research scientist with the Genomics Institute at the University of California, Santa Cruz. At the time of that phone call in 2017, Cline was overseeing the development of a tool that genetic counselors use to access information on genetic variants: altered versions of genes, some of which can result in disability and disease. So she already knew what it meant to carry a harmful genetic mutation.
After the call, Cline was hit with a mix of emotions. She felt the dread of knowing that her risk of acquiring cancer was more than double that of the general population. But there was also another emotion — something bordering on excitement.
Cline knew that the genetics software she helped build had led to so many of these life-changing calls, initiating conversations resulting in restrictive diets, mammograms, double mastectomies, hysterectomies, and other efforts to lower the odds of cancer. Now, Cline knew what it felt like to receive that call. She remembers thinking that she would be “eating my own dog food.”
Cline could now empathize with the group that stood to lose the most in medical genetics — the patients. “I’d be able to do this work authentically,” she remembers thinking.
Cline manages the BRCA Exchange, an online tool that stitches together vast amounts of information on genetic variations resulting from errors in the BRCA1 and BRCA2 genes, those most closely associated with breast cancer. These genes are some of the most notorious hazards in the entire human genetic code, or genome.
Cline’s tool provides doctors, clinicians, and carriers of BRCA variants with valuable information to help make some of the hardest decisions in a person’s life. However, the BRCA genes are very long lines of genetic code containing tens of thousands of characters, leaving plenty of room for errors to produce variants. Thanks to advancements in genetic analysis, testing labs are discovering these new BRCA variants faster than researchers can study them. This leads to confusion among patients and their doctors, who don’t know whether these variations are deadly or harmless.
Researchers must review a slew of clinical reports, statistical analyses and research results on the same variant in order to definitively classify it as dangerous, also called pathogenic, or harmless, also called benign. But aggregating that data is no easy task. That’s because the world’s genetic data is siloed, divided up among many different places. There are public databases containing similar information in separate locations; there are protected databases that are safeguarded for privacy and national security concerns; and there are private industry databases that are locked up by companies to protect confidentiality and a competitive edge over rivals.
The result is an abundance of variants, posing unknown risks, that could be better understood if the world’s genetic information were united.
Dangerously broken
Genes are the instruction manuals of the body, written in a molecular alphabet of just four letters: A, C, G and T. They carry directions that tell the body’s cells what to do, from instructing cells in the eyes on which color to paint the iris to directing the mucous cells that line the stomach. If a single letter of a gene is deleted or swapped out, the system can break down disastrously.
For that reason, genetic testing has become a Swiss Army knife for the medical industry. If a person has a family history of cancer, a test can determine whether they’re genetically predisposed. If a patient presents with symptoms of a disease, genetic testing can provide the last piece of evidence needed for a diagnosis. For someone undergoing treatment, testing can help determine the best course of action.
Although they are powerful tools, genetic tests don’t always provide conclusive results. Many people find out that they carry a genetic variant that poses an unknown risk. These variants of uncertain significance (VUSs) can lead to uncertainty and potentially unnecessary life-changing treatments.
Perhaps no two genes have garnered as much attention as BRCA1 and BRCA2. These genes instruct cells to produce proteins that suppress tumor growth, acting like stop signs for cell multiplication, and that repair DNA. BRCA stands for BReast CAncer and, true to that name, errors in the BRCA genes can increase a person’s risk of breast cancer, as well as ovarian, pancreatic and prostate cancers, by as much as three to four times that of the general population. While most research into BRCA variants has focused on the risk to females, the variants pose a risk to all sexes.
The ClinVar database, the most widely used public repository of genetic variants, currently contains over 33,000 BRCA1 and BRCA2 variants. For more than 14,000 of these BRCA variants, researchers either do not know their effects or don’t agree on them. These variants, with inconclusive risks, may be harmful or harmless. This means that if a patient undergoes genetic testing and finds out they carry one of these confusing variants, their preventative treatment options are limited to informed guesses by their doctors.
While Cline’s own pathogenic ATM variant comes with a known risk, she hasn’t been spared from uncertainty. Cline says that she now gets both a mammogram and a breast MRI yearly. However, she made the decision not to get a pancreatic cancer screening given the test’s high likelihood of a false-positive result that might lead to an invasive pancreatic biopsy.
The process of classifying a variant’s risk as dangerous, harmless or unknown usually begins in a genetic testing lab. When a lab encounters a genetic variant, it can upload its report to a database along with its own risk classification. Independent panels of experts specializing in certain genes meet regularly to sift through these variants and provide definitive classifications. ClinGen, the entry point for variants being submitted to ClinVar, currently lists 129 contributing expert panels on its website.
The panel that oversees BRCA1 and BRCA2 variants is called the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium. This international team of experts focuses solely on BRCA1 and BRCA2, but BRCA variants are so abundant that ENIGMA faces a mountain of work, says Amanda Spurdle, a researcher at the QIMR Berghofer Medical Research Institute in Australia and the director of ENIGMA.
“We meet once a month, we have 1000s and 1000s and 1000s of variants to curate,” Spurdle says. She says that, given its minimal financial resources and the arduous process for updating classifications on ClinVar, ENIGMA is unable to fully compensate its members for their time.
“It’s a community of the willing, the people who turn up to run these things,” says William Foulkes, Chair of the Department of Human Genetics at McGill University. “They’re not being paid to do any of it, it’s all because they care about it.”
An additional layer of complexity comes in the form of the world’s fractured genetic data landscape. When submitting their variant reports, testing labs can choose to submit to multiple databases located around the world. Two examples are ClinVar, managed by the National Institutes of Health in the United States, and the Leiden Open Variation Database (LOVD), operated by a team at the Leiden University Medical Center in the Netherlands.
While data doesn’t need to pass through ports of entry, it isn’t free of political borders. In recent years, both the United States and China have been reassessing their own data-sharing practices when it comes to their citizens’ genetic code.
“Nationalistic tendencies are clearly in the air and that tends to not encourage sharing,” says Foulkes.
Private genetic testing companies, such as Invitae and Myriad Genetics, also maintain proprietary databases. Some of these companies share a portion of their variants with public databases but still try to distinguish themselves from the competition by withholding some of their data, says Spurdle.
“They’re not going to give it to us for all their variants because otherwise they’ve got no edge,” she says.
This cumbersome and fraught data landscape has been 40 years in the making.
An archipelago of data
In the early 1980s, the National Library of Medicine had a pioneering idea in medicine — producing a centralized information hub that could be edited and accessed through the recently created Internet. The first demonstration of this project was publishing a catalog of genetically linked diseases called the Online Mendelian Inheritance in Man.
By that time, many geneticists believed that sequencing the entire human genome, our species’ genetic blueprint, would lead to massive advances in medicine. In 1990, this titanic sequencing effort kicked off when an international constellation of laboratories formed the Human Genome Project. The project aimed to identify the location and function of every gene in the genome. The research spanned generations of technological advancements; its researchers scrapped their microscopes and fax machines for automated sequencing devices and internet databases. The challenges of sequencing the genome shifted from the speed of laboratory research to the speed of data-sharing.
In 1996 the world’s genomicists convened in Bermuda to develop a set of guiding principles for their field. The list included an oath that all labs should promptly share their data in a standardized format to foster collaboration.
When much of the genome had been sequenced, a team at UC Santa Cruz was chosen to compile a complete digital record from the Human Genome Project’s ocean of individual genetic sequences. The Santa Cruz team was racing against a private company that intended to patent parts of the genome, which many feared would hamstring scientific research. The UC Santa Cruz team won out, publishing the genome to the internet through the Genome Browser in 2000, where, to this day, anyone can freely access the sequence.
Fear of genetic patents was not unfounded. In the mid-1990s, Myriad Genetics obtained patents for a key process in the genetic testing of the BRCA1 and BRCA2 genes. The company monopolized genetic testing of these genes until the Supreme Court struck down its patents in 2013, after which the cost of BRCA1 and BRCA2 tests dropped from $4,000 to $400.
The whopping price tag for the Human Genome Project’s years of research and development ran around $3 billion. By 2013, technological advancements had cut the cost of sequencing a genome to around $5,000, and testing smaller groups of genes became inexpensive enough for insurance companies to cover screenings for high-risk individuals. The widespread use of genetic testing led to the rapid growth of information on genetic variants, posing the challenge of where to store all that data.
By 2013, this ocean of data was spread out across an estimated 2,000 genetics databases worldwide. This made understanding genetic variants, which vary in frequency and risk across racial and ethnic groups, especially difficult. With this problem in mind, an international group of researchers, championing data-sharing, formed the Global Alliance for Genomics and Health (GA4GH). In their first white paper, the group warned that, in the absence of a shared solution, the world’s genetic data would remain in a “hodge-podge of balkanized systems.” If public data management solutions were not developed, they cautioned, private systems would fill the void, slowing “the understanding, diagnosis and treatment of disease.”
One of the first pilot projects to come out of the GA4GH was the BRCA Exchange, developed at UC Santa Cruz. Cline was brought onto the small BRCA Exchange team to lead the tool’s software development in 2015. From the outset, she assembled a steering committee of clinicians and researchers, representing the tool’s community of future users, to help guide the development of the BRCA Exchange, says Mary Goldman, a design and usability engineer at the Genomics Institute who previously worked on the BRCA Exchange.
Scientific software is usually developed with minimal input from outside the research team creating the tool, says Goldman. “You shove it out into the world and nobody uses it because you didn’t actually talk to anybody and people are not invested in it,” she says.
For the past decade, Cline has kept the steering committee together, Goldman says. “That’s really unique for the BRCA Exchange project — prioritizing users and prioritizing the community,” she says.
Bridging the data divide
The core mission of the BRCA Exchange is to unify disparate data to provide the most comprehensive and up-to-date information needed to assess the risks posed by variants. Users are met with rows of lab classification reports and statistics. At the top of each variant webpage appears an illustration of a gene, looking like a multicolored, beaded bracelet. This shows users the single imperfection, on a single bead, that turns a BRCA1 or BRCA2 gene into a variant.
The BRCA Exchange collects descriptive information about variants from databases such as ClinVar and LOVD, including lab reports with vital descriptions of each variant and why the lab believes it to be pathogenic or benign. The tool also pulls in statistics from databases that monitor how often variants are observed in certain populations. This is an important piece of information for classifying variants given that a variant’s risk can be either overlooked or overblown based on incomplete population data.
For example, a variant may appear to be relatively common, which would support a classification of benign, as more deadly variants are less likely to be passed through generations. However, closer inspection may show that a seemingly benign variant is actually a pathogenic variant overrepresented in the data because it is known to show up frequently in a certain ethnic group. This is true of multiple pathogenic BRCA variants found in Ashkenazi Jewish populations.
Facilitating ENIGMA’s work may be the most significant role the BRCA Exchange plays, Foulkes says. By collecting information from disparate data sources and working closely with ENIGMA, Cline’s team has expedited the process of classifying BRCA1 and BRCA2 variants. Cline and her team have developed a way for the BRCA Exchange to automatically assign risk classifications to previously unclassified variants. This, Spurdle says, has freed ENIGMA up to prioritize only the most complex risk classifications.
While clinicians and researchers make up the majority of BRCA Exchange users, Cline and her team are currently trying to figure out how to better serve the carriers of BRCA1 and BRCA2 variants who also visit their website. These “seekers,” as Cline calls them, come looking for information about their own variants. “They’ll work their way through six pages of Google hits to learn everything they can,” Cline says.
Her team is developing a survey for these users, hoping to better understand their needs. The challenge is figuring out what information would be useful for people who may not have even known the meaning of the word “variant” until discovering they were carrying one.
Keeping the tool running and expanding its capabilities requires hours of software engineering and user testing. One consistent problem facing the BRCA Exchange is finding funding to support the project, says Goldman. The tool’s usefulness as a hub linking together information from other sources often doesn’t meet the standards of innovation outlined in grants, she says.
“People are wanting to fund more ‘scientific innovation’ or ‘you’re taking this in a completely different direction and doing something new’,” Goldman says. Instead, Cline’s tool is more like herding the cats of the world’s variant data sources. “Herding the cats is not a very sexy thing to fund. But it is so necessary,” she says.
Cline has a grant proposal currently under review at the National Institutes of Health. The grant opportunity is one of the few sources for funding a project like the BRCA Exchange, she says. Cline’s grant officer told her it was “the only program in the NIH that’s really a good fit for me,” she recalls.
Cline says that her proposal, which would provide the BRCA Exchange with four years of runway, got a high score from the review panel just before the Trump administration threw NIH’s operations into disarray.
“In a normal year, we would be feeling very optimistic,” Cline says. The next step would be for an NIH council to meet in May to make the final decision on the grant, “but the White House has been blocking the council meetings,” she says. Now, Cline is forced to spend more time searching for alternative funding sources and less time on the project itself.
Cline is constantly thinking about the next step for the BRCA Exchange — how to make it more accessible and useful. She hopes to expand the platform to include additional genes. The first on her list are PALB2, sometimes called BRCA3 for its similar link to cancer, and ATM, her own erroneous gene.
By developing a single hub for information on a few genes, the BRCA Exchange provides a glimpse into an alternate future, where access to genetic information isn’t limited by where the data is stored.
“The more we can promote these kinds of in-depth, expert-curated, informed databases, particularly for common diseases, then the better off the patients will get,” Foulkes says. “I mean, the patients want the experts to be helping them. And it’s a worldwide effort that is not governed by national borders.”

Collin Blinder
Author
B.A. (psychology; minor in computer science) Pitzer College

Jacob Bettencourt
Illustrator
BS in Biology at University of Hawaiʻi at Mānoa
Internship: National Tropical Botanical Garden
Jacob Bettencourt is an illustrator from the island of Oʻahu and an all-around lover of plants and invertebrates. For several years leading up to and into his undergrad, Jacob involved himself in education through teaching at his local science club, where his love for the biological world, and for informing others about it, flourished. With his art, he hopes to emphasize the importance of native biodiversity and conservation, highlighting the living things that are often ignored. When he’s not drawing, you can find Jacob with music blasting in his ears, squatting to take a picture of a bug on the sidewalk. :)

Natalie Chan 陳良儀
Illustrator
BS in Biology at University of California, Riverside
Internship: Emerging Creatives of Science, Monterey Audubon Society
Natalie Chan is inspired by living amid the rich biodiversity of Hong Kong and California. She explores the vibrancy of life through color in her work, using illustration to inspire others about science. Her mother showed her how creativity in art can offer new perspectives on existing subjects. Deeply impacted by this insight, Natalie used it in her honors thesis at UC Riverside by visualizing her research. This experience deepened her desire to use art to make scientific discoveries accessible to a broader audience. Ultimately, she hopes to collaborate with researchers to share their discoveries more widely. When not making art, Natalie can be found running around the beach, taking unflattering photos, or staring confused at research papers from a niche she rabbit-holed into, with EDM playing at a low volume. :)
