Who owns the human genome information?
Soon, every person in the world would be able to explore the human genome — chromosome by chromosome, gene by gene, base by base — on the web. It was a historic moment, says David Haussler, a computational biologist at the University of California, Santa Cruz. But the project's leaders had long worried that if countries or scientists hoarded the data they were producing, it would derail the project.
So in 1996, the HGP researchers got together to lay out what became known as the Bermuda Principles, with all parties agreeing to make the human genome sequences available in public databases, ideally within 24 hours — no delays, no exceptions.
Fast-forward two decades, and the field is bursting with genomic data, thanks to improved technology both for sequencing whole genomes and for genotyping them by sequencing a few million select spots to quickly capture the variation within.
These efforts have produced genetic readouts for tens of millions of individuals, and they sit in data repositories around the globe. The principles laid out during the HGP, and later adopted by journals and funding agencies, meant that anyone should be able to access the data created for published genome studies and use them to power new discoveries.
The explosion of data led governments, funding agencies, research institutes and private research consortia to develop their own custom-built databases for handling the complex and sometimes sensitive data sets. Although some researchers are reluctant to share genome data, the field is generally viewed as generous compared with other disciplines.
Still, the repositories meant to foster sharing often present barriers to those uploading and downloading data. Researchers tell tales of spending months or years tracking down data sets, only to find dead ends or unusable files. And journal editors and funding agencies struggle to monitor whether scientists are sticking to their agreements. Clinical genomicist Heidi Rehm says the field has come to recognize that big scientific advances require vast amounts of genomic data linked to disease and health-trait data.
Sequencing the human genome made it easier to study diseases associated with mutations in a single gene — Mendelian disorders such as non-syndromic hearing loss. But identifying the genetic roots of more common complex diseases, including cardiovascular disease, cancer and other leading causes of death, required the identification of multiple genetic risk factors throughout the genome.
To do this, researchers in the mid-2000s began comparing the genotypes of thousands to hundreds of thousands of individuals with and without a specific disease or condition, in an approach known as genome-wide association studies, or GWAS. The approach proved popular — more than 10,000 GWAS have been conducted since then. And that has produced oceans of data, says Chiea Chuen Khor, a group leader at the Genome Institute of Singapore, who studies the genetic basis of glaucoma.
A study with 10,000 people, looking at 1 million genetic markers in each, for example, says Khor, would generate a spreadsheet with 10 billion entries. Much of this information is sensitive, so controlled-access databases vet the researchers seeking access and ensure that the data are used only for the purposes that participants consented to. Similarly, other large generators of genomic data, such as the for-profit company 23andMe in Sunnyvale, California, and the non-profit Genomics England in London, operate their own controlled-access databases.
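To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The study size and marker count are Khor's example figures above; the one-byte-per-genotype storage assumption is purely illustrative and not taken from the article.

```python
# Back-of-the-envelope size of a GWAS genotype matrix.
# People/marker counts follow Khor's example; bytes_per_entry is an
# illustrative assumption (a single genotype call packed into one byte).

def genotype_matrix_size(n_people: int, n_markers: int, bytes_per_entry: int = 1):
    """Return (number of entries, approximate size in gigabytes)."""
    entries = n_people * n_markers
    gigabytes = entries * bytes_per_entry / 1e9
    return entries, gigabytes

entries, gb = genotype_matrix_size(n_people=10_000, n_markers=1_000_000)
print(f"{entries:,} entries")                          # 10,000,000,000 entries
print(f"~{gb:,.0f} GB at 1 byte per genotype call")    # ~10 GB
```

Scale the same calculation to hundreds of thousands of participants, or store genotype dosages as floating-point numbers, and a single study quickly reaches the terabyte range, one practical reason these data sets live in dedicated repositories rather than ordinary spreadsheets.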
A first draft of data-sharing principles for the Human Genome Project, written by John Sulston on a whiteboard in Bermuda in 1996. Credit: Richard Myers.
But uploading data into some of these repositories often takes a long time. Sometimes the data get stored in more than one place, and that creates other challenges. Rasika Mathias, a genetic epidemiologist at Johns Hopkins University in Baltimore, Maryland, who studies the genetics of asthma in people of African ancestry, says that decentralization is a huge problem.
The programme she works with, the U.S. National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed), consists of more than 100,000 research participants across more than 80 studies and shares its data in several repositories, including dbGaP and some university-based portals.
Researchers seeking access must often provide detailed proposals and letters of support. Many look for workarounds. Several years ago, she tried to access a dbGaP data set, filing multiple rounds of digital paperwork, only to be rejected.
But, says Stephen Sherry of the U.S. National Center for Biotechnology Information, which runs dbGaP, the NIH is investing in modernizing the system to make it more streamlined and flexible. Carrie Wolinetz, associate director for science policy at the NIH, says it is yet to be determined whether the remedy will be a dbGaP 2.0.
For all the problems that controlled access causes in sharing genome data, many researchers say databases such as dbGaP and UK Biobank, which holds genomic data on some 500,000 people, are still invaluable. Mathias is fiercely protective of the participants in TOPMed and sees merit in the protection that controlled access provides. Like many, she would like to see the repositories better resourced.
Craig Venter (left) of Celera Genomics and Francis Collins (centre), then at the National Human Genome Research Institute, met in Washington DC in 2000 to announce the completion of the first drafts of the human genome.
And others are happy to have access, even if it is hard to obtain. Some labs are more than willing to wade through the digital paperwork to use dbGaP, and have done so for more than ten projects. Twenty years on from the HGP, there is no specific universal policy that says research groups have to share their human-genome data, or share them in a particular format or database. That said, many journals have continued to abide by the Bermuda Principles, requiring that genomic data be shared in approved databases at the time of publication.
At that time, consortium researchers had confirmed the existence of some 19,600 protein-coding genes in the human genome and identified roughly another 2,200 DNA segments that are predicted to be protein-coding genes.
The Ensembl genome-annotation system estimated them at around 23,000. In GeneSweep, an informal sweepstake on the final human gene count, bets ranged from around 26,000 to well over 100,000 genes. Since most gene-prediction programs were estimating the number of protein-coding genes at fewer than 30,000, GeneSweep officials decided to declare the contestant with the lowest bet (25,947, by Lee Rowen of the Institute for Systems Biology in Seattle) the winner. In a study comparing the two gene sets, Michael P. Cooke, John B. Hogenesch and their colleagues theorized that there was incomplete overlap between estimates of predicted genes made by Celera and by the Human Genome Sequencing Consortium (Hogenesch et al.; Daly).
This lower estimate, of roughly 30,000 genes, was arrived at "based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction". It came as a shock to many scientists, because counting genes had been viewed as a way of quantifying genetic complexity. With about 30,000 genes, the human count would be only one-third greater than that of the simple roundworm Caenorhabditis elegans. As one commentary at the time asked: "What if there are only 30,000 human genes?"
The reference sequences published by the public consortium (Lander et al.) and by Celera (Venter et al.) do not represent any one person's genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained from the sequences applies to everyone, because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.
In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few samples were processed as DNA resources. Thus donors' identities were protected, so that neither donors nor scientists could know whose DNA was sequenced.
DNA clones from many libraries were used in the overall project. Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done.
Sperm contain all chromosomes necessary for study, including equal numbers of cells with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from female donors' blood to include samples originating from women.
In the Celera Genomics private-sector project, DNA from a few different genomes was mixed and processed for sequencing. The differences between individual genomes are mostly single-nucleotide polymorphisms (SNPs); most SNPs have no physiological effect, although a minority contribute to the beneficial diversity of humanity.

Individual researchers at numerous colleges, universities, and laboratories throughout the United States also received funding from the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH) for human genome research.

After the atomic bomb was developed and used, the U.S. Congress charged DOE's predecessor agencies, the Atomic Energy Commission and the Energy Research and Development Administration, with studying and analyzing genome structure, replication, damage, and repair, and the consequences of genetic mutations, especially those caused by radiation and chemical by-products of energy production.
From these studies grew the recognition that the best way to study these effects was to analyze the entire human genome to obtain a reference sequence. The Human Genome Project formally began on October 1, 1990, after the first joint 5-year plan was written and a memorandum of understanding was signed between the two organizations, DOE and NIH.
The DOE Human Genome Program's ELSI (ethical, legal, and social issues) component and the data it generated concentrated on two main areas: (1) privacy and confidentiality of personal genetic information, including its accumulation in large, computerized databases and databanks; and (2) development of educational materials and activities in genome science and ELSI, including curricula, TV documentaries, workshops, and seminars for targeted audiences.
Other areas of interest include data privacy arising from potential uses of genetic testing in the workplace, and issues related to the commercialization of genome research results and technology transfer.

Making the Project Possible

DOE's long-standing mission to understand and characterize the potential health risks posed by energy use and production led it to propose, in the mid-1980s, that all three billion bases of DNA from an "average" human should be sequenced. Technologies available before that time had not enabled the routine detection of extremely rare and often minute genetic changes resulting from radiation and chemical exposures.
Biotechnology and pharmaceutical companies may not share their data as quickly or openly as academic researchers; the patent system, however, often exposes private-sector research to much the same scrutiny as the peer-review process.
Patents may be ridiculed as nefarious tools of capitalism, but they come with a duty to disclose, albeit on an ex post facto basis. Patents are, in a crucial sense, the public governance of private enterprise. In this way, the norms of the science commons are partially transposed from academic to commercial laboratories, even as the reward structures in private versus public genomics remain quite distinct. Since the 1980s, government policy has encouraged university scientists to reap the commercial benefits of their research.
The landmark Bayh-Dole Act of 1980 spurred academic institutions to pursue patents on federally funded research and to license their inventions to private firms for commercialization. Earlier that same year, the U.S. Supreme Court paved the way for patents on genetically modified organisms, ruling in favor of a scientist who had created a modified, oil-degrading bacterium (Diamond v. Chakrabarty). Many years later, we are still sorting out the legacy of these policies and court decisions.
Public genomics and private genomics are more intermingled than ever. Universities face decisions about whether to purchase commercial licenses in order to conduct academic research.
Companies support academic researchers in order to preempt rivals by making certain areas of knowledge public and thus unpatentable. Some argue that an excess of exclusive patents restricts genomic research, while others contend that the backlog of patent applications is slowing that research down in a maze of fruitless regulation. Within this maze, however, a few patterns seem to have emerged — some showing how the private and the public realms can serve one another, others suggesting that the system stands in need of serious reform.
The SARS virus offers a recent, telling example of how intellectual property (IP) can help and hinder research and development simultaneously. By spring 2003, the SARS genome had already been sequenced, and the sequence data were quickly made public. Without question, data sharing greatly expedited the characterization of the virus and the ability to diagnose infection. Sharing information saved lives by saving time. Within six months, vaccines were being tested, and a DNA chip harboring the complete SARS genome was available for both research and screening.
At the same time, the institutions that sequenced the SARS genome filed patent applications, and these patents could prove important if a private sector partner someday needs to supply capital for developing a vaccine.
Several of the groups that sequenced the virus, including a Dutch team at Erasmus Medical Center in Rotterdam and the U.S. Centers for Disease Control and Prevention, initially filed patent applications; some of these groups contended they were patenting for defensive purposes — that is, to prevent others from locking up SARS data and rights.
Yet in a recent article in the Bulletin of the World Health Organization, the Dutch patent applicants expressed concern that the SARS patents may deter the development of downstream products such as vaccines. Remarkably, intellectual property can evoke ambivalence even among those who hope to have it. The verdict on the value of SARS patents depends on imponderables that will not be apparent for years.
The SARS rights could remain proprietary. Most likely, few people will pay much attention so long as SARS stays off the global radar screen, in which case sorting out the patent situation will probably take longer and cost more than sequencing the virus did in the first place.
The fear of mass death will not be there to focus the mind, so to speak. But it is also possible that SARS could return, or that some new pathogen — like bird flu — will raise similar questions about how IP can be deployed in a way that protects public health while keeping financial incentives in place. And ironically, those situations when cooperation is most needed — such as during an epidemic — are also the times when the most money may be at stake.
Disease-related patents are most valuable, after all, if and when the disease in question becomes a true public health concern. The SARS case demonstrated that information sharing can be a force for social good, and that patents might permit such sharing while still preserving incentives for subsequent development, though such sharing is hardly guaranteed.
In the realm of genomics, there are at least two notable examples of such strategic altruism. In the mid-1990s, pharmaceutical giant Merck bankrolled an effort to detect regions of the genome that harbored active, protein-coding genes. The company was worried about its freedom to operate in future years, since upstart companies like Human Genome Sciences and Incyte Genomics were filing patent applications on hundreds of thousands of short snippets of genes.
In response, Merck funded Washington University in St. Louis to produce the same partial gene sequence information quickly and publicly. Merck helped create new public information in order to preemptively defeat rival intellectual property. Such cases illustrate the topsy-turvy world of genomics in action, with big companies working with universities and nonprofits to bolster the public domain.
Was it pure self-interest, driven by market demands and exigencies? Perhaps. But the net consequences were largely positive, both for the advance of human knowledge and for the development of new products. No doubt many corporate scientists are public-spirited and interested in knowledge for its own sake. But companies are likely to make significant investments in public knowledge only when there is a private incentive to do so.
This is not greed, but reality. Knowledge, after all, is never free. It requires time, talent, and resources — and thus investments that offer the possibility of economic returns. Companies that fund research expect that some of it will lead to useful knowledge, and the general public expects that the incentive system driving such research will produce goods and services of commercial and social value. We recognize that progress always requires risk and usually entails failures along the way, and that risk-takers need the possibility of gain if they are to endure those research avenues (indeed, the vast majority) that go nowhere fast or nowhere at all.