This powerful, dynamic software system is changing how scientists sift through the recent deluge of human genomic data — and deepening our knowledge of human biology, health and disease.
What once seemed impossible — sequencing an entire genome — is now fast, routine and cheap. With sequences from hundreds of thousands of human genomes now available to scientists, as well as growing catalogs of information on genes and their functions, researchers have a new problem: an overwhelming abundance of data that outstrips the traditional tools of analysis. Indeed, analyzing data — not generating it — has become the greatest challenge facing biologists today.
“In human biology, we have knowledge hidden in datasets,” says Olga Troyanskaya, deputy director for genomics at the Center for Computational Biology (CCB) at the Flatiron Institute and a professor of computer science at the Lewis-Sigler Institute for Integrative Genomics at Princeton University. But the deluge of human data demands a different approach from those used in traditional genetic studies of model organisms such as yeast and worms. These organisms are amenable to controlled experiments in which scientists can engineer a genetic mutation in one population and compare the effects against a population without the mutation. By comparison, genomic data for humans contains far more complexity …
…