Researchers in California recently unveiled the world's largest repository for cancer genomes, which gathers the data in one place so that scientists can easily compare results across datasets.
Highlights of the Project
Cancer Genomics Hub (CGHub), built by a team at the University of California, Santa Cruz (UCSC), will hold raw sequencing data from The Cancer Genome Atlas (TCGA);
CGHub will not hold data from other international cancer genome projects;
TCGA is an effort by the U.S. National Cancer Institute (NCI) to sequence the DNA of tumor cells and normal cells from 10,000 people with 20 types of cancer;
CGHub will also hold data from NCI's childhood- and HIV-associated cancer genome projects;
CGHub takes over from NIH's National Center for Biotechnology Information, which had been collecting cancer sequencing data until last August;
The CGHub computer system is ready to store 5 petabytes of DNA and RNA data from cancer patients (TCGA is generating 10 terabytes of data a month and will eventually produce 10 petabytes, or 10,000 terabytes);
TCGA is building a catalog of key cancer-driving genetic changes that researchers can use to develop personalized treatments;
A central database will allow researchers to compare mutations and miswired pathways across cancer types;
UCSC bioinformatician David Haussler is leading the project, which is funded by a $10.3 million contract from NCI;
For now, researchers will only be able to download the data; working on it remotely on CGHub's servers through cloud computing is planned for later;
The team hopes the database will make it easier for scientists to analyze the vast amounts of sequencing data pouring out of the U.S. National Cancer Institute's (NCI's) genome projects.