Cancer Genome Atlas funds study for analysis and interpretation of tumor genetic data

November 17, 2015

Professor of Bioinformatics and Computational Biology Jonas Almeida, Ph.D., is a leading expert in the semantic web, a flexible database infrastructure that is easily expanded to accommodate new types of searches. Searching a standard relational database requires loading the data and then searching it for the desired information, Weinstein explained. A semantic web structure allows searches based on a subject-predicate-object format that can home in more directly on the information sought, dramatically speeding up searches. With an appropriate graphics card, more data crunching can be done in the browser on the user's laptop, rather than on a distant server. That also shortens processing time and relieves pressure on the server.
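The subject-predicate-object format described above can be illustrated with a small sketch. This is not the project's software: the `TripleStore` class, the sample identifiers, and the predicate names (`hasTumorType`, `hasMutationIn`) are all hypothetical, chosen only to show how a query expressed as a partial triple can be answered from an index rather than by scanning every record.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory triple store (illustrative only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        # Index from (predicate, object) to the subjects that carry it,
        # so a "which subjects have this property?" query is one lookup.
        self._by_pred_obj: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))
        self._by_pred_obj[(predicate, obj)].add(subject)

    def subjects(self, predicate: str, obj: str) -> List[str]:
        """Answer the pattern (?, predicate, obj) from the index."""
        return sorted(self._by_pred_obj.get((predicate, obj), set()))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """General pattern match; None acts as a wildcard (linear scan)."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical records; these are not real TCGA data.
store = TripleStore()
store.add("sample:001", "hasTumorType", "glioblastoma")
store.add("sample:001", "hasMutationIn", "gene:TP53")
store.add("sample:002", "hasTumorType", "ovarian")
store.add("sample:002", "hasMutationIn", "gene:BRCA1")
store.add("sample:003", "hasTumorType", "glioblastoma")

print(store.subjects("hasTumorType", "glioblastoma"))
print(store.match(subject="sample:002"))
```

A new kind of fact, such as a drug response, would be added here as one more predicate rather than as a new table or column, which is the sense in which such a structure is easy to expand.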

Team member David Kane of SRA International, a longtime collaborator of Weinstein's, is an expert in the agile software development paradigm. Traditional software development involves extensive planning and consultation with users before any code is written. "The central tenet of agile development is that you get something working quickly via close consultation between biologists and software engineers. The biologists then are the initial testers and users, and the software is grown organically. The initial investment is small, so you don't have to be afraid of changing direction if necessary," Weinstein said.

Kane used the agile approach to develop the Miner Suite of bioinformatics software with Weinstein at the National Cancer Institute. Weinstein worked at the NCI for more than 30 years, and directed what has been considered a precursor project to the Cancer Genome Atlas before coming to M. D. Anderson in January 2008. Advances based on the Miner Suite will be used in this project.

The TCGA grant application was the first Weinstein has submitted to his former employer.

Findings and tools generated by the project will be open source, available to other TCGA research teams, and in a format compatible with both The Cancer Genome Atlas and the NCI's Cancer Biomedical Informatics Grid (caBIG).

Weinstein has done a series of calculations to put the challenge of sorting out the many variables of the cancer genome in perspective. "If you unpacked the DNA in every cell of a single person and stretched it end to end, it would circle the equator 917,000 times - the equivalent of 120 round trips to the sun. One error in replicating the genome in one unlucky place - over a length of 120 trips to the sun and back - can lead to cancer. Our challenges are to understand how that happens - and to know what to do about it if we can't prevent it in the first place."
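The conversion between the two figures in the quote can be checked with a few lines of arithmetic. The sketch below is not Weinstein's calculation: it only takes the quoted 917,000 laps and converts them using two standard values that do not appear in the article, an equatorial circumference of about 40,075 km and a mean Earth-sun distance of about 149.6 million km.

```python
# Assumed physical constants (standard values, not taken from the article).
EQUATOR_KM = 40_075.0      # Earth's equatorial circumference
EARTH_SUN_KM = 149.6e6     # mean Earth-sun distance (about 1 AU)

# Figure quoted in the article.
LAPS_AROUND_EQUATOR = 917_000


def round_trips_to_sun(laps: float = LAPS_AROUND_EQUATOR) -> float:
    """Convert laps around the equator into Earth-sun round trips."""
    total_km = laps * EQUATOR_KM
    return total_km / (2 * EARTH_SUN_KM)


print(f"{round_trips_to_sun():.0f} round trips")
```

Under these assumptions the result is about 123 round trips, in line with the quoted figure of roughly 120.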

Source: University of Texas M. D. Anderson Cancer Center