# Intellectual Fallout from Biology's Success: Can Mathematics Help Us Out?

Monday, April 26, 1999 - 9:15am - 10:15am

Keller 3-180

Roger Brent (Molecular Sciences Institute)

By producing biological information systematically, the genome project and its follow-on projects have transformed the way biology is performed. Bigger changes are coming. I'll review the types of genomic data that are becoming available and those that we can reasonably anticipate. Then I'll try to talk about how we might use these torrents of data to do what we want, which is to understand living things.

To start with, I'll give two examples of analysis done for different genomic data types. Both studies reveal at least shallow mathematical issues. Both analyses were somewhat informative.

Then I'll get to the problem we face as biologists: barring breakthroughs, the inferences we can make from systematically generated biological data will often be disappointing, because they will lack the insight or the probability needed to interest the majority of contemporary biologists.

A mid-term approach to this problem is to learn how to integrate different genomic data types and perhaps to integrate these with natural language data. A longer term approach to the problem is to bring into being technologies to systematically generate new types of biological information. At the Institute, we are working on both of these.

All of the data types, and all of the future approaches, will raise analytical, statistical, and perhaps shallow mathematical issues. I also deem it somewhere between possible and likely that these efforts will reveal deeper mathematical issues.

Moreover, any effort to make predictive models from biological information will necessarily involve computational and mathematical issues. As an example, I'll introduce one such promising exploration, now being done by Dr. Larry Lok. Dr. Lok is constructing Markov models of the expression of some genes in individual cells from data on their expression in a population of cells.

The development of a predictive biology will likely be one of the major creative enterprises of the 21st century. People who understand mathematics, computation, and statistics and who are willing to apply their understanding to this quest are in a position to make substantial contributions to human knowledge. We are self-consciously trying to create the scientific and institutional frameworks to allow them to do so.
