Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Moos Tower, Room 2-650
515 Delaware Street SE
University of Minnesota, East Bank
The first completely sequenced genome, the virus Lambda at 50,000 nucleotides, was sequenced via the shotgun method by Sanger and coworkers at Cambridge in 1981. The shotgun method consists of randomly sampling and determining 500-700 nucleotide "reads" and then assembling them to reconstruct the sampled sequence. It was long believed that this approach could not be applied to genomes over 100,000 nucleotides long, so a long period followed in which researchers pursued laborious directed approaches that break larger genomes down into sets of localized subsegments. In 1996 Jim Weber and I proposed sequencing the human genome with a paired-end shotgun approach that entails randomly sampling segments of, say, 10,000 nucleotides and then directly determining 500-700 nucleotide reads at both ends of each segment. It has since become overwhelmingly clear that the whole-genome paired-end shotgun sequencing approach is more rapid and economical than the directed methods: it enabled the production of high-quality reconstructions of Drosophila (2000), Human (2001) and Mouse (2001) in quick succession at Celera by a team of roughly 80. We discuss the overall strategy and the results one can expect by comparing these reconstructions to the same sequences obtained by alternative, independent methods.
In the near term, semi-directed yet highly parallel methods could yield a further improvement of a factor of 2 or 3 in cost-effectiveness. In the mid term, high-density pyro-sequencing and single-molecule detection systems have the potential to permit the de novo sequencing of a large genome for under $10,000 in a matter of several hours. We survey the range of possibilities and the implications of inexpensive whole-genome sequencing for the future of biotechnology and medicine.
IMA Tutorial: Data Analysis and Optimization, Monday, May 5, 2003
IMA Workshop: Data Analysis and Optimization, May 6-9, 2003