From February 2 to February 17, the IMA held a workshop on Group Testing Designs, Algorithms, and Applications to Biology. The organizers’ goal was to connect biologists and those studying bioinformatics with various mathematical communities because of the strong connections between group testing and coding theory.
Group testing, a method that was used in World War II to weed out syphilitic men called for service, pooled the recruits' blood samples into groups, and the mixed blood samples are then tested. Because the infection rate was so small, an efficient algorithm was devised to identify infected recruits. The number of tests needed to be performed is just a little more than the number of groups. Whereas, coding theory involves encoding data in such a way that even if parts are lost in the transmission, the entire message can be reconstructed by what is received. One of the most celebrated codes is the Reed-Solomon (R.S.) code which is used in encoding music onto a CD.
One tutorial, given by Atri Rudra (University of Buffalo) offered a computer scientist’s view of codes, group testing, and algorithms for identifying the significant items in a population through group testing. What surprised the group was that what Atri Rudra presented was essentially one of the designs that the biologists had been using. The biologists had worked out, in their own way, mathematical descriptions of the properties they were using of the design.
When the two groups were brought together, they realized that what they had been working on was essentially the same thing.
A large group of the participants got together during the workshop, fleshed out the exact relationship between the two models, started to figure out what properties of the design were useful, began proving what those properties translated to in coding theory, did some coding theory/combinatorial analysis to enlarge the collections of codes that would meet both group’s desires, and then implemented one of the small R.S. designs that had previously not generated before.
"We had a couple of people write little bits of code in a couple different languages to generate these R.S. codes. We were able to get really nice, clean ways of generating these matrices that the biologists have been using to pool their samples,” organizer Anna Gilbert said. "We generated them according to the right mathematical description, one that makes it easy to modify various design constraints," she added.
All of this transpired in time for one presenter to give a short talk on their findings on the last day of the workshop.
"It took getting two or three communities in the same room to figure out that we know a lot about this mathematically and we know a lot about how it works in the lab because we’ve tested it already," said Gilbert. And, according to Gilbert, the IMA is so successful because of its ability to bring these various communities together to stimulate collaboration and foster these types of interdisciplinary research efforts. And the group now has a nine-page manuscript that they’re working on, in addition to software that the biologists can use a starting point for generating good group testing designs.
In terms of outcomes, Gilbert explained that this will assist one of the large-scale projects Yaniv Erlich (Whitehead Institute for Biomedical Research), among others, is doing on rare allele detection in certain populations. For example, conducting a pooled screening for carriers of certain genetic diseases, cystic fibrosis, Tay-Sachs, etc. Their discovery well help him to more efficiently pool more people and test them all at once.