Earlier this year, Myriad Genetics Inc., Hitachi Ltd. and Oracle Corp. announced that they will map the proteome by 2004, producing a set of all possible interactions between all proteins expressed by human cells, and predicting protein pathways based on the interaction data. But just what this will mean in practice, and how useful it will be, remains to be seen.

Outside of the public sector, MYGN and its partners appear to be the only companies pursuing this course, arguing that it will lead to the discovery of novel pathways of protein interactions. But their competitors in the proteomics space say that truly mapping the proteome is neither feasible nor useful. They prefer to tackle proteomics in bite-sized chunks in what they consider to be physiologically relevant systems.

Mapping the proteome is not like sequencing the human genome, which at least is finite. In contrast, the human proteome is for all practical purposes nearly unbounded, unless it is defined simply as all the proteins coded for in the genome, in which case putting together a catalog should be possible. But the problem rapidly becomes more complex as splice variants, phosphorylation and other post-translational modifications are added.

But even that is not truly daunting. The really hard part is trying to look at protein interactions in every cell type, at every point in time, at every stage of development and in every state, healthy or diseased, for every disease. Thus the question of how to pursue proteomics is far more complex than it was for genomics, and generating data on all protein interactions risks leaving researchers drowning in irrelevant data.

Indeed, MYGN recognizes that its system won't map the entire proteome, and isn't worried about it. "There are always closure questions," said Sudhir Sahasrabudhe, executive vice president of research. "When one sequenced the genome, one knows you've got over 90%, but you haven't exhausted every single base, and there's still 1% variation. So in a manner of speaking, when we say we'll map the proteome, we're saying we'll get broad, comprehensive coverage."

Thus MYGN (Salt Lake City, Utah) is defining its mapping project based on what is possible using its technology. "Whether you can map the proteome depends on how you define it," Sahasrabudhe said. "Our definition of mapping the proteome is to discover and describe all the protein-protein interactions that can be found using random yeast two-hybrid and mass spec technology" (see BioCentury, April 9).

The company's strategy is similar to the shotgun sequencing approach used by Celera Genomics Group (CRA, Rockville, Md.) to sequence the genome. "We're taking a human genome-type approach to begin to describe the random interactions between different proteins - like randomly sequencing the genome," Sahasrabudhe said. "It would be useful and important to understand what I might miss - post-translational modifications, what happens when things are phosphorylated, and the like. But as long as one knows what one will miss, that's ok."

Too much forest?

Nor is MYGN worried that its project will create masses of irrelevant data - not just from specious interactions, but from interactions among proteins that have no pharmaceutical interest