Our group studies the large-scale organization of proteins, essentially trying to reconstruct the 'wiring diagrams' of cells by learning how all of the proteins encoded by a genome are associated into functional pathways, systems, and networks. We are interested both in discovering the functions of the proteins and in learning the underlying organizational principles of the networks. The work is evenly split between computational and experimental approaches, with the latter tending to be high-throughput functional genomics and proteomics approaches for studying thousands of genes/proteins in parallel.
Bioinformatics of protein function and interactions
We've discovered a number of features of genomes that allow us to predict functions for proteins that have never been experimentally characterized. Using these techniques and information from over 30 fully sequenced genomes, we were able to calculate some of the first genome-wide predictions of protein function, assigning very preliminary functions to over half of the 2,500 uncharacterized genes of yeast. Now, with thousands of genomes in hand, we're extending these techniques, as well as asking fundamental questions about the evolution of protein interactions and the evolution of genomes.
Some of our recent papers on gene networks and the systematic discovery of gene function include:
Wang, Hwang, et al., RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network, Genome Biology, 13(12):R125 (2012) PubMed Link
Link to our large-scale gene networks for yeast, worms, mouse, and Arabidopsis: http://www.functionalnet.org. An illustration of our Arabidopsis gene network won Honorable Mention in the 2010 Science Visualization Challenge and was featured by the New York Times.
Link to some of our public bioinformatics resources: http://bioinformatics.icmb.utexas.edu
Rational identification of genes affecting traits and diseases
Using the gene networks and other computational tools, we've now gained some ability to rationally predict the consequences to an organism of mutating or interrupting a specific gene. This means that by using these tools, we can often select a small set of candidate genes likely to be implicated in a particular disease or trait. We've now experimentally validated >100 such candidate genes for diverse traits in a wide range of organisms, including yeast, worms (C. elegans), Arabidopsis, frogs, mice, and humans. For example, in yeast we've used network models to discover a large number of new ribosome biogenesis genes (collaborating with Arlen Johnson), as well as genes controlling such features as cell size. In animals, e.g. using our worm gene network models developed with collaborators Ben Lehner and Andy Fraser, we could successfully identify new genes controlling longevity, as well as genes capable of suppressing the loss of the Retinoblastoma tumor suppressor, thus 'curing' worms of model tumors. In Arabidopsis, with now ex-postdoc Insuk Lee and collaborator Sue Rhee, we could rationally identify new genes regulating root growth, drought resistance, and seedling pigmentation. In vertebrates, working with the Wallingford and Finnell labs, we've been able to use gene network models to help assign functions to a birth defect gene, as well as to identify entirely new birth defect genes, confirming their roles in vivo.
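The idea of selecting candidate genes from a network can be illustrated with a minimal "guilt-by-association" sketch in Python. This is not our production pipeline: it simply assumes a weighted gene network given as a list of edges and a set of "seed" genes already known to affect a trait, and it ranks every other gene by the summed weight of its links to the seeds. The function name `rank_candidates`, the gene names, and the edge weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_candidates(edges, seeds):
    """Rank non-seed genes by the summed weight of their links to seed genes.

    edges: iterable of (gene_a, gene_b, weight) tuples for an undirected network.
    seeds: genes already known to be associated with the trait of interest.
    """
    seeds = set(seeds)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # Credit a gene only when it is linked to a seed and is not a seed itself.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: gene names and weights are invented for illustration.
TOY_EDGES = [
    ("seedA", "geneX", 2.5),
    ("seedB", "geneX", 1.0),
    ("seedA", "geneY", 0.5),
    ("geneY", "geneZ", 3.0),
]

ranking = rank_candidates(TOY_EDGES, seeds=["seedA", "seedB"])
print(ranking)
```

In this toy example, geneX ranks first because it is linked to both seeds, while geneZ is not scored at all because it has no direct link to a seed. Scoring only direct neighbors is the simplest possible choice; approaches that also propagate evidence through indirect links would rank geneZ as well.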
Some of our recent papers on the rational association of genes with traits and diseases:
Cha et al., Evolutionarily Repurposed Networks Reveal the Well-Known Antifungal Drug Thiabendazole to Be a Novel Vascular Disrupting Agent, PLoS Biology, 10(8):e1001379 (2012) PubMed Link PDF Synopsis NY Times NIGMS video
McGary, Park et al., Systematic discovery of nonobvious human disease models through orthologous phenotypes, Proc Natl Acad Sci U S A, 107(14):6544-9 (2010) PubMed Link Carl Zimmer wrote a nice story about this work for The New York Times.
Gray et al., The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development, Nature Cell Biology, 11(10):1225-32 (2009) PubMed Link
Use our phenolog method to link genes to traits: http://www.phenologs.org
Proteomics: High-throughput protein expression and interaction profiling
From our work and that of others, it is apparent that proteins in the cell participate in extended protein interaction networks involving thousands of proteins. By defining these networks, we can not only discover the functions of specific proteins based on their connections, but also use these networks as tools to predict the outcome of perturbing the cell. As part of our research efforts in this area, we have been developing high-throughput methods to measure protein abundances in complex biological samples (e.g., by quantitative shotgun proteomics mass spectrometry) and protein localization within cells (e.g., by high-throughput automated fluorescence microscopy, such as of cell microarrays). These sorts of data help us build a catalog of protein, mRNA, and metabolite expression from cells grown under many different conditions, forming a quantitative picture of these molecular events inside cells. We expect that data of these sorts will put us on the road to developing predictive, rather than merely descriptive, theories of biology.
Vogel et al., Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line, Molecular Systems Biology, 6:article 400 (2010) PubMed Link PDF News and Views
Lu, Vogel, Wong et al., Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation, Nature Biotechnology, 25(1):117-24 (2007) PubMed Link
Links to our MS/MS data repositories: http://www.marcottelab.org/MSdata/ and http://www.marcottelab.org/index.php/Category:MSdata
Link to the Open Proteomics Database: http://bioinformatics.icmb.utexas.edu/OPD/
Link to the APEX Protocol website: http://marcottelab.org/APEX_Protocol/
Link to the APEX software tool: http://pfgrc.jcvi.org/index.php/bioinformatics/apex.html
Link to the MSpresso website: http://www.marcottelab.org/MSpresso/
Link to the MSblender website: http://www.marcottelab.org/index.php/MSblender
MSblender is proving to be a particularly powerful tool for interpreting mass spectrometry proteomics datasets; we've now used it for >5,000 individual datasets.