Bioinformatics Tools & Resources
The large amount of experimental data generated in Center labs is stored in PUMAdb, part of the Microarray Core resource. PUMAdb is a critical resource for archiving, analyzing, and publishing our experimental results in their entirety. Once published, all the data are made public through PUMAdb.
Externally generated data are combined with our experimental results and used in data integration methods such as those employed by the Troyanskaya lab. For example, the Troyanskaya lab has developed bioPIXIE, a system that uses a Bayesian approach to discover interaction networks and pathways.
Researchers at the Center for Quantitative Biology have generated several other public tools and resources for data analysis and visualization. Data and source code can be downloaded here. Tools implemented for the web are listed below. Several of these have been combined into an integrated analysis pipeline called Integrated Tools.
| bioPIXIE : bioPIXIE is a novel system for biological data integration and visualization. It allows you to discover interaction networks and pathways in which your gene(s) of interest participate. |
| COALESCE : COALESCE uses large collections of genomic data and Bayesian integration to predict coregulated gene modules, the conditions of regulation, and the consensus binding motifs for regulation. It uses a synthesis of gene expression biclustering, motif prediction, and data integration (including expression, sequence, nucleosome positioning, and evolutionary conservation). It is available as part of the Sleipnir library. |
| ChARM: Chromosomal Aberration Region Miner : Chromosomal aberration detection tool |
| FIRE : FIRE is a motif discovery and characterization program based on mutual information. |
| GOLEM : GOLEM is a useful tool that allows the viewer to navigate and explore a local portion of the Gene Ontology (GO) hierarchy. Users can also load annotations for various organisms into the ontology in order to search for particular genes, to limit the display to GO terms relevant to a particular organism, or to quickly search for GO terms enriched in a set of query genes. |
| Generic Gene Ontology (GO) Term Finder : This generic ("multi-organism") GO Term Finder web tool finds significant GO terms shared among a list of genes from your organism of choice, helping you discover what these genes may have in common. The implementation of this Generic GO Term Finder depends on the GO-TermFinder software written by Gavin Sherlock and Shuai Weng at Stanford University, made publicly available through the GMOD project. |
| Generic Gene Ontology (GO) Term Mapper : This generic ("multi-organism") GO Term Mapper web tool maps the granular GO annotations for genes in a list to a set of GO slim terms, allowing you to bin your genes into broad categories. The implementation of this Generic GO Term Mapper uses map2slim.pl script written by Chris Mungall at Berkeley Drosophila Genome Project, and some of the modules included in the GO-TermFinder distribution written by Gavin Sherlock and Shuai Weng at Stanford University, made publicly available through the GMOD project. |
| GRIFn : GRIFn is a novel system for interactive evaluation of functional genomic data and methods. It allows you to upload your own data, view evaluations in multiple contexts, and compare it with other published high throughput data. |
| growthrate.princeton.edu : Provides an analysis of yeast genes and their growth rate correlations. |
| MAVEN: An open source, cross-platform metabolomics data analyser. MAVEN aims to reduce the complexity of metabolomics analysis by providing a highly intuitive interface for exploring and validating metabolomics data. It performs multi-file chromatographic alignment, peak detection, isotope and adduct calculation, formula prediction, pathway visualization, and isotopic flux animation. |
| MEFIT : a Microarray Experiment Functional Integration Technology : Given any amount of microarray data, MEFIT predicts the probability of a pairwise functional relationship for any gene pair within individual biological functions. |
| Multiplexed Shotgun Genotyping (MSG): Genotyping approach based on multiplexed shotgun sequencing that can identify recombination breakpoints in a large number of individuals simultaneously at a resolution sufficient for most mapping purposes, such as quantitative trait locus (QTL) mapping and mapping of induced mutations. |
| Nearest Neighbor Networks (NNN) : Nearest Neighbor Networks (NNN) is a graph-based algorithm used to cluster genes with similar microarray expression profiles. The NNN clustering method is an alternative to classical techniques such as hierarchical and K-means clustering. NNN generates clusters of functionally related genes with high precision, and the clusters generally represent a broader selection of biological processes than those produced by other methods. NNN performs best on data sets with many conditions and on data sets that are modular (i.e., that contain several grouped subsets of conditions). The NNN algorithm is described in Huttenhower et al. 2007 (http://www.biomedcentral.com/1471-2105/8/250) and was developed in the Troyanskaya and Coller labs. The web tool was implemented by Juan Alvarez in the Bioinformatics group at Princeton. |
| PVIEW : PVIEW is an open source software tool for the visualization and analysis of high resolution quantitative proteomics and metabolomics LC-MS and LC-MS/MS data. PVIEW enables quantification of complex mixtures of proteins and metabolites. |
| P-POD : Princeton Protein Orthology Database : P-POD displays families of predicted orthologs from P. falciparum, H. sapiens, D. melanogaster, M. musculus, A. thaliana, C. elegans, D. rerio, and S. cerevisiae with an emphasis on providing information about disease-related genes and experimental confirmation of orthology from the literature. |
| PUMAdb publications page : Published microarray data and web supplements from Princeton researchers |
| Princeton University Microarray Database (PUMAdb) : The Princeton University MicroArray database (PUMAdb) stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, PUMAdb provides interfaces for data retrieval, analysis and visualization. Princeton researchers and their collaborators should register for a database account. |
| SiteSifter : SiteSifter finds highly conserved DNA motifs embedded within coding regions. Each instance of a motif is scored based on the chance that its constituent codons are conserved over and above that required for amino acid conservation. |
| SPELL : Serial Pattern of Expression Levels Locator : SPELL (Serial Pattern of Expression Levels Locator) is a query-driven search engine for large gene expression microarray compendia. Given a small set of query genes, SPELL identifies which datasets are most informative for these genes, then finds additional genes within those datasets whose expression profiles are most similar to the query set. |
| Virus Infection Project : The Virus Infection Project (VIP) is a web tool that provides a way to look at information about transcripts during CMV infections. |
| Yeast Functional Genomics Database (YFGdb) : The goal of YFGdb is to collect and freely disseminate all available yeast functional genomics data, along with requisite analysis tools, to the yeast community and the biomedical research community at large. YFGdb contains data sets from microarray as well as many other genomics/proteomics studies including large-scale interaction and phenotype experiments. YFGdb has been implemented using the Generic Model Organism Database Construction Set as part of the GMOD project. |
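The Generic GO Term Finder entry above describes finding significant GO terms shared among a list of genes. As a rough illustration of the kind of calculation such an enrichment test can involve, the sketch below computes a one-sided hypergeometric p-value in Python. The function name, parameter names, and example numbers are hypothetical and are not taken from the GO-TermFinder software; the web tool itself may use a different test and applies its own handling of multiple comparisons.

```python
from math import comb


def hypergeom_enrichment_pvalue(population_size, annotated_in_population,
                                query_size, annotated_in_query):
    """Probability of seeing at least `annotated_in_query` annotated genes
    when drawing `query_size` genes without replacement from a population
    of `population_size` genes, `annotated_in_population` of which carry
    the GO term of interest. (Hypothetical helper, for illustration only.)
    """
    # The count of annotated genes in the query cannot exceed either bound.
    upper = min(annotated_in_population, query_size)
    # Number of ways to choose any query of this size from the population.
    total = comb(population_size, query_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, query_size - k)
        for k in range(annotated_in_query, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 6000 genes in the genome, 60 annotated to a term,
    # 20 genes in the query list, 5 of which carry the term.
    p = hypergeom_enrichment_pvalue(6000, 60, 20, 5)
    print(f"uncorrected p-value: {p:.3g}")
```

A small p-value suggests the term appears in the query list more often than expected by chance. In practice the test is repeated for many GO terms, so the raw p-values would need a multiple-testing correction before being interpreted.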

