Novel methodology:
We developed a novel methodology for predicting protein expression from mRNA data. A generalised linear model combines mRNA measurements with ribosome density, ribosome occupancy, codon usage, gene copy number, and mRNA folding free energy to predict protein expression. Two key assumptions underlie the methodology: (1) additional, unknown factors relating mRNA to protein are similar for proteins involved in the same biological process, and (2) this relationship can be learned from protein and mRNA levels collected under “control” conditions that are the same across multiple experiments. We therefore build separate generalised linear models for individual functional groups of proteins (e.g., pathways), using multiple mRNA experiments plus the other information listed above to predict protein expression in the “control” condition.
We applied this methodology to budding yeast, S. cerevisiae (below). To apply it to another system, the following are required: (1) additional genetic information about each gene (e.g., ribosome density and occupancy), (2) a categorisation of genes into functional processes (e.g., KEGG pathways), (3) at least one, and preferably more, high-throughput quantitative protein expression datasets taken under the relevant “control” conditions, and (4) multiple high-throughput quantitative mRNA expression datasets containing the same “control” conditions.
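The per-pathway model fitting described above can be sketched as follows. This is a minimal illustration rather than the project's actual implementation: it assumes a Gaussian-family generalised linear model with an identity link on log-transformed protein abundance (which reduces to ordinary least squares), because the family and link function are not stated here. The function names, the `min_genes` threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_pathway_models(features, log_protein, pathway_of_gene, min_genes=5):
    """Fit one Gaussian, identity-link GLM (least squares) per pathway.

    features: (n_genes, n_features) array, e.g. control-condition mRNA levels
        from several experiments plus per-gene covariates such as codon usage.
    log_protein: (n_genes,) log protein abundance in the control condition.
    pathway_of_gene: length n_genes sequence of pathway labels.
    Returns {pathway: coefficient vector, intercept first}.
    """
    features = np.asarray(features, dtype=float)
    log_protein = np.asarray(log_protein, dtype=float)
    labels = np.asarray(pathway_of_gene)
    models = {}
    for pathway in np.unique(labels):
        mask = labels == pathway
        if mask.sum() < min_genes:
            continue  # too few proteins to learn from (hypothetical cutoff)
        design = np.column_stack([np.ones(mask.sum()), features[mask]])
        coef, *_ = np.linalg.lstsq(design, log_protein[mask], rcond=None)
        models[str(pathway)] = coef
    return models


def predict_log_protein(models, pathway, gene_features):
    """Predict log protein abundance for genes in a given pathway."""
    gene_features = np.atleast_2d(np.asarray(gene_features, dtype=float))
    design = np.column_stack([np.ones(len(gene_features)), gene_features])
    return design @ models[pathway]


# Synthetic, noise-free data for two hypothetical pathways.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 3))
pathways = np.array(["pathway_A"] * 20 + ["pathway_B"] * 20)
true_coef = {
    "pathway_A": np.array([1.0, 0.5, -0.2, 0.3]),
    "pathway_B": np.array([2.0, -0.4, 0.1, 0.0]),
}
log_protein = np.empty(40)
for name, coef in true_coef.items():
    rows = pathways == name
    log_protein[rows] = coef[0] + features[rows] @ coef[1:]

models = fit_pathway_models(features, log_protein, pathways)
prediction = predict_log_protein(models, "pathway_A", features[:1])
```

Fitting each pathway separately lets the coefficients absorb the unknown, pathway-specific factors named in assumption (1). A different family or link would require an iterative fit in place of the single least-squares solve.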
Predictive models:
We developed models that predict protein expression from mRNA expression for 38 KEGG pathways in budding yeast, S. cerevisiae. These are all the pathways with enough proteins represented in the protein expression datasets for learning.
Evolved synthetic constructs:
We evolved a single ancestral synthetic construct of budding yeast, S. cerevisiae (with the osmotic stress response protein TPS2 tagged with GFP), under mild osmotic stress produced by liquid media containing 0.3 M NaCl. The construct was evolved in 16 replicates, 8 in shaking culture and 8 in static culture, for 90-150 days (600-1000 generations), with stocks saved every 3 days (20 generations), resulting in 480+ “snapshots” of evolution.
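The snapshot and generation counts above follow from simple arithmetic, sketched below. The sketch assumes one stock was saved per replicate at every 3-day interval with none lost. The upper bound is hypothetical and applies only if every replicate ran for the full 150 days.

```python
replicates = 16
days_between_stocks = 3
generations_per_interval = 20
min_days, max_days = 90, 150


def snapshots_per_replicate(days):
    """Number of saved stocks for one replicate run for `days` days."""
    return days // days_between_stocks


def generations(days):
    """Approximate generations elapsed after `days` days."""
    return snapshots_per_replicate(days) * generations_per_interval


min_generations = generations(min_days)  # 600
max_generations = generations(max_days)  # 1000
min_total_snapshots = replicates * snapshots_per_replicate(min_days)  # 480
max_total_snapshots = replicates * snapshots_per_replicate(max_days)  # 800 (assumed upper bound)
```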