Why transcription factor binding sites are ten nucleotides long

Alexander J Stewart, Sridhar Hannenhalli, Joshua B Plotkin

Research output: Contribution to journal, Article, peer-review

Abstract

Gene expression is controlled primarily by transcription factors, whose DNA binding sites are typically 10 nt long. We develop a population-genetic model to understand how the length and information content of such binding sites evolve. Our analysis is based on an inherent trade-off between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in short binding sites. The evolutionarily stable distribution of binding site lengths predicted by the model agrees with the empirical distribution (5-31 nt, with mean 9.9 nt for eukaryotes), and it is remarkably robust to variation in the underlying parameters of population size, mutation rate, number of transcription factor targets, and strength of selection for proper binding and selection against improper binding. In a systematic data set of eukaryotic and prokaryotic transcription factors we also uncover strong relationships between the length of a binding site and its information content per nucleotide, as well as between the number of targets a transcription factor regulates and the information content in its binding sites. Our analysis explains these features as well as the remarkable conservation of binding site characteristics across diverse taxa.
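
For readers unfamiliar with the term, "information content per nucleotide" can be illustrated with the standard Shannon measure for a set of aligned binding sites. The sketch below is not the authors' model or code: it assumes a uniform background over A, C, G and T, applies no small-sample correction, and the function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(sites, alphabet="ACGT"):
    """Return (total_bits, bits_per_nucleotide) for aligned, equal-length sites.

    Assumes a uniform background over `alphabet` and no small-sample
    correction; this is an illustration, not the paper's method.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if length == 0 or any(len(s) != length for s in sites):
        raise ValueError("sites must be non-empty and of equal length")

    max_bits = math.log2(len(alphabet))  # 2 bits per position for DNA
    total = 0.0
    for pos in range(length):
        # Observed base frequencies at this position of the alignment.
        counts = Counter(s[pos] for s in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Information is the reduction in uncertainty relative to background.
        total += max_bits - entropy
    return total, total / length


# Toy alignment: position 1 is fully conserved, position 2 is uniform.
total_bits, bits_per_nt = information_content(["AC", "AG", "AT", "AA"])
```

In this toy alignment the conserved position contributes the full 2 bits and the uniform position contributes none, so a longer site does not necessarily carry more information per nucleotide.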

Original language: English
Pages (from-to): 973-985
Number of pages: 13
Journal: Genetics
Volume: 192
Issue number: 3
Publication status: Published - 7 Nov 2012

Keywords

  • Algorithms
  • Animals
  • Binding Sites
  • Computer Simulation
  • DNA/chemistry
  • Evolution, Molecular
  • Genetics, Population
  • Humans
  • Models, Biological
  • Mutation Rate
  • Nucleotide Motifs
  • Protein Binding
  • Reproduction/genetics
  • Transcription Factors/metabolism
