Chapter 1: Bio Primer
1.1 Cell Structure; DNA; RNA; transcription; translation; proteins
Prof. Yechiam Yemini (YY)
Computer Science Department Columbia University COMS 4761 --2007
Overview
Cell structure and mechanisms
DNA; RNA; Transcription; Regulation
Translation; protein; sequence & structure
References:
B. Alberts et al., “Molecular Biology of The Cell”, 4th edition, Garland Science.
R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
J.D. Watson et al., “Molecular Biology of The Gene”, 5th edition, Pearson Benjamin Cummings.
NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
Animation sites:
o http://www.johnkyrk.com/
o http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite
Organisms Are Made of Cells
Prokaryotes & Eukaryotes Have Different Cells
Prokaryotes: single-cell organisms without a nucleus
E.g., bacteria: E. coli, H. pylori
Eukaryotes: single- or multi-cell organisms with a nucleus
E.g., yeast, plants, Drosophila, humans
Timeline:
-4.5B yrs: Earth formed
-3.5B yrs: prokaryotic bacteria
-1.5B yrs: nucleated cells
-0.5B yrs: multi-cellular eukaryotes
© Pearson Benjamin Cummings
                Prokaryotes                         Eukaryotes
Structure       Single cell; size 0.2-2 µm          Single or multi cell; cell size 10-100 µm
                No nucleus                          Nucleus
                One membrane at cell boundary       Multiple membranes/compartments
                No organelles                       Organelles: mitochondria, Golgi, chloroplasts
                No cytoskeleton                     Cytoskeleton
DNA             Single circular DNA                 Two or more chromosomes
                Genes code proteins                 Genes have large non-coding regions (introns)
                90% of DNA encodes proteins         95-97% non-coding DNA
                ~10^5-10^6 base pairs               ~10^7-10^9 base pairs
                DNA is loosely organized            DNA is tightly packed (chromatin + histones)
                Cell division through fission       Mitosis
Proteins        1-2k protein species                5-20k protein species
                ~10^6 proteins per cell             ~10^9 proteins per cell
Cells Are Made of Macromolecules

Small molecules: 3%     Macromolecules: 26%
Sugars                  Polysaccharides
Fatty acids             Fats, lipids, membranes
Amino acids             Proteins
Nucleotides             Nucleic acids (DNA, RNA)

Molecules                                                   % weight
Water                                                       70%
Inorganic ions                                              1%
Sugars                                                      1%
Amino acids                                                 0.4%
Nucleotides                                                 0.4%
Fatty acids                                                 1%
Other small molecules                                       0.2%
Macromolecules (proteins, DNA, RNA, polysaccharides)        26%
DNA Structure
The Central Dogma of Biology
DNA → Transcription → RNA → Translation → Protein
DNA stores hereditary information
DNA is transcribed into RNA
RNA is translated into proteins
Proteins perform the key functions of cells
DNA Consists of Sequences of Nucleotides
DNA strands are sequences of nucleotides
Nucleotide = sugar + phosphate + base; the sugar-phosphate units form the backbone
[Figure: example strand A C T T A C G C]
Bases: Adenine, Guanine, Thymine, Cytosine
DNA is organized in complementary double strands
Hydrogen bonds hybridize complementary pairs: AT, CG
[Figure: double strand from 5’-end to 3’-end, joined by hydrogen bonds; base pairs T-A, G-C, A-T, T-A, T-A, G-C, C-G, G-C]
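The pairing rule (A with T, C with G) can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the names `PAIRING`, `complement` and `reverse_complement` are chosen here for the example, the test strand is read off the left-hand column of the base-pair figure, and it is assumed to be written in the 5'->3' direction.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base that pairs with each position of `strand`."""
    return strand.upper().translate(PAIRING)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]


top = "TGATTGCG"                  # left-hand strand of the figure (assumed 5'->3')
print(complement(top))            # partner bases, position by position
print(reverse_complement(top))    # partner strand read 5'->3'
```

Position by position the partner of TGATTGCG is ACTAACGC, which matches the pairs drawn in the figure; reversing it gives the partner strand in its own reading direction.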
DNA Forms A Double Helix
Helix full turn: 10.5 bp
Vertical hydrogen bonds support the structure
Major and minor grooves provide access by proteins (e.g., transcription factors)
DNA Is Tightly Packed
DNA is 2 m long; needs to fold into a 10^-6 m nucleus
Chromatin beads: DNA folds around cores built from 4 types of histones
Transcription needs to unpack the DNA to copy it
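A short back-of-the-envelope calculation shows the scale of the packing problem. The 2 m length, the 10^-6 m nucleus scale and the 10.5 bp per helix turn come from the slides; the rise of 0.34 nm per base pair is an assumption added here (the usual textbook value for B-form DNA), and the variable names are illustrative only.

```python
DNA_LENGTH_M = 2.0         # from the slide: total DNA length
NUCLEUS_SIZE_M = 1e-6      # from the slide: nucleus scale
BP_PER_TURN = 10.5         # from the double-helix slide
RISE_PER_BP_M = 0.34e-9    # assumption, not on the slides: B-DNA rise per base pair

compaction = DNA_LENGTH_M / NUCLEUS_SIZE_M    # linear compaction factor
base_pairs = DNA_LENGTH_M / RISE_PER_BP_M     # base pairs in 2 m of DNA
helix_turns = base_pairs / BP_PER_TURN        # full helix turns

print(f"linear compaction: {compaction:.1e}x")
print(f"base pairs:        {base_pairs:.1e}")
print(f"helix turns:       {helix_turns:.1e}")
```

Under these assumptions the DNA must be compacted by a linear factor of about 2 x 10^6, and 2 m of DNA corresponds to roughly 6 x 10^9 base pairs.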
Sample Bioinformatics Challenges
Sequencing the genome
Discovering sequence similarity
Discovering genes
Analyzing evolutionary relationships
Discovering other important structures
Distinguishing exons from introns
Regulatory structures (promoters & transcription factors)
Regions expressing micro RNA
...
Transcription
Schematics
DNA → Transcription → mRNA → Translation → Protein
Overview
A. Assembling transcription complex
B. Transcribing DNA to mRNA
C. Removing introns
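Steps B and C can be sketched on a toy sequence. This is a simplified illustration, not the slides' own material: transcription is modeled as copying the coding strand with T replaced by U (the mRNA is complementary to the template strand, so it reads like the coding strand), and the gene, the intron position and the function names `transcribe` and `splice` are all hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Step B: the pre-mRNA reads like the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Step C: remove introns given as half-open (start, end) index ranges."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])   # keep the exon before this intron
        position = end                           # skip over the intron
    exons.append(pre_mrna[position:])            # keep the final exon
    return "".join(exons)


# Hypothetical gene: exon "ATGGCT", intron "GTAAGTTTCAG", exon "GAATAA".
gene = "ATGGCT" + "GTAAGTTTCAG" + "GAATAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, [(6, 17)])
print(pre_mrna)
print(mrna)
```

In a real cell the intron boundaries are recognized from sequence signals by the splicing machinery; here they are simply supplied as coordinates.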
[Animation: The Transcription Process]
Transcription Details
[Figure: structure from PDB; source: http://cwx.prenhall.com/horton/medialib/]
Transcription Factors
TFs bind to promoter regions and to RNA polymerases
TFs regulate the rate of transcription (up/down)
Regulation is not yet well understood
Transcription Is Regulated
[Figure source: http://cwx.prenhall.com/horton/medialib/]
Example: The Lac Operon
Lac consists of 3 genes that are transcribed together
Used by bacteria to transport and metabolize lactose
cAMP activates transcription to initiate transport & metabolism of lactose
Lac Activation
Low sugar level leads to cAMP production
cAMP binds with CRP; CRP adjusts its alpha helix to fit the DNA grooves and binds with the DNA
CRP-cAMP accelerates polymerase binding
[Figure: Lac activation; source: http://cwx.prenhall.com/horton/medialib/]
Splicing The Introns
[Figure source: http://cwx.prenhall.com/horton/medialib/]
From Genes To Networks
Regulation is organized in networks
[Figure]
Top: gene network regulating the body development of sea urchin
Middle: a promoter region
Bottom: interaction of two modules
Regulatory Networks Can Be Complex
Genetic regulatory network controlling the development of the body plan of the sea urchin embryo
Davidson et al., Science, 295(5560):1669-1678.
Sample Bioinformatics Challenges
Discovering and analyzing transcription factors
Evolutionary analysis; motif finding
Discovering the structure of regulatory networks
Analyzing the operations of regulatory networks
Designing synthetic regulatory networks
Translation
RNA Encodes Protein Sequences
DNA → Transcription → RNA → Translation → Protein
Proteins are sequences of amino acids (AA)
Translation uses the RNA sequence as a template to construct the AA sequence
The coding problem:
Code sequences of 20 amino acids using 4 nucleotides
2 nucleotides can code only 4^2 = 16 amino acids
Codon: sequence of 3 nucleotides; encodes an amino acid
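The counting argument behind the codon length can be checked directly. The sketch below is illustrative only; the helper name `min_word_length` is made up for this example.

```python
from itertools import product


def min_word_length(alphabet_size: int, symbols_needed: int) -> int:
    """Smallest k such that alphabet_size**k >= symbols_needed."""
    k = 1
    while alphabet_size ** k < symbols_needed:
        k += 1
    return k


codon_length = min_word_length(4, 20)   # 4 nucleotides, 20 amino acids
codons = ["".join(c) for c in product("UCAG", repeat=codon_length)]

print(4 ** 2, 4 ** 3)       # words of length 2 and of length 3
print(codon_length)         # shortest word length that covers 20 amino acids
print(len(codons))          # number of distinct codons
```

With 64 codons for 20 amino acids plus Stop, several codons necessarily map to the same amino acid, which is visible in the genetic code table.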
Translation: translate mRNA codons to amino acids
Start/Stop codons define an open reading frame (ORF)
Translation requires reading/identifying codons and forming the respective protein sequence
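Locating an open reading frame can be sketched as a scan for the first Start codon followed by an in-frame walk to the first Stop codon. This is a deliberately naive illustration: the function name `find_orf` and the example mRNA are hypothetical, only one strand and the first AUG are considered, and real gene finders do considerably more.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna: str):
    """Return the first ORF (Start codon through Stop codon), or None."""
    start = mrna.find("AUG")                 # first Start codon
    if start == -1:
        return None
    # Read codon by codon, staying in the frame set by the Start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None                              # no in-frame Stop codon


print(find_orf("GGAUGGCUGAAUAACC"))
```

For the example input the ORF consists of the codons AUG, GCU, GAA and UAA.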
The Genetic Code
(rows grouped by first base; columns ordered by second base U, C, A, G)

First base U
UUU Phe     UCU Ser     UAU Tyr     UGU Cys
UUC Phe     UCC Ser     UAC Tyr     UGC Cys
UUA Leu     UCA Ser     UAA Stop    UGA Stop
UUG Leu     UCG Ser     UAG Stop    UGG Trp

First base C
CUU Leu     CCU Pro     CAU His     CGU Arg
CUC Leu     CCC Pro     CAC His     CGC Arg
CUA Leu     CCA Pro     CAA Gln     CGA Arg
CUG Leu     CCG Pro     CAG Gln     CGG Arg

First base A
AUU Ile     ACU Thr     AAU Asn     AGU Ser
AUC Ile     ACC Thr     AAC Asn     AGC Ser
AUA Ile     ACA Thr     AAA Lys     AGA Arg
AUG Met     ACG Thr     AAG Lys     AGG Arg

First base G
GUU Val     GCU Ala     GAU Asp     GGU Gly
GUC Val     GCC Ala     GAC Asp     GGC Gly
GUA Val     GCA Ala     GAA Glu     GGA Gly
GUG Val     GCG Ala     GAG Glu     GGG Gly

Key: Phe = Phenylalanine; Leu = Leucine; Ser = Serine; Tyr = Tyrosine; Cys = Cysteine; Trp = Tryptophan; Pro = Proline; His = Histidine; Gln = Glutamine; Arg = Arginine; Ile = Isoleucine; Met = Methionine; Thr = Threonine; Asn = Asparagine; Lys = Lysine; Val = Valine; Ala = Alanine; Asp = Aspartate; Glu = Glutamate; Gly = Glycine
tRNA Provides Translation Units
Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
It translates GCU to Alanine
Source: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
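The codon/anticodon match is base pairing on RNA (A with U, G with C), read in opposite directions. The sketch below reproduces the slide's GCU example; the function names are invented for illustration, and strict pairing at all three positions is assumed (relaxed pairing at the third position is ignored).

```python
RNA_PAIRING = str.maketrans("ACGU", "UGCA")


def anticodon_3_to_5(codon: str) -> str:
    """Anticodon written 3'->5', so it lines up under the 5'->3' codon."""
    return codon.upper().translate(RNA_PAIRING)


def anticodon_5_to_3(codon: str) -> str:
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3_to_5(codon)[::-1]


print(anticodon_3_to_5("GCU"))   # lines up with the slide's 3' CGA 5'
print(anticodon_5_to_3("GCU"))
```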
Translation Basics
Initiation
Ribosome binds to mRNA; moves in the 5’→3’ direction until it finds the Start codon AUG
Elongation
Ribosome recruits tRNA to match the next codon
tRNA binds its AA into a peptide bond with the protein
Ribosome releases tRNA and moves to the next codon
Termination
Elongation continues until a Stop codon is reached
Release factor releases the polypeptide from the ribosome
Source: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
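The three stages map naturally onto a small loop: find AUG (initiation), append one amino acid per codon (elongation), stop at a Stop codon (termination). The following is a minimal sketch under simplifying assumptions, not a model of the ribosome: the codon table is the standard genetic code packed into a 64-character string of one-letter amino acid codes with `*` for Stop, the name `translate` and the example mRNA are hypothetical, and the input is assumed to be upper-case RNA.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first Stop codon."""
    start = mrna.find("AUG")                      # initiation
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):      # elongation, codon by codon
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                     # termination
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUGAAUAACC"))
```

For the example, the codons AUG, GCU and GAA give Met, Ala and Glu (one-letter codes M, A, E) before the Stop codon UAA ends translation.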
[Animation: Translation of RNA into proteins]
Proteins Are Sequences of Amino Acids
Proteins are constructed through peptide bonds
Proteins are folded into complex conformations
Proteins perform functions by binding
Transcription factors and polymerase bind to DNA
Enzymes bind to molecules to accelerate their reactions
Globins bind to oxygen to transport it
Antibodies bind to pathogens
Example: Hemoglobin
Sickle-Cell Anemia: A Single Nucleotide Change
Codon 6 in β-globin
[Figure: sickle structure]
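The effect of a single nucleotide change on a codon can be illustrated with the two relevant entries of the genetic code table. The specific substitution used here, GAG to GUG (Glutamate to Valine), is the textbook description of the sickle-cell mutation and is not spelled out on the slide; the helper name `substitute` is made up for this sketch.

```python
# The two genetic-code entries needed for this example.
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}


def substitute(codon: str, position: int, base: str) -> str:
    """Return `codon` with the base at `position` (0-based) replaced."""
    return codon[:position] + base + codon[position + 1:]


normal_codon = "GAG"                                # codon 6 of normal β-globin mRNA
sickle_codon = substitute(normal_codon, 1, "U")     # single change: A -> U

print(normal_codon, CODON_TO_AA[normal_codon])
print(sickle_codon, CODON_TO_AA[sickle_codon])
```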
Evolution of β-Globin
(α-globin cluster is coded by chromosome 16)
The Evolution of α-Globin Across Species
Protein Structures
Protein Structure Is Of Central Importance
Structure is found through complex crystallography
X-ray diffraction; NMR
The holy grail: compute structure from sequence
Ab-initio: compute structure directly from sequence
Homology techniques: use similarity to known proteins
Structure is conserved across wide variations
Small number of fold families (α-helix, β-sheets…)
There are rules (e.g., hydrophobic AA are packed inside)
Nature folds proteins very fast
So why is it so difficult to predict structure?
SwissProt vs. PDB Statistics
PDB: ~30k structures
Proteins Interact Via Active Sites
Protein interactions are defined by active sites
E.g., antibody with pathogen
E.g., drug design
Proteins use geometry: ligands latch with holes
Proteins use physics: electrical fields
How can protein-protein interactions be computed?
Sample Bioinformatics Challenges
Analyzing protein sequence similarity
Evolutionary conservation/changes
Computing structure from sequences
Analyzing structure homologies
Analyzing protein-protein interactions
Inferring function from structure
The Cell Cycle
Cells Operate In Cycles
G0 Phase
Cell is at rest
G1 Phase (4hrs)
Cell either progresses into synthesis or leaves the cell cycle to differentiate
S Phase (10hrs)
DNA synthesis
Checkpoint determines integrity of DNA
G2 Phase (4hrs)
Cell prepares for mitosis
Checkpoint determines integrity of DNA
DNA is repaired or cell dies (apoptosis)
Mitosis (2hrs)
Chromosomes are separated
Cell divides
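The phase durations add up to a 20-hour cycle, which can be made concrete with a small lookup. This is only a timing sketch: it assumes the cell keeps cycling (no exit to G0, no checkpoint arrest), and the names `PHASE_HOURS`, `CYCLE_HOURS` and `phase_at` are invented for the example.

```python
# Phase durations in hours, in cycle order (values from the slide).
PHASE_HOURS = {"G1": 4, "S": 10, "G2": 4, "Mitosis": 2}
CYCLE_HOURS = sum(PHASE_HOURS.values())


def phase_at(hours_since_g1_start: float) -> str:
    """Phase of a continuously cycling cell at the given time."""
    t = hours_since_g1_start % CYCLE_HOURS
    for phase, duration in PHASE_HOURS.items():
        if t < duration:
            return phase
        t -= duration
    raise AssertionError("unreachable: t is always below CYCLE_HOURS")


print(CYCLE_HOURS)
for hour in (0, 4, 14, 18, 20):
    print(hour, phase_at(hour))
```

With these numbers a cycling cell spends half of each cycle (10 of 20 hours) in S phase.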
The Cell Cycle is Regulated
Transition among phases is controlled by a regulatory network
Checkpoints are used to assure quality
Evolution
Optimizing Functionality
DNA is substantially conserved through evolution
Evolution = mutation + selection
Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
Selection = optimize fitness of species
Examples
Metabolic nets learn to optimize energy budget (Alon 05)
Functional similarity
Sequence similarity
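One simple way to quantify sequence similarity is to count the positions at which two equal-length sequences differ, which is also the number of single-nucleotide substitutions separating them. The sketch below is a crude stand-in for real alignment methods, which also handle insertions and deletions; the function names and the two sequences are hypothetical.

```python
def substitutions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which the two sequences agree."""
    return 100.0 * (len(a) - substitutions(a, b)) / len(a)


ancestor = "ACGTACGT"      # hypothetical sequence
descendant = "ACGTACGA"    # the same sequence with one substitution

print(substitutions(ancestor, descendant))
print(percent_identity(ancestor, descendant))
```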