Chapter 1: Bio Primer - Columbia University


1.1 Cell Structure; DNA; RNA; transcription; translation; proteins

Prof. Yechiam Yemini (YY)

Computer Science Department, Columbia University
COMS 4761 -- 2007

Overview

• Cell structure and mechanisms
• DNA; RNA; transcription; regulation
• Translation; protein; sequence & structure

References:
• B. Alberts et al., “Molecular Biology of The Cell”, 4th edition, Garland Science.
• R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
• J.D. Watson et al., “Molecular Biology of The Gene”, 5th edition, Pearson Benjamin Cummings.
• NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
• Animation sites:
  o http://www.johnkyrk.com/
  o http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite


Organisms Are Made of Cells


Prokaryotes & Eukaryotes Have Different Cells

• Prokaryotes: single-cell organisms without a nucleus
  E.g., bacteria: E. coli, H. pylori
• Eukaryotes: single- or multi-cell organisms with a nucleus
  E.g., yeast, plants, Drosophila, humans

Timeline:
  -4.5B yrs   Earth formed
  -3.5B yrs   Prokaryotic bacteria
  -1.5B yrs   Nucleated cells
  -0.5B yrs   Multi-cellular eukaryotes

© Pearson Benjamin Cummings

             Prokaryotes                      Eukaryotes
             Single cell; size 0.2-2 µm       Single or multi cell; cell size 10-100 µm
Structure    No nucleus                       Nucleus
             One membrane at cell boundary    Multiple membranes/compartments
             No organelles                    Organelles: mitochondria, Golgi, chloroplasts
             No cytoskeleton                  Cytoskeleton
DNA          Single circular DNA              Two or more chromosomes
             Genes code proteins              Genes have large non-coding regions (introns)
             90% of DNA encodes proteins      95-97% non-coding DNA
             ~10^5-10^6 base pairs            ~10^7-10^9 base pairs
             DNA is loosely organized         DNA is tightly packed (chromatin + histones)
             Cell division through fission    Mitosis
Proteins     1-2k protein species             5-20k protein species
             ~10^6 proteins per cell          ~10^9 proteins per cell

Cells Are Made of Macromolecules

Small molecules: 3%      Macromolecules: 26%
  Sugars             →   Polysaccharides
  Fatty acids        →   Fats, lipids, membranes
  Amino acids        →   Proteins
  Nucleotides        →   Nucleic acids (DNA, RNA)

Molecules                                                    % weight
Water                                                        70%
Inorganic ions                                               1%
Sugars                                                       1%
Amino acids                                                  0.4%
Nucleotides                                                  0.4%
Fatty acids                                                  1%
Other small molecules                                        0.2%
Macromolecules (proteins, DNA, RNA, polysaccharides)         26%

DNA Structure


The Central Dogma of Biology

DNA → (Transcription) → RNA → (Translation) → Protein

• DNA stores hereditary information
• DNA is transcribed into RNA
• RNA is translated into proteins
• Proteins perform the key functions of cells

DNA Consists of Sequences of Nucleotides

• DNA strands are sequences of nucleotides
  [Figure: a nucleotide consists of a sugar, a phosphate, and a base (e.g., T); the sugars and phosphates form the backbone of a strand such as A C T T A C G C]
• Bases: Adenine, Guanine, Thymine, Cytosine
• DNA is organized in complementary double strands
• Hydrogen bonds hybridize complementary pairs: AT, CG
  [Figure: two paired strands joined by hydrogen bonds, with 5’ and 3’ ends labeled; base pairs shown: T-A, G-C, A-T, T-A, T-A, G-C, C-G, G-C]
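The pairing rule on this slide (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration added to the notes; the function names are made up for the example, and the sample strand is the left column of the base-pair figure, assumed here to be written 5'→3'.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base partner strand, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

# Left column of the slide's base-pair figure (orientation is an assumption).
top = "TGATTGCG"
print(complement(top))          # partner strand, position by position
print(reverse_complement(top))  # partner strand, read 5'->3'
```

The reversal reflects that the two strands run in opposite directions, which is why the figure labels a 5’ end on one side and a 3’ end on the other.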

DNA Forms A Double Helix

• Helix full turn: 10.5 bp
• Vertical hydrogen bonds support the structure
• Major and minor grooves provide access by proteins (e.g., transcription factors)

DNA Is Tightly Packed

• DNA is ~2 m long; it must fold into a nucleus on the order of 10^-6 m across
• Chromatin beads: DNA folds around histone cores built from 4 histone types
• Transcription needs to unpack the DNA to copy it
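A back-of-envelope calculation shows the scale of the packing problem. The 2 m length, the 10^-6 m nucleus scale, and the 10.5 bp helix turn come from the slides; the 0.34 nm rise per base pair is the usual textbook value for B-DNA and is an assumption added here, so treat the results as order-of-magnitude estimates only.

```python
# Values from the slides.
DNA_LENGTH_M = 2.0        # total DNA length per cell
NUCLEUS_SIZE_M = 1e-6     # order-of-magnitude nucleus size
BP_PER_TURN = 10.5        # base pairs per full helix turn

# Assumption (not on the slides): axial rise per base pair in B-DNA.
RISE_PER_BP_M = 0.34e-9

base_pairs = DNA_LENGTH_M / RISE_PER_BP_M    # how many base pairs fit in 2 m
helix_turns = base_pairs / BP_PER_TURN       # how many full turns of the helix
compaction = DNA_LENGTH_M / NUCLEUS_SIZE_M   # linear length vs. nucleus size

print(f"base pairs:            {base_pairs:.1e}")
print(f"helix turns:           {helix_turns:.1e}")
print(f"length / nucleus size: {compaction:.0e}")
```

The base-pair estimate lands in the billions, which fits the upper end of the eukaryotic genome sizes quoted in the comparison table.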


Sample Bioinformatics Challenges

• Sequencing the genome
• Discovering sequence similarity
• Discovering genes
• Analyzing evolutionary relationships
• Discovering other important structures
  o Distinguishing exons from introns
  o Regulatory structures (promoters & transcription factors)
  o Regions expressing micro RNA
  o ...

Transcription


Schematics

DNA → Transcription → mRNA → Translation → Protein

Overview

A. Assembling transcription complex

B. Transcribing DNA to mRNA

C. Removing introns
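Steps B and C can be mimicked on strings as a rough sketch. Step A (assembling the transcription complex) is not modeled. The toy gene, its intron, and the intron coordinates are invented for illustration; real splicing recognizes splice-site signals rather than being handed positions.

```python
def transcribe(coding_strand: str) -> str:
    """Step B: the pre-mRNA carries the coding strand's sequence with U for T."""
    return coding_strand.upper().replace("T", "U")

def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Step C: remove introns given as (start, end) half-open index pairs."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)

# Hypothetical toy gene: exon + made-up intron + exon.
gene = "ATGGCT" + "GTAAGTTTAG" + "TGGTAA"
pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, introns=[(6, 16)])
print(pre_mrna)
print(mrna)
```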


Animation The Transcription Process

COMS 4761 --2007

16

8

Transcription Details

http://cwx.prenhall.com/horton/medialib/
From PDB

Transcription Factors

• TFs bind to promoter regions and to RNA polymerases
• TFs regulate the rate of transcription (up/down)
• Regulation is not yet well understood

Transcription Is Regulated

http://cwx.prenhall.com/horton/medialib/

Example: The Lac Operon

• The lac operon consists of 3 genes that are transcribed together
• Used by bacteria to transport and metabolize lactose
• cAMP activates transcription to initiate transport & metabolism of lactose

Lac Activation

• Low sugar level → cAMP is generated
• cAMP → binds with CRP; CRP adjusts its alpha helix to fit the DNA grooves and binds with the DNA
• CRP-cAMP → accelerates polymerase binding

http://cwx.prenhall.com/horton/medialib/

Splicing The Introns

http://cwx.prenhall.com/horton/medialib/

From Genes To Networks

• Regulation is organized in networks
• Figure panels:
  o Top: gene network regulating the body development of sea urchin
  o Middle: a promoter region
  o Bottom: interaction of two modules

Regulatory Networks Can Be Complex

Genetic regulatory network controlling the development of the body plan of the sea urchin embryo.
Davidson et al., Science, 295(5560):1669-1678.

Sample Bioinformatics Challenges

• Discovering and analyzing transcription factors
  o Evolutionary analysis; motif finding
• Discovering the structure of regulatory networks
• Analyzing the operations of regulatory networks
• Designing synthetic regulatory networks

Translation


RNA Encodes Protein Sequences

DNA → Transcription → RNA → Translation → Protein

• Proteins are sequences of amino acids (AA)
• Translation uses the RNA sequence as a template to construct the AA sequence
• The coding problem:
  o Code sequences of 20 amino acids using 4 nucleotides
  o 2 nucleotides can code only 4^2 = 16 amino acids
  o Codon: sequence of 3 nucleotides (4^3 = 64 combinations); encodes an amino acid
• Translation: translate mRNA codons to amino acids
  o Start/Stop codons define an open reading frame (ORF)
  o Translation requires reading/identifying codons and forming the respective protein sequence
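The counting argument behind the coding problem is easy to check numerically. The helper below is a small sketch written for these notes (its name is made up): it finds the shortest word length over a 4-letter alphabet that yields at least 20 distinct words.

```python
def min_codon_length(alphabet_size: int, symbols_needed: int) -> int:
    """Smallest n such that alphabet_size**n >= symbols_needed.

    Assumes alphabet_size >= 2.
    """
    n = 1
    while alphabet_size ** n < symbols_needed:
        n += 1
    return n

# Number of distinct words of length 1, 2, 3 over 4 nucleotides.
for n in (1, 2, 3):
    print(n, 4 ** n)

# Shortest word length that can distinguish 20 amino acids.
print(min_codon_length(4, 20))
```

Length 3 gives more words than amino acids, so several codons can map to the same amino acid and a few are left over to serve as Stop signals.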


The Genetic Code

Rows: first base. Columns: second base. Within each block the third base runs U, C, A, G.

        U           C           A           G
U   UUU Phe     UCU Ser     UAU Tyr     UGU Cys
    UUC Phe     UCC Ser     UAC Tyr     UGC Cys
    UUA Leu     UCA Ser     UAA Stop    UGA Stop
    UUG Leu     UCG Ser     UAG Stop    UGG Trp

C   CUU Leu     CCU Pro     CAU His     CGU Arg
    CUC Leu     CCC Pro     CAC His     CGC Arg
    CUA Leu     CCA Pro     CAA Gln     CGA Arg
    CUG Leu     CCG Pro     CAG Gln     CGG Arg

A   AUU Ile     ACU Thr     AAU Asn     AGU Ser
    AUC Ile     ACC Thr     AAC Asn     AGC Ser
    AUA Ile     ACA Thr     AAA Lys     AGA Arg
    AUG Met     ACG Thr     AAG Lys     AGG Arg

G   GUU Val     GCU Ala     GAU Asp     GGU Gly
    GUC Val     GCC Ala     GAC Asp     GGC Gly
    GUA Val     GCA Ala     GAA Glu     GGA Gly
    GUG Val     GCG Ala     GAG Glu     GGG Gly

Abbreviations: Phe = Phenylalanine, Leu = Leucine, Ser = Serine, Tyr = Tyrosine, Cys = Cysteine, Trp = Tryptophan, Pro = Proline, His = Histidine, Gln = Glutamine, Arg = Arginine, Ile = Isoleucine, Met = Methionine, Thr = Threonine, Asn = Asparagine, Lys = Lysine, Val = Valine, Ala = Alanine, Asp = Aspartate, Glu = Glutamate, Gly = Glycine.
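The genetic code is a lookup table from 64 codons to 20 amino acids plus Stop, so it maps naturally onto a dictionary. The sketch below builds the standard code from a 64-character string of one-letter amino acid symbols listed in U, C, A, G order ("*" marks Stop); the variable names are choices made for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for all 64 codons, first base varying slowest.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map every codon (e.g. "AUG") to its amino acid symbol.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(CODON_TABLE["AUG"])   # Methionine, one-letter symbol
print(CODON_TABLE["GCU"])   # Alanine, one-letter symbol

# Degeneracy: how many codons encode each amino acid (or Stop).
degeneracy = Counter(CODON_TABLE.values())
print(degeneracy["L"], degeneracy["W"], degeneracy["*"])
```

Counting codons per symbol makes the redundancy of the code visible: some amino acids have six codons while others have only one.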

tRNA Provides Translation Units

• Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA
• It translates GCU to Alanine
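The codon-anticodon match is ordinary RNA base pairing between antiparallel strands. A minimal sketch, assuming strict A-U and G-C pairing (wobble pairing at the third position is ignored) and using function names invented for this example:

```python
# RNA base pairing: A with U, C with G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

def anticodon_3to5(codon_5to3: str) -> str:
    """Anticodon written 3'->5', aligned base by base with the codon."""
    return "".join(RNA_PAIR[base] for base in codon_5to3.upper())

def anticodon_5to3(codon_5to3: str) -> str:
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3to5(codon_5to3)[::-1]

print(anticodon_3to5("GCU"))  # the slide's example codon for Alanine
print(anticodon_5to3("GCU"))
```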

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html

Translation Basics

• Initiation:
  o Ribosome binds to mRNA; moves in the 5’→3’ direction until it finds the Start codon AUG
• Elongation:
  o Ribosome recruits tRNA to match the next codon
  o tRNA binds its AA into a peptide bond with the protein
  o Ribosome releases the tRNA and moves to the next codon
• Termination:
  o Elongation continues until a Stop codon is reached
  o Release factor releases the polypeptide from the ribosome

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
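The three stages can be imitated on a string as a rough sketch. This is a simplification made for these notes: initiation is reduced to finding the first AUG, tRNA and ribosome mechanics are replaced by a table lookup, and the sample mRNA is invented.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter symbols in UCAG order; "*" marks Stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate the first open reading frame of an mRNA given 5'->3'."""
    mrna = mrna.upper()
    start = mrna.find("AUG")                   # initiation: first Start codon
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                  # termination: Stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: two leading bases, then AUG GCU UGG UAA, then two more.
print(translate("GGAUGGCUUGGUAACC"))
```

The bases before AUG and after the Stop codon are ignored, which is the string-level counterpart of an open reading frame.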


Animation Translation of RNA into proteins

COMS 4761 --2007

31

Proteins Are Sequences of Amino Acids

• Proteins are constructed through peptide bonds
• Proteins are folded into complex conformations
• Proteins perform functions by binding
  o Transcription factors and polymerase bind to DNA
  o Enzymes bind to molecules to accelerate their reactions
  o Globins bind to oxygen to transport it
  o Antibodies bind to pathogens

Example: Hemoglobin


Sickle-Cell Anemia: A Single Nucleotide Change

Codon 6 in β-globin

Sickle structure
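The single nucleotide change at codon 6 of β-globin can be written out explicitly. The slide shows it only as a figure; the sketch below uses the commonly cited textbook description of the mutation (GAG to GUG in the mRNA, Glutamate to Valine), and the helper name is invented for this example.

```python
# Only the two codons needed here; both entries follow the standard genetic code.
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}

def point_mutate(codon: str, position: int, new_base: str) -> str:
    """Return the codon with one base replaced (position is 0-based)."""
    return codon[:position] + new_base + codon[position + 1:]

normal = "GAG"                          # codon 6 in normal beta-globin mRNA
sickle = point_mutate(normal, 1, "U")   # A -> U at the middle position

print(normal, CODON_TO_AA[normal])
print(sickle, CODON_TO_AA[sickle])
```

One changed base swaps a charged amino acid for a hydrophobic one, which is enough to alter how hemoglobin molecules interact.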

Evolution of β-Globin

(The α-globin cluster is located on chromosome 16.)

The Evolution of α-Globin Across Species


Protein Structures


Protein Structure Is Of Central Importance

• Structure is determined through complex experimental methods
  o X-ray crystallography (diffraction); NMR
• The holy grail: compute structure from sequence
  o Ab initio: compute structure directly from sequence
  o Homology techniques: use similarity to known proteins
• Structure is conserved across wide variations
  o Small number of fold families (α-helix, β-sheets…)
  o There are rules (e.g., hydrophobic AA are packed inside)
  o Nature folds proteins very fast
• So why is it so difficult to predict structure?

SwissProt vs. PDB Statistics

PDB: ~30k structures

Proteins Interact Via Active Sites

• Protein interactions are defined by active sites
  o E.g., antibody with pathogen
  o E.g., drug design
• Proteins use geometry: ligands latch with holes
• Proteins use physics: electrical fields
• How can protein-protein interactions be computed?

Sample Bioinformatics Challenges

• Analyzing protein sequence similarity
  o Evolutionary conservation/changes
• Computing structure from sequences
• Analyzing structure homologies
• Analyzing protein-2-protein interactions
• Inferring function from structure
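As a first taste of the sequence-similarity challenge, the simplest possible measure is percent identity between two sequences that are already aligned to the same length. This is only a sketch for these notes: practical tools also handle gaps and use substitution scores, and the two peptide strings are illustrative.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Two illustrative 8-residue peptides that differ at a single position.
print(percent_identity("VHLTPEEK", "VHLTPVEK"))
```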


The Cell Cycle


Cells Operate In Cycles

• G0 Phase
  o Cell is at rest
• G1 Phase (4 hrs)
  o Cell either progresses into synthesis or leaves the cell cycle to differentiate
• S Phase (10 hrs)
  o DNA synthesis
  o Checkpoint determines integrity of DNA
• G2 Phase (4 hrs)
  o Cell prepares for mitosis
  o Checkpoint determines integrity of DNA
  o DNA is repaired or cell dies (apoptosis)
• Mitosis (2 hrs)
  o Chromosomes are separated
  o Cell divides
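The phase durations on this slide add up to one full cycle, which can be treated as a simple clock. The sketch below is a deliberately naive model written for these notes: it assumes a cell that cycles continuously, and it ignores G0, checkpoints, and apoptosis.

```python
# Phase durations in hours, as listed on the slide (G0 is outside the cycle).
PHASES = [("G1", 4), ("S", 10), ("G2", 4), ("M", 2)]

def cycle_length() -> int:
    """Total duration of one pass through G1, S, G2, and mitosis."""
    return sum(duration for _, duration in PHASES)

def phase_at(hours_elapsed: float) -> str:
    """Phase of a continuously cycling cell after the given number of hours."""
    t = hours_elapsed % cycle_length()
    for name, duration in PHASES:
        if t < duration:
            return name
        t -= duration
    return PHASES[-1][0]  # guard only; the loop always returns first

print(cycle_length())
print([phase_at(h) for h in (0, 5, 15, 19, 21)])
```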


The Cell Cycle is Regulated

• Transition among phases is controlled by a regulatory network
• Checkpoints are used to assure quality

Evolution


Optimizing Functionality

• DNA is substantially conserved through evolution
• Evolution = mutation + selection
  o Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
  o Selection = optimize fitness of species
• Examples
  o Metabolic nets learn to optimize energy budget (Alon 05)
• Functional similarity ↔ sequence similarity
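The slogan "evolution = mutation + selection" can be turned into a toy loop. Everything in this sketch is an assumption made for illustration: fitness is defined as similarity to a fixed target string, only single-nucleotide substitutions are modeled, and a mutant is kept only when it is at least as fit. Real selection has no known target.

```python
import random

def fitness(genome: str, target: str) -> int:
    """Toy fitness: number of positions that match a fixed target sequence."""
    return sum(g == t for g, t in zip(genome, target))

def mutate(genome: str, rng: random.Random) -> str:
    """Single-nucleotide substitution at one random position."""
    i = rng.randrange(len(genome))
    return genome[:i] + rng.choice("ACGT") + genome[i + 1:]

def evolve(genome: str, target: str, generations: int, seed: int = 0) -> str:
    """Repeat mutation followed by selection for a number of generations."""
    rng = random.Random(seed)
    for _ in range(generations):
        candidate = mutate(genome, rng)
        # Selection: keep the mutant only if it is at least as fit.
        if fitness(candidate, target) >= fitness(genome, target):
            genome = candidate
    return genome

start, target = "AAAAAAAAAA", "ACGTACGTAC"
result = evolve(start, target, generations=500)
print(fitness(start, target), "->", fitness(result, target))
```

Because less fit mutants are always rejected, fitness in this toy model can never decrease from one generation to the next.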
