Lecture series D3

“DNA structure, packaging, and replication”

notes based on Alberts et al., 4th ed. (2002), Chapters 4 and 5

 

prepared by T. J. Newman, October 11-October 14, 2005

 

this document not for public use – all images copyright Garland Science Publishing 2002

 

INTRODUCTION

 

·        In the next few lectures we discuss the molecular basis for genetics

·        We will study

o       structure and packaging of DNA

o       replication of DNA

 

STRUCTURE OF DNA

 

·        As we noted in previous lectures, DNA is a nucleic acid

o       thus composed of a chain of nucleotides

·        The sugar in the nucleotides is deoxyribose

·        The bases are of 4 types:

o       adenine (A)            (purine)

o       cytosine (C)          (pyrimidine)

o       guanine (G)           (purine)

o       thymine (T)           (pyrimidine)

·        The nucleotides join together to form a sugar-phosphate backbone

o       with a directional polarity: 3’ end (hydroxyl group on sugar) to 5’ end (phosphate group)

·        Thus, each DNA strand, however long, has a 3’ end and a 5’ end

·        In cells, DNA exists in a double-stranded form

o       with the strands joined together by complementary base-pairing

·        The particular chemical properties of the strand-strand hydrogen bonds lead to DNA having a double-helix structure

·        The bases are hydrogen-bonded on the inside of the double-helix

·        In complementary base-pairing, purines always bond with pyrimidines

·        And, because of the particular nature of the hydrogen bonds (i.e. energetically favorable configuration),

o       adenine always bonds with thymine (A-T pair)

o       cytosine always bonds with guanine (C-G pair)

·        Thus, one strand of the double-helix is the precise complement of the other strand
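The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function name and the example sequence are hypothetical, and it relies on the fact that the two strands of the helix run antiparallel, so the partner strand is read in the opposite direction.

```python
# Watson-Crick pairing rules from the notes: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand_5_to_3):
    """Return the complementary strand, also written 5' to 3'.

    Because the strands are antiparallel, each base is complemented and
    the result is read in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3.upper()))


example = "ATGCGT"  # hypothetical sequence, for illustration only
partner = complement_strand(example)
print(partner)  # ACGCAT

# Taking the complement twice recovers the original strand, which is the
# sense in which each strand fully determines the other.
print(complement_strand(partner) == example)  # True
```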

 

     

 

 

 

 

HEREDITY

 

·        The double-helix structure of DNA answers two fundamental questions in molecular biology:

o       how is genetic information encoded in chemical form?

o       how can this information be copied reliably?

·        The information is encoded in the sequence of nucleotides

·        This sequence uniquely specifies a corresponding sequence of amino acids

o       thus a region of DNA uniquely specifies the primary structure of a protein

·        The key from nucleotide sequence to amino acid sequence is called the genetic code

o       each triplet of nucleotides codes for an amino acid

o       this was worked out a decade or so after Watson and Crick’s 1953 discovery of the double-helix

o       in the standard table of the code, U stands for uracil, the RNA equivalent of T

o       note the redundancy in coding for many of the amino-acids
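As a rough sketch of how a triplet code is read, the following Python fragment translates a DNA coding sequence three bases at a time. The table is deliberately partial (13 of the 64 codons, written with T rather than U), and the function name and test sequences are hypothetical.

```python
# Partial codon table in the DNA alphabet (T in place of the U used in RNA).
# Only a handful of the 64 codons are listed; this is not the full code.
CODON_TABLE = {
    "ATG": "Met",
    "TGG": "Trp",
    "TTT": "Phe", "TTC": "Phe",
    "AAA": "Lys", "AAG": "Lys",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(coding_dna):
    """Read the sequence one triplet at a time until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


first = translate("ATGTTTGGTAAATAA")
second = translate("ATGTTCGGGAAGTGA")
print(first)            # ['Met', 'Phe', 'Gly', 'Lys']
print(first == second)  # True
```

The two input sequences differ at four positions yet give the same amino acid sequence, which illustrates the redundancy of the code.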

 

                                    

 

·        The entire sequence of nucleotides in the cell of an organism is called the genome for that organism

·        Only some regions of the genome (exons) code for amino acids

o       e.g. the coding regions (yellow) of the β-globin gene shown below:

 

 

·        Here are some numbers relating to the human genome:

o       each cell contains about 2 metres of DNA

o       Length of DNA:              ~3,000,000,000 base pairs

o       Number of genes:            ~30,000

o       Largest gene:                   ~2,000,000 base pairs

o       Mean gene size:               ~27,000 base pairs

o       % DNA in exons:            ~1.5%

o       % DNA in highly repetitive regions:     ~50%

o       Mean number of exons per gene:         ~9
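The figures in this list can be cross-checked with some simple arithmetic. The sketch below takes its inputs from the list, plus two assumptions that are not stated in the notes: a spacing of 0.34 nm per base pair (the standard value for B-form DNA) and two copies of the genome per cell (a diploid cell).

```python
GENOME_BP = 3_000_000_000   # base pairs in one copy of the genome (from the notes)
N_GENES = 30_000            # from the notes
MEAN_GENE_BP = 27_000       # from the notes
EXON_FRACTION = 0.015       # from the notes
RISE_PER_BP_M = 0.34e-9     # assumption: 0.34 nm per base pair (B-form DNA)
COPIES_PER_CELL = 2         # assumption: diploid cell

# Total length of DNA in one cell.
dna_length_m = GENOME_BP * RISE_PER_BP_M * COPIES_PER_CELL

# Share of the genome lying inside genes (exons plus introns).
gene_fraction = N_GENES * MEAN_GENE_BP / GENOME_BP

# Average amount of exon DNA per gene.
exon_bp_per_gene = EXON_FRACTION * GENOME_BP / N_GENES

print(f"DNA per cell: {dna_length_m:.2f} m")                # 2.04 m
print(f"Fraction of genome in genes: {gene_fraction:.0%}")  # 27%
print(f"Exon DNA per gene: {exon_bp_per_gene:.0f} bp")      # 1500 bp
```

Under these assumptions the 2 metre figure is recovered, and genes (introns included) cover roughly a quarter of the genome even though exons make up only about 1.5% of it.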

 

·        Reliable copying of DNA is possible since each strand acts as a template for a new complementary strand to be formed

o       the mechanics of this copying relies on a complicated protein machine, as we shall see

 

 

 

PACKAGING DNA

 

·        In human cells, 2m of DNA is bundled up into a nucleus 3 microns in radius

o       equivalent to 24 miles of ultra-fine thread wrapped up inside a tennis ball

·        In eukaryotes, the DNA is exquisitely wrapped in a hierarchical fashion

·        At the largest scale, DNA is organized on DNA/protein objects called chromosomes

o       the DNA/protein complex is termed chromatin

·        The human genome consists of 22 homologous pairs of autosomes (non-sex chromosomes), and two sex chromosomes

o       XY in males and XX in females

·        These chromosomes (stained during mitosis) are shown below, numbered in approximate order of size

o       although used for many years to identify chromosomes, these banding structures are not well understood

o       the dashed line indicates the location of the centromeres (where duplicated chromosomes are linked during mitosis)

 

 

·        Human chromosome 22 was the first to be sequenced, in 1999.

o       it contains 48 million nucleotide pairs (about 1.5% of the genome)

·        The genomes of individual humans differ from each other in about 0.1% of nucleotides

·        Exons code for actual amino acids, while introns appear to be relatively unimportant (in terms of information coding)

·        The regulatory DNA sequence adjacent to the gene contains information about when and in which cell type the gene is to be expressed

·        The average size of a human gene is 27,000 nucleotide pairs, while only about 1,300 pairs are necessary to code for the average protein of ~400 amino acids. The bulk of each gene consists of introns.
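A short calculation makes the proportions in the preceding points concrete. All inputs are the figures quoted in these notes; the variable names are only illustrative.

```python
GENOME_BP = 3_000_000_000   # nucleotide pairs in the human genome
CHR22_BP = 48_000_000       # nucleotide pairs in chromosome 22
DIFF_FRACTION = 0.001       # 0.1% difference between individuals
MEAN_GENE_BP = 27_000       # average gene size
CODING_BP = 1_300           # coding pairs quoted for an average protein
AMINO_ACIDS = 400           # size of the average protein

chr22_share = CHR22_BP / GENOME_BP           # chromosome 22 as a share of the genome
differing_sites = DIFF_FRACTION * GENOME_BP  # sites differing between two people
triplet_bp = 3 * AMINO_ACIDS                 # three nucleotides per amino acid
coding_share = CODING_BP / MEAN_GENE_BP      # coding share of an average gene

print(f"Chromosome 22: {chr22_share:.1%} of the genome")     # 1.6%
print(f"Differing sites: {differing_sites:,.0f}")            # 3,000,000
print(f"Pairs for {AMINO_ACIDS} amino acids: {triplet_bp}")  # 1200
print(f"Coding share of a gene: {coding_share:.1%}")         # 4.8%
```

The triplet estimate of 1,200 pairs is close to the quoted ~1,300, and either value is less than 5% of the average gene, which is why introns make up the bulk of each gene.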

 

 

·        Large regions of the genome are not genes, but rather consist of highly repetitive sequences

o       these are called transposable elements, and come in a variety of types

 

 

·        It is non-trivial to identify the beginnings and ends of genes amidst the non-coding DNA

·        One technique to locate genes is to compare genomes of different organisms (e.g. humans and mice)

·        Genes are expected to be highly conserved under evolution, whereas non-coding regions, if of no functional importance, diverge over long time scales (many millions of years)

·        Such studies reveal that humans and mice share most of the same genes, in the same order

o       such conservation of gene order between organisms is termed “conserved synteny”

 

·        The replication of chromosomes occurs during the cell cycle

o       the replication depends on certain structures in the chromosome

§        replication origin – region where duplication of DNA begins

§        centromere – region where mitotic spindle attaches to separate duplicated chromosomes

§        telomere – repeated sequences at the ends of the chromosome, ensuring that entire chromosome is replicated

·        We will study the larger scale processes involved in mitosis later in the course

 

 

·        Chromosomal DNA is hierarchically packaged in a way that still allows rapid access to particular sequences for gene expression

 

 

·        The smallest level of organization is the nucleosome (discovered in 1974)

·        Each nucleosome consists of a stretch of DNA (146 nucleotide pairs long) wrapped around a protein octamer complex

o       this complex consists of 8 histone proteins (two each of proteins H2A, H2B, H3, and H4)

o       the DNA wraps about 1 ½ turns around the complex, with over 100 hydrogen bonds firmly anchoring the DNA to the histones

o       due to the ancient evolutionary significance of histones, their sequence is very highly conserved

§        e.g. histone H4 in a cow and a pea differ at only 2 of the 102 amino acid positions

 

 

·        Nucleosomes are separated by regions of DNA up to about 80 pairs long (called linker DNA)
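The nucleosome and linker lengths allow a rough estimate of how many nucleosomes are needed to package a genome. The sketch below treats the linker length as a parameter. The 80-pair value is the figure quoted above, while the 54-pair value is an assumption chosen to give a repeat of 200 pairs, a commonly used average.

```python
CORE_BP = 146                 # DNA wrapped around one histone octamer (from the notes)
HISTONES_PER_NUCLEOSOME = 8   # two each of H2A, H2B, H3 and H4
GENOME_BP = 3_000_000_000     # one copy of the human genome


def nucleosome_count(total_bp, linker_bp):
    """Number of whole nucleosome repeats (core + linker) that fit in total_bp."""
    repeat_bp = CORE_BP + linker_bp
    return total_bp // repeat_bp


with_long_linker = nucleosome_count(GENOME_BP, linker_bp=80)
with_200bp_repeat = nucleosome_count(GENOME_BP, linker_bp=54)  # assumed average

print(f"{with_long_linker / 1e6:.1f} million nucleosomes")   # 13.3 million
print(f"{with_200bp_repeat / 1e6:.1f} million nucleosomes")  # 15.0 million
print(f"{with_200bp_repeat * HISTONES_PER_NUCLEOSOME / 1e6:.0f} million histones")  # 120 million
```

Either way, one copy of the genome requires on the order of ten million nucleosomes.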

 

 

·        Nucleosomes usually exist in a condensed form in a chromatin fiber

·        Protein machines, called chromatin remodeling complexes, can change the structure of the fiber

o       loosening some regions of nucleosomes, or

o       changing the relative spacing of nucleosomes

·        The remodeling complexes play a key role in gene expression

 

·        DNA arranged in chromatin fibers is not packaged tightly enough to fit inside the nucleus: further coiling of the fibers is necessary

·        Chromatin fibers either exist in a loosely looped structure (euchromatin), or else in a highly condensed form (heterochromatin)

o       the latter is especially found around chromosomal regions such as centromeres and telomeres

o       heterochromatin does not usually contain genes

·        Larger scale chromosome structure is as yet poorly understood

 

 

MUTATION OF DNA SEQUENCES

 

·        The integrity of the DNA sequence is crucial to the survival of the organism

·        Accidental changes to the genome are called mutations

·        However, a very low rate of mutation is essential to provide the genetic variation on which natural selection acts

·        Experiments on E. coli indicate a mutation rate of about one nucleotide change per billion bases per cell generation

o       in other words, a typical bacterial gene (of 1000 bases) suffers a single base mutation once every million generations

·        Estimates in mammals have been done by looking at fibrinopeptides, which are “vestigial” subunits of the protein fibrinogen

o       it is estimated that a protein of 400 amino acids will suffer an amino acid change in the germ line once every 200,000 yr

·        Normalized estimates from various organisms indicate the mutation rate of 1 base per billion per DNA replication is a reliable figure

·        Some mutations are “silent”, meaning they change the nucleotide sequence but not the amino acid sequence

o       hence the phenotype is unchanged
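The mutation-rate arithmetic quoted above can be checked directly. This is a minimal sketch using only the figures in these notes; the function name is hypothetical, and the rate is treated as an expected number of changes per base per generation (or per replication).

```python
MUTATION_RATE = 1e-9        # changes per base per generation (from the notes)
BACTERIAL_GENE_BASES = 1_000
HUMAN_GENOME_BASES = 3_000_000_000


def generations_per_mutation(n_bases, rate=MUTATION_RATE):
    """Average number of generations between mutations in a stretch of n_bases."""
    expected_per_generation = n_bases * rate
    return 1 / expected_per_generation


gene_interval = generations_per_mutation(BACTERIAL_GENE_BASES)
per_genome_copy = HUMAN_GENOME_BASES * MUTATION_RATE

print(f"One mutation every {gene_interval:,.0f} generations")  # 1,000,000
print(f"About {per_genome_copy:.0f} changes per genome copy")  # 3
```

At the same rate, copying a genome of 3 billion base pairs once would introduce about 3 changes on average.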

 

DNA REPLICATION MECHANISMS

 

·        The underlying mechanism for replication is

o       separation of double-stranded DNA into two single strands

§        thus exposing the bases to new complementary bases

o       attachment of the new complementary base from an appropriate deoxyribonucleoside triphosphate molecule (e.g. dATP)
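The two steps above can be sketched as a toy model in Python. It ignores all of the enzymology and simply shows the bookkeeping: the strands are separated and each one serves as the template for a new partner. The function names and the example duplex are hypothetical, and the two strands are written position by position (top strand 5' to 3', bottom strand 3' to 5' beneath it).

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pair_with(template):
    """Build the partner strand base by base, aligned with the template."""
    return "".join(PAIRS[base] for base in template)


def replicate(duplex):
    """Separate the two strands and synthesize a new partner on each.

    Returns two daughter duplexes, each holding one old and one new strand.
    """
    old_top, old_bottom = duplex
    daughter_1 = (old_top, pair_with(old_top))        # new bottom strand
    daughter_2 = (pair_with(old_bottom), old_bottom)  # new top strand
    return [daughter_1, daughter_2]


parent = ("ATGCGT", "TACGCA")  # hypothetical duplex, for illustration only
daughters = replicate(parent)
print(daughters)
print(all(d == parent for d in daughters))  # True
```

Both daughter duplexes carry the same sequence as the parent, and each contains exactly one of the original strands.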

 

 

·        The reaction of joining the new nucleotide to the end of the primer strand (and hydrogen bonding it to the template strand) is catalyzed by an enzyme called DNA polymerase

·        DNA polymerase is part of a larger multi-protein complex which moves along the original DNA molecule at the site of the Y-shaped junction where the two strands are separated; this junction is called a replication fork

·        This complex is staggering in its functionality – here is a cartoon of the whole process:

 

 

·        Part of the reason for the complexity of this replication machine is that DNA polymerases only ever catalyze nucleotide polymerization in the 5’-3’ direction

o       the double helix consists of two strands – one in the 5’-3’ direction, and the other in the 3’-5’ direction

o       thus a special mechanism is required to replicate bases on the 3’-5’ strand

o       this strand is known as the lagging strand, while the “easier” 5’-3’ strand is