Lecture series D3
“DNA structure, packaging, and
replication”
notes based on Alberts et al 4th ed. (2002) Chapters 4 and 5
prepared by T. J. Newman, October 11-October 14, 2005
this document not
for public use – all images copyright Garland Science Publishing 2002
INTRODUCTION
·
In the next few
lectures we discuss the molecular basis for genetics
·
We will study
o
structure and
packaging of DNA
o
replication of
DNA
STRUCTURE OF DNA
·
As we noted in
previous lectures, DNA is a nucleic acid
o
thus composed of
a chain of nucleotides
·
The sugar in the
nucleotides is deoxyribose
·
The bases are of
4 types:
o
adenine (A) (purine)
o
cytosine (C) (pyrimidine)
o
guanine (G) (purine)
o
thymine (T) (pyrimidine)
·
The nucleotides
join together to form a sugar-phosphate backbone
o
with a
directional polarity: 3’ end (hydroxyl group on sugar) to 5’ end (phosphate
group)
·
Thus, each DNA
strand, however long, has a 3’ end and a 5’ end
·
In cells, DNA
exists in a double-stranded form
o
with the strands
joined together by complementary base-pairing
·
The particular
chemical properties of the strand-strand hydrogen bonds lead to DNA having a
double-helix structure
·
The bases are
hydrogen-bonded on the inside of the double-helix
·
In complementary
base-pairing, purines always bond with pyrimidines
·
And, because of
the particular nature of the hydrogen bonds (i.e. energetically favorable
configuration),
o
adenine always
bonds with thymine (A-T pair)
o
cytosine always
bonds with guanine (C-G pair)
·
Thus, one strand
of the double-helix is the precise complement of the other strand
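
The pairing rules above can be written out in a few lines of Python. This is only an illustrative sketch: the function names and the example sequence are made up for these notes. It assumes the strands run in opposite directions, so the partner strand read 5'->3' is the reversed complement.

```python
# Watson-Crick pairing rules from the notes: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the two strands are antiparallel)."""
    return complement(strand)[::-1]


example = "ATGCGT"  # hypothetical sequence, written 5'->3'
print(complement(example))          # partner bases, position by position
print(reverse_complement(example))  # partner strand read 5'->3'
```

Applying reverse_complement twice returns the original sequence, which is another way of saying that each strand fully determines the other.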



HEREDITY
·
The double-helix
structure of DNA answers two fundamental questions in molecular biology:
o
how is genetic
information encoded in chemical form?
o
how can this
information be copied reliably?
·
The information
is encoded in the sequence of nucleotides
·
This sequence
uniquely specifies a corresponding sequence of amino acids
o
thus a region of
DNA uniquely specifies the primary structure of a protein
·
The key from
nucleotide sequence to amino acid sequence is called the genetic code
o
each triplet of
nucleotides codes for an amino acid
o
this was worked
out a decade or so after Watson and Crick’s 1953 discovery of the double-helix
o
the U is uracil,
the equivalent of T in RNA
o
note the
redundancy in coding for many of the amino-acids
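
To make the triplet code and its redundancy concrete, here is a small Python sketch. The table is deliberately partial (only a handful of the 64 codons of the standard code), and the function names and example RNA sequence are hypothetical.

```python
from collections import defaultdict

# A small excerpt of the standard genetic code (RNA codons, so U replaces T).
CODON_TABLE = {
    "AUG": "Met", "UGG": "Trp",
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(rna: str) -> list:
    """Read the sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


def codons_per_amino_acid() -> dict:
    """Count how many codons in the excerpt map to each amino acid."""
    counts = defaultdict(int)
    for amino_acid in CODON_TABLE.values():
        counts[amino_acid] += 1
    return dict(counts)


print(translate("AUGUUUCUGGGAUGGUAA"))  # hypothetical message
print(codons_per_amino_acid())
```

The counts show the redundancy directly: leucine has six codons and glycine four, while methionine and tryptophan have one each.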

·
The
entire sequence of nucleotides in the cell of an organism is called the genome for that organism
·
Only
some regions of the genome (exons) code for amino acids
o
e.g.
yellow coding regions for the β-globin gene shown below:

·
Here
are some numbers relating to the human genome:
o
each
cell contains about 2 metres of DNA
o
Length
of DNA: ~3,000,000,000 base pairs
o
Number
of genes: ~30,000
o
Largest
gene: ~2,000,000 base pairs
o
Mean
gene size: ~27,000 base pairs
o
%
DNA in exons: ~1.5%
o
%
DNA in highly repetitive regions: ~50%
o
Mean
number of exons per gene: ~9
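
These numbers can be checked against each other with some quick arithmetic. The sketch below assumes a spacing of 0.34 nm per base pair (the usual figure for the standard double helix, not stated in these notes), and assumes that the "2 metres per cell" refers to two copies of the genome. Variable and function names are made up for illustration.

```python
GENOME_BP = 3_000_000_000   # base pairs in one copy of the genome (from the notes)
N_GENES = 30_000            # from the notes
MEAN_GENE_BP = 27_000       # from the notes
EXON_FRACTION = 0.015       # from the notes
RISE_NM_PER_BP = 0.34       # assumption: length per base pair, not in the notes


def dna_length_m(base_pairs: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Contour length in metres of a stretched-out double helix."""
    return base_pairs * rise_nm * 1e-9


haploid_m = dna_length_m(GENOME_BP)   # one copy of the genome
diploid_m = 2 * haploid_m             # two copies per cell (assumption)
gene_fraction = N_GENES * MEAN_GENE_BP / GENOME_BP
exon_bp = EXON_FRACTION * GENOME_BP

print(f"one genome copy: {haploid_m:.2f} m, two copies: {diploid_m:.2f} m")
print(f"fraction of genome inside genes: {gene_fraction:.0%}")
print(f"base pairs in exons: {exon_bp:,.0f}")
```

Under these assumptions one copy of the genome is about 1 m long, so two copies give the quoted 2 m, and genes (introns included) cover roughly a quarter of the genome.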
·
Reliable
copying of DNA is possible since each strand acts as a template for a new complementary strand to be formed
o
the
mechanics of this copying rely on a complicated protein machine, as we shall
see

PACKAGING DNA
·
In human cells,
2 m of DNA is bundled up into a nucleus 3 microns in radius
o
equivalent to 24
miles of ultra-fine thread wrapped up inside a tennis ball
·
In eukaryotes, the
DNA is exquisitely wrapped in a hierarchical fashion
·
At the largest
scale, DNA is organized into DNA/protein objects called chromosomes
o
the DNA/protein
complex is termed chromatin
·
The human genome
consists of 22 homologous pairs of autosomes (non-sex chromosomes), and two sex chromosomes
o
XY in males and
XX in females
·
These chromosomes
(stained during mitosis) are shown below, numbered in approximate order of size
o
although used for
many years to identify chromosomes, these banding structures are not well
understood
o
the dashed line
indicates the location of the centromeres (where
duplicated chromosomes are linked during mitosis)

·
Human chromosome
22 was the first to be sequenced, in 1999.
o
it contains 48
million nucleotide pairs (about 1.5% of the genome)
·
The genomes of
individual humans differ from each other in about 0.1% of nucleotides
·
Exons code for actual amino acids, while introns appear to be relatively unimportant (in terms
of information coding)
·
The regulatory
DNA sequence adjacent to the gene contains information about when and in which
cell type the gene is to be expressed
·
The average size
of a human gene is 27,000 nucleotide pairs, while only about 1,300 pairs are
necessary to code for the average protein of ~400 amino acids. The bulk of each
gene consists of introns.
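
The arithmetic behind this statement is short enough to spell out. The sketch below uses only the figures quoted above; the function names are hypothetical. Three nucleotides per amino acid gives 1200 pairs for 400 amino acids, a little under the quoted figure of about 1300.

```python
def coding_pairs_needed(n_amino_acids: int) -> int:
    """Nucleotide pairs needed at three per amino acid."""
    return 3 * n_amino_acids


def coding_fraction(coding_pairs: int, gene_pairs: int) -> float:
    """Fraction of a gene's length that actually codes for protein."""
    return coding_pairs / gene_pairs


minimum_pairs = coding_pairs_needed(400)    # average protein from the notes
fraction = coding_fraction(1300, 27_000)    # figures quoted in the notes

print(f"minimum coding length: {minimum_pairs} pairs")
print(f"coding fraction of an average gene: {fraction:.1%}")
```

So on these numbers only about 5% of an average gene codes for protein, which is why the bulk of each gene must be intron.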

·
Large regions of
the genome are not genes, but rather consist of highly repetitive sequences
o
most of these derive from transposable elements,
which come in a variety of types

·
It is non-trivial
to actually identify the beginning and ends of genes, amidst the non-coding DNA
·
One technique to
locate genes is to compare genomes of different organisms (e.g. humans and
mice)
·
Genes would be
highly conserved under evolution, while non-coding regions, if of no
importance, would diverge over large time scales (many millions of years)
·
Such studies
reveal that humans and mice share most of the same genes, in the same order
o
such conservation
of gene order between organisms is termed “conserved
synteny”
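
The idea of locating genes by conservation can be illustrated with a toy calculation. The sketch assumes the two sequences have already been aligned and have equal length, and the two short sequences are invented for illustration (they are not real human or mouse DNA). Real comparisons use proper alignment tools. This only shows the principle of scoring identity window by window.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def windowed_identity(a: str, b: str, window: int) -> list:
    """Percent identity in consecutive, non-overlapping windows."""
    return [
        percent_identity(a[i:i + window], b[i:i + window])
        for i in range(0, len(a) - window + 1, window)
    ]


# Hypothetical aligned sequences: first 10 bases conserved, last 10 diverged.
species_1 = "ATGGCCAAGT" + "TTACGATCGA"
species_2 = "ATGGCCAAGT" + "GCACTTTAGC"

print(windowed_identity(species_1, species_2, window=10))
```

Windows with high identity are candidates for conserved (possibly coding) regions, while windows with low identity look like sequence that has been free to diverge.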
·
The replication
of chromosomes occurs during the cell cycle
o
the replication
depends on certain structures in the chromosome
§
replication origin – region where
duplication of DNA begins
§
centromere – region where mitotic spindle attaches to separate
duplicated chromosomes
§
telomere – repeated sequences at the ends of the chromosome,
ensuring that the entire chromosome is replicated
·
We will study the
larger scale processes involved in mitosis later in the course

·
Chromosomal DNA
is hierarchically packaged to enable rapid access to particular sequences,
allowing gene expression

·
The smallest
level of organization is the nucleosome
(discovered in 1974)
·
Each nucleosome
consists of a stretch of DNA (146 pairs long) wrapped around a protein octamer
complex
o
this complex
consists of 8 histone proteins
(two each of proteins H2A, H2B, H3, and H4)
o
the DNA wraps
about 1.65 turns around the complex, with over 100 hydrogen bonds firmly
anchoring the DNA to the histones
o
due to the
ancient evolutionary significance of histones, their sequence is very highly
conserved
§
e.g. histone H4
in a cow and a pea differ at only 2 of the 102 amino acid positions

·
Nucleosomes are
separated by regions of DNA up to about 80 pairs long (called linker DNA)
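
A rough count of nucleosomes follows from these lengths. The sketch below takes the 146-pair core and an 80-pair linker at face value and applies them to one 3-billion-pair copy of the genome. The function names are hypothetical, and a shorter linker would give a proportionally larger count.

```python
CORE_BP = 146  # DNA wrapped around one histone octamer (from the notes)


def repeat_length(linker_bp: int) -> int:
    """DNA used per nucleosome: wrapped core plus one linker."""
    return CORE_BP + linker_bp


def nucleosome_count(genome_bp: int, linker_bp: int) -> int:
    """Whole nucleosome repeats that fit along a given length of DNA."""
    return genome_bp // repeat_length(linker_bp)


count = nucleosome_count(3_000_000_000, linker_bp=80)
print(f"repeat length: {repeat_length(80)} pairs")
print(f"nucleosomes per genome copy: about {count / 1e6:.0f} million")
```

On these assumptions a single copy of the human genome carries on the order of ten million nucleosomes.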

·
Nucleosomes
usually exist in a condensed form in a chromatin
fiber
·
Protein machines,
called chromatin remodeling complexes, can change
the structure of the fiber
o
loosening some
regions of nucleosomes, or
o
changing the
relative spacing of nucleosomes
·
The remodeling complexes play a key role in
gene expression
·
DNA arranged in
chromatin fibers is not packaged tightly enough to fit inside the nucleus:
further coiling of the fibers is necessary
·
Chromatin fibers either
exist in a loosely looped structure (euchromatin),
or else in a highly condensed form (heterochromatin)
o
the latter is especially
found around chromosomal regions such as centromeres and telomeres
o
heterochromatin
does not usually contain genes
·
Larger scale
chromosome structure is as yet poorly understood

MUTATION OF DNA SEQUENCES
·
The integrity of
the DNA sequence is crucial to the survival of the organism
·
Accidental
changes to the genome are called mutations
·
However, a very
low rate of mutation is essential to provide genetic variation for
natural selection to work upon
·
Experiments on E. coli indicate a mutation rate of about one
nucleotide per billion nucleotides per cell generation
o
in other words, a
typical bacterial gene (of 1000 bases) suffers a single base mutation once
every million generations
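
The "once every million generations" figure is just the per-base rate multiplied by the gene length. A minimal sketch of that arithmetic follows. The function names are made up, and the last line applies the same rate to a 3-billion-pair genome purely as an illustration.

```python
RATE_PER_BASE = 1e-9  # mutations per base per generation (from the notes)


def expected_mutations(n_bases: float, generations: float = 1,
                       rate: float = RATE_PER_BASE) -> float:
    """Expected number of point mutations in a stretch of DNA."""
    return n_bases * rate * generations


def generations_per_mutation(n_bases: float,
                             rate: float = RATE_PER_BASE) -> float:
    """Average number of generations between mutations in that stretch."""
    return 1.0 / (n_bases * rate)


print(generations_per_mutation(1000))        # typical bacterial gene
print(expected_mutations(3_000_000_000))     # a 3-billion-pair genome, one copy
```

For a 1000-base gene the expected wait is about a million generations, while a genome of three billion pairs would pick up about three changes each time it is copied at the same rate.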
·
Estimates in
mammals have been made by looking at fibrinopeptides, which are “vestigial”
subunits of the protein fibrinogen
o
it is estimated
that a protein of 400 amino acids will suffer an amino acid change in the germ
line once every 200,000 yr
·
Normalized
estimates from various organisms indicate that a mutation rate of about 1 base per billion
per DNA replication is a reliable figure
·
Some mutations
are “silent”, meaning they change the nucleotide
sequence but not the amino acid sequence
o
hence the
phenotype is unchanged
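
A silent mutation is easy to demonstrate with a few codons. The table below is a small excerpt of the standard genetic code (RNA codons), and the function name is hypothetical. The sketch only compares the amino acid before and after a single-base change.

```python
# Excerpt of the standard genetic code (RNA codons).
CODON_TABLE = {
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}


def is_silent(codon_before: str, codon_after: str) -> bool:
    """True if the two codons specify the same amino acid."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


print(is_silent("CUU", "CUC"))  # third base changed, still leucine
print(is_silent("CUU", "UUU"))  # first base changed, leucine becomes phenylalanine
```

Changes in the third position of a codon are often silent because of the redundancy of the code, whereas the first-position change shown here alters the protein.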
DNA REPLICATION MECHANISMS
·
The underlying
mechanism for replication is
o
separation of
double-stranded DNA into two single strands
§
thus exposing the
bases to new complementary bases
o
attachment of the
new complementary base from an appropriate deoxyribonucleoside triphosphate
molecule (e.g. dATP)
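
The two steps above (separate the strands, then build a complementary partner on each) can be mimicked in a short Python sketch. It ignores all the enzymology and simply treats each strand as a string written 5'->3'. The function names and the example duplex are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_partner(template: str) -> str:
    """New strand (written 5'->3') built on a template (written 5'->3').

    The template is read from its 3' end, so the new strand is the
    complement of the template taken in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(template))


def replicate(duplex: tuple) -> tuple:
    """Separate a duplex and pair each old strand with a new partner."""
    top, bottom = duplex  # both written 5'->3'
    daughter_1 = (top, synthesize_partner(top))
    daughter_2 = (synthesize_partner(bottom), bottom)
    return daughter_1, daughter_2


parent = ("ATGCGT", "ACGCAT")  # hypothetical duplex, each strand 5'->3'
for daughter in replicate(parent):
    print(daughter)
```

Each daughter duplex contains one old strand and one newly made strand, and both daughters carry the same sequence as the parent.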

·
The reaction of
joining the new base to the end of the primer strand (and hydrogen bonding it to the template strand) is
catalyzed by a protein called DNA polymerase
·
The DNA polymerase is
part of a larger multi-protein complex which moves along the original DNA
molecule at the site of the Y-shaped junction where the two strands are
separated – this junction is called a replication
fork
·
This complex is
staggering in its functionality – here is a cartoon of the whole process:

·
Part of the
reason for the complexity of this replication machine is that DNA polymerases
only ever catalyze nucleotide polymerization in the 5’-3’ direction
o
the double helix
consists of two strands – one in the 5’-3’ direction, and the other in the
3’-5’ direction
o
thus a special
mechanism is required to replicate bases on the 3’-5’ strand
o
this strand is known as the lagging strand, while the “easier” 5’-3’ strand is